{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "1413fb18-3c0e-4cb7-ba25-42d3eeae09c8",
   "metadata": {},
   "source": [
    "# Summarizing generated code failure reasons\n",
    "This notebook demonstrates how to summarize error messages and failure reasons from HumanEval-like benchmarks. The `_results.jsonl` files contain a column `result` holding a string. If a test fails without an error message, this string is just `failed: `; otherwise the observed error message follows this prefix. These failures and errors can be summarized for each model as shown here.\n",
    "\n",
    "The data used in this notebook originates from the [human-eval-bia](https://github.com/haesleinhuepf/human-eval-bia) project and is licensed under BSD-3."
   ]
  },
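  {
   "cell_type": "markdown",
   "id": "sketch-result-column-format",
   "metadata": {},
   "source": [
    "To make the layout of the `result` column concrete, here is a minimal sketch that uses hypothetical records instead of the real files. The task names and the `message` and `has_message` columns are assumptions made for illustration only; the error message is borrowed from the example errors printed further down.\n",
    "\n",
    "```python\n",
    "import io\n",
    "import json\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical records mimicking rows of a `_results.jsonl` file\n",
    "records = [\n",
    "    {'task_id': 'task_a', 'result': 'failed: ', 'passed': False},\n",
    "    {'task_id': 'task_b', 'result': 'failed: Input must be a 2D numpy array of boolean type', 'passed': False},\n",
    "]\n",
    "\n",
    "# JSON Lines: one JSON object per line, read here from an in-memory buffer\n",
    "toy_file = io.StringIO('\\n'.join(json.dumps(r) for r in records))\n",
    "toy = pd.read_json(toy_file, lines=True)\n",
    "\n",
    "# Strip the common prefix; an empty remainder means no error message was recorded\n",
    "toy['message'] = toy['result'].str[len('failed: '):]\n",
    "toy['has_message'] = toy['message'] != ''\n",
    "print(toy[['task_id', 'message', 'has_message']])\n",
    "```"
   ]
  },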
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "da4ac394-a726-42c6-be64-be200c63bd13",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "45984da5-ba46-4c0c-b7d1-e842bd3e9725",
   "metadata": {},
   "outputs": [],
   "source": [
    "directory = \"data/\"\n",
    "# if you want to investigate a single model only, add its name here:\n",
    "search_term = \"\"\n",
    "\n",
    "# Enter the terms to search for here\n",
    "common_errors = ['has no attribute', 'invalid syntax', \"Can't convert object\", 'cannot import', 'out of range', 'unexpected keyword argument']"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d4166e77-38fa-4954-a9d7-0015efd47f8f",
   "metadata": {},
   "source": [
    "First we collect all results and the corresponding models from the jsonl files."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "c2abe33e-1d70-4ce4-b003-3923c4ff1fb8",
   "metadata": {},
   "outputs": [],
   "source": [
    "collection = []\n",
    "for filename in os.listdir(directory):\n",
    "    if search_term in filename and filename.endswith(\"_results.jsonl\"):\n",
    "        df = pd.read_json(directory + filename, lines=True)\n",
    "        df['model'] = filename.replace(\"samples_\",\"\").replace(\"_results\",\"\").replace(\".jsonl\",\"\")\n",
    "        collection.append(df)\n",
    "\n",
    "if len(collection) == 1:\n",
    "    df = collection[0]\n",
    "else:\n",
    "    df = pd.concat(collection)"
   ]
  },
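  {
   "cell_type": "markdown",
   "id": "sketch-model-name-from-filename",
   "metadata": {},
   "source": [
    "The file naming convention assumed by the loop above can be illustrated with a small sketch. The helper name `model_from_filename` and the file names are hypothetical; only the `samples_<model>_results.jsonl` pattern is inferred from the string replacements in the code.\n",
    "\n",
    "```python\n",
    "def model_from_filename(filename):\n",
    "    # Remove the fixed prefix and suffixes so that only the model name remains\n",
    "    return filename.replace('samples_', '').replace('_results', '').replace('.jsonl', '')\n",
    "\n",
    "# Hypothetical directory listing\n",
    "filenames = ['samples_model-a_results.jsonl', 'samples_model-a.jsonl', 'readme.md']\n",
    "\n",
    "# Only files ending with `_results.jsonl` are treated as result files\n",
    "result_files = [f for f in filenames if f.endswith('_results.jsonl')]\n",
    "print({f: model_from_filename(f) for f in result_files})\n",
    "```"
   ]
  },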
  {
   "cell_type": "markdown",
   "id": "77f92b8d-5eb2-4d64-984d-bf703e78fed7",
   "metadata": {},
   "source": [
    "We then focus on the tests which failed."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "c688498a-6c20-454f-8c38-3168fe25bae5",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>task_id</th>\n",
       "      <th>completion</th>\n",
       "      <th>full_response</th>\n",
       "      <th>result</th>\n",
       "      <th>passed</th>\n",
       "      <th>model</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>../test_cases/apply_otsu_threshold_and_count_p...</td>\n",
       "      <td>\\n# Plan:\\n# 1. Import required libraries (cv2...</td>\n",
       "      <td>Here's the completed code with a step-by-step ...</td>\n",
       "      <td>failed: OpenCV(4.9.0) D:\\a\\opencv-python\\openc...</td>\n",
       "      <td>False</td>\n",
       "      <td>claude-3-5-sonnet-20240620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>../test_cases/binary_skeleton.ipynb</td>\n",
       "      <td>\\n# Plan:\\n# 1. Import necessary libraries (sk...</td>\n",
       "      <td>Here's the completed code with a step-by-step ...</td>\n",
       "      <td>failed: Input must be a 2D numpy array of bool...</td>\n",
       "      <td>False</td>\n",
       "      <td>claude-3-5-sonnet-20240620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>../test_cases/convolve_images.ipynb</td>\n",
       "      <td>\\n# Plan:\\n# 1. Check if the input images are ...</td>\n",
       "      <td>Here's the complete code with a plan, necessar...</td>\n",
       "      <td>failed: Input image and kernel image must have...</td>\n",
       "      <td>False</td>\n",
       "      <td>claude-3-5-sonnet-20240620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>../test_cases/deconvolve_image.ipynb</td>\n",
       "      <td>\\n# Plan:\\n# 1. Import required libraries (num...</td>\n",
       "      <td>Here's the completed code with a plan, necessa...</td>\n",
       "      <td>failed: fft2() got an unexpected keyword argum...</td>\n",
       "      <td>False</td>\n",
       "      <td>claude-3-5-sonnet-20240620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>../test_cases/detect_edges.ipynb</td>\n",
       "      <td>\\n# Plan:\\n# 1. Convert the image to grayscale...</td>\n",
       "      <td>Here's the completed code with a plan, necessa...</td>\n",
       "      <td>failed: OpenCV(4.9.0) d:\\a\\opencv-python\\openc...</td>\n",
       "      <td>False</td>\n",
       "      <td>claude-3-5-sonnet-20240620</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>...</th>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>560</th>\n",
       "      <td>../test_cases/sum_intensity_projection.ipynb</td>\n",
       "      <td>\\ndef sum_intensity_projection(image):\\n    \"\"...</td>\n",
       "      <td>```python\\ndef sum_intensity_projection(image)...</td>\n",
       "      <td>failed:</td>\n",
       "      <td>False</td>\n",
       "      <td>gpt-4o-2024-05-13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>561</th>\n",
       "      <td>../test_cases/tiled_image_processing.ipynb</td>\n",
       "      <td>\\ndef tiled_image_processing(image, radius, ti...</td>\n",
       "      <td>```python\\ndef tiled_image_processing(image, r...</td>\n",
       "      <td>failed:</td>\n",
       "      <td>False</td>\n",
       "      <td>gpt-4o-2024-05-13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>562</th>\n",
       "      <td>../test_cases/transpose_image_axes.ipynb</td>\n",
       "      <td>\\ndef transpose_image_axes(image):\\n    \"\"\"\\n ...</td>\n",
       "      <td>```python\\ndef transpose_image_axes(image):\\n ...</td>\n",
       "      <td>failed: axes don't match array</td>\n",
       "      <td>False</td>\n",
       "      <td>gpt-4o-2024-05-13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>564</th>\n",
       "      <td>../test_cases/workflow_batch_process_folder_co...</td>\n",
       "      <td>\\ndef workflow_batch_process_folder_count_labe...</td>\n",
       "      <td>```python\\ndef workflow_batch_process_folder_c...</td>\n",
       "      <td>failed:</td>\n",
       "      <td>False</td>\n",
       "      <td>gpt-4o-2024-05-13</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>569</th>\n",
       "      <td>../test_cases/workflow_watershed_segmentation_...</td>\n",
       "      <td>\\ndef workflow_watershed_segmentation_correcti...</td>\n",
       "      <td>```python\\ndef workflow_watershed_segmentation...</td>\n",
       "      <td>failed: OpenCV(4.9.0) D:/a/opencv-python/openc...</td>\n",
       "      <td>False</td>\n",
       "      <td>gpt-4o-2024-05-13</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>921 rows × 6 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "                                               task_id  \\\n",
       "0    ../test_cases/apply_otsu_threshold_and_count_p...   \n",
       "2                  ../test_cases/binary_skeleton.ipynb   \n",
       "6                  ../test_cases/convolve_images.ipynb   \n",
       "12                ../test_cases/deconvolve_image.ipynb   \n",
       "13                    ../test_cases/detect_edges.ipynb   \n",
       "..                                                 ...   \n",
       "560       ../test_cases/sum_intensity_projection.ipynb   \n",
       "561         ../test_cases/tiled_image_processing.ipynb   \n",
       "562           ../test_cases/transpose_image_axes.ipynb   \n",
       "564  ../test_cases/workflow_batch_process_folder_co...   \n",
       "569  ../test_cases/workflow_watershed_segmentation_...   \n",
       "\n",
       "                                            completion  \\\n",
       "0    \\n# Plan:\\n# 1. Import required libraries (cv2...   \n",
       "2    \\n# Plan:\\n# 1. Import necessary libraries (sk...   \n",
       "6    \\n# Plan:\\n# 1. Check if the input images are ...   \n",
       "12   \\n# Plan:\\n# 1. Import required libraries (num...   \n",
       "13   \\n# Plan:\\n# 1. Convert the image to grayscale...   \n",
       "..                                                 ...   \n",
       "560  \\ndef sum_intensity_projection(image):\\n    \"\"...   \n",
       "561  \\ndef tiled_image_processing(image, radius, ti...   \n",
       "562  \\ndef transpose_image_axes(image):\\n    \"\"\"\\n ...   \n",
       "564  \\ndef workflow_batch_process_folder_count_labe...   \n",
       "569  \\ndef workflow_watershed_segmentation_correcti...   \n",
       "\n",
       "                                         full_response  \\\n",
       "0    Here's the completed code with a step-by-step ...   \n",
       "2    Here's the completed code with a step-by-step ...   \n",
       "6    Here's the complete code with a plan, necessar...   \n",
       "12   Here's the completed code with a plan, necessa...   \n",
       "13   Here's the completed code with a plan, necessa...   \n",
       "..                                                 ...   \n",
       "560  ```python\\ndef sum_intensity_projection(image)...   \n",
       "561  ```python\\ndef tiled_image_processing(image, r...   \n",
       "562  ```python\\ndef transpose_image_axes(image):\\n ...   \n",
       "564  ```python\\ndef workflow_batch_process_folder_c...   \n",
       "569  ```python\\ndef workflow_watershed_segmentation...   \n",
       "\n",
       "                                                result  passed  \\\n",
       "0    failed: OpenCV(4.9.0) D:\\a\\opencv-python\\openc...   False   \n",
       "2    failed: Input must be a 2D numpy array of bool...   False   \n",
       "6    failed: Input image and kernel image must have...   False   \n",
       "12   failed: fft2() got an unexpected keyword argum...   False   \n",
       "13   failed: OpenCV(4.9.0) d:\\a\\opencv-python\\openc...   False   \n",
       "..                                                 ...     ...   \n",
       "560                                           failed:    False   \n",
       "561                                           failed:    False   \n",
       "562                     failed: axes don't match array   False   \n",
       "564                                           failed:    False   \n",
       "569  failed: OpenCV(4.9.0) D:/a/opencv-python/openc...   False   \n",
       "\n",
       "                          model  \n",
       "0    claude-3-5-sonnet-20240620  \n",
       "2    claude-3-5-sonnet-20240620  \n",
       "6    claude-3-5-sonnet-20240620  \n",
       "12   claude-3-5-sonnet-20240620  \n",
       "13   claude-3-5-sonnet-20240620  \n",
       "..                          ...  \n",
       "560           gpt-4o-2024-05-13  \n",
       "561           gpt-4o-2024-05-13  \n",
       "562           gpt-4o-2024-05-13  \n",
       "564           gpt-4o-2024-05-13  \n",
       "569           gpt-4o-2024-05-13  \n",
       "\n",
       "[921 rows x 6 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df[df['passed'] == False]\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2e8e4e9b-2c69-44cb-aabc-6eacde66c9eb",
   "metadata": {},
   "source": [
    "## Example errors\n",
    "We just print out some example error messages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "20c4c64f-6612-425c-86af-e2a52f109ff9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[\"failed: OpenCV(4.9.0) D:\\\\a\\\\opencv-python\\\\opencv-python\\\\opencv\\\\modules\\\\imgproc\\\\src\\\\thresh.cpp:1555: error: (-2:Unspecified error) in function 'double __cdecl cv::threshold(const class cv::_InputArray &,const class cv::_OutputArray &,double,double,int)'\\n> THRESH_OTSU mode:\\n>     'src_type == CV_8UC1 || src_type == CV_16UC1'\\n> where\\n>     'src_type' is 4 (CV_32SC1)\\n\",\n",
       " 'failed: Input must be a 2D numpy array of boolean type',\n",
       " 'failed: Input image and kernel image must have the same dimensions',\n",
       " \"failed: fft2() got an unexpected keyword argument 's'\",\n",
       " \"failed: OpenCV(4.9.0) d:\\\\a\\\\opencv-python\\\\opencv-python\\\\opencv\\\\modules\\\\imgproc\\\\src\\\\color.simd_helpers.hpp:92: error: (-2:Unspecified error) in function '__cdecl cv::impl::`anonymous-namespace'::CvtHelper<struct cv::impl::`anonymous namespace'::Set<3,4,-1>,struct cv::impl::A0x59191d0d::Set<1,-1,-1>,struct cv::impl::A0x59191d0d::Set<0,2,5>,4>::CvtHelper(const class cv::_InputArray &,const class cv::_OutputArray &,int)'\\n> Invalid number of channels in input image:\\n>     'VScn::contains(scn)'\\n> where\\n>     'scn' is 1\\n\",\n",
       " 'failed: ',\n",
       " 'failed: ',\n",
       " 'failed: Input must be a numpy array of boolean type',\n",
       " 'failed: ',\n",
       " 'failed: ']"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.head(10)['result'].tolist()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d134c4f6-0277-426a-ab80-81362aa08c5c",
   "metadata": {},
   "source": [
    "## Searching for common terms\n",
    "First, we search the error messages for common errors as specified above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "a840bf73-6480-4b95-9ace-4097361fe0d2",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>model</th>\n",
       "      <th>claude-3-5-sonnet-20240620</th>\n",
       "      <th>gemini-1.5-flash-001</th>\n",
       "      <th>gpt-4o-2024-05-13</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>has no attribute</th>\n",
       "      <td>28</td>\n",
       "      <td>28</td>\n",
       "      <td>33</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>invalid syntax</th>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Can't convert object</th>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>cannot import</th>\n",
       "      <td>0</td>\n",
       "      <td>13</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>out of range</th>\n",
       "      <td>0</td>\n",
       "      <td>20</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unexpected keyword argument</th>\n",
       "      <td>8</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "model                        claude-3-5-sonnet-20240620  gemini-1.5-flash-001  \\\n",
       "has no attribute                                     28                    28   \n",
       "invalid syntax                                        0                     1   \n",
       "Can't convert object                                  0                     0   \n",
       "cannot import                                         0                    13   \n",
       "out of range                                          0                    20   \n",
       "unexpected keyword argument                           8                     2   \n",
       "\n",
       "model                        gpt-4o-2024-05-13  \n",
       "has no attribute                            33  \n",
       "invalid syntax                               0  \n",
       "Can't convert object                         0  \n",
       "cannot import                                2  \n",
       "out of range                                 1  \n",
       "unexpected keyword argument                  5  "
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Define the function to count errors\n",
    "def count_errors(group, error_list):\n",
    "    counts = {error: group['result'].str.contains(error, regex=False).sum() for error in error_list}\n",
    "    return pd.Series(counts)\n",
    "\n",
    "# Apply the function to each model group; selecting the 'result' column explicitly\n",
    "# avoids pandas' deprecated behavior of passing the grouping column to apply()\n",
    "error_counts = df.groupby('model')[['result']].apply(count_errors, error_list=common_errors)\n",
    "\n",
    "# Transpose the result for the desired format: models as columns, errors as rows\n",
    "error_counts = error_counts.T\n",
    "error_counts"
   ]
  },
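  {
   "cell_type": "markdown",
   "id": "sketch-term-counts-without-apply",
   "metadata": {},
   "source": [
    "The same kind of table can also be obtained without `groupby(...).apply(...)`. The following sketch runs on made-up data: it builds one boolean column per search term and sums these columns per model. The model names, messages and variable names are assumptions for illustration.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical failure records; model names and messages are made up\n",
    "toy = pd.DataFrame({\n",
    "    'model': ['model_a', 'model_a', 'model_b'],\n",
    "    'result': ['failed: module has no attribute x', 'failed: ', 'failed: list index out of range'],\n",
    "})\n",
    "terms = ['has no attribute', 'out of range']\n",
    "\n",
    "# One boolean column per search term, marking rows whose message contains it\n",
    "flags = pd.DataFrame({term: toy['result'].str.contains(term, regex=False) for term in terms})\n",
    "\n",
    "# Sum the flags per model, then transpose: terms as rows, models as columns\n",
    "counts = flags.groupby(toy['model']).sum().T\n",
    "print(counts)\n",
    "```\n",
    "\n",
    "Passing `regex=False` treats each term as a literal substring, so characters such as `(` or `.` in a term are not interpreted as regular expression syntax."
   ]
  },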
  {
   "cell_type": "markdown",
   "id": "766f05eb-711b-442f-b571-0b59146f781f",
   "metadata": {},
   "source": [
    "## Most common failure reasons\n",
    "Furthermore, we determine the three most frequently observed failure reasons per model. These are either error messages or, if the result is only `failed: `, an indication that the tests did not pass, presumably because the tested function did not return the right result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "27b3e76c-1c26-45c9-9285-409e885b643c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Model</th>\n",
       "      <th>Top1 Result</th>\n",
       "      <th>Top1 Count</th>\n",
       "      <th>Top2 Result</th>\n",
       "      <th>Top2 Count</th>\n",
       "      <th>Top3 Result</th>\n",
       "      <th>Top3 Count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>claude-3-5-sonnet-20240620</td>\n",
       "      <td>failed:</td>\n",
       "      <td>149</td>\n",
       "      <td>failed: 'list' object has no attribute 'shape'</td>\n",
       "      <td>20</td>\n",
       "      <td>failed: OpenCV(4.9.0) D:\\a\\opencv-python\\openc...</td>\n",
       "      <td>10</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>gemini-1.5-flash-001</td>\n",
       "      <td>failed:</td>\n",
       "      <td>166</td>\n",
       "      <td>failed: OpenCV(4.9.0) d:\\a\\opencv-python\\openc...</td>\n",
       "      <td>37</td>\n",
       "      <td>failed: name 'np' is not defined</td>\n",
       "      <td>29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>gpt-4o-2024-05-13</td>\n",
       "      <td>failed:</td>\n",
       "      <td>146</td>\n",
       "      <td>failed: 'list' object has no attribute 'shape'</td>\n",
       "      <td>21</td>\n",
       "      <td>failed: OpenCV(4.9.0) d:\\a\\opencv-python\\openc...</td>\n",
       "      <td>12</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                        Model Top1 Result  Top1 Count  \\\n",
       "0  claude-3-5-sonnet-20240620    failed:          149   \n",
       "1        gemini-1.5-flash-001    failed:          166   \n",
       "2           gpt-4o-2024-05-13    failed:          146   \n",
       "\n",
       "                                         Top2 Result  Top2 Count  \\\n",
       "0     failed: 'list' object has no attribute 'shape'          20   \n",
       "1  failed: OpenCV(4.9.0) d:\\a\\opencv-python\\openc...          37   \n",
       "2     failed: 'list' object has no attribute 'shape'          21   \n",
       "\n",
       "                                         Top3 Result  Top3 Count  \n",
       "0  failed: OpenCV(4.9.0) D:\\a\\opencv-python\\openc...          10  \n",
       "1                   failed: name 'np' is not defined          29  \n",
       "2  failed: OpenCV(4.9.0) d:\\a\\opencv-python\\openc...          12  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Step 1: Group the DataFrame by 'model' and get the value counts of 'result'\n",
    "model_result_count = df.groupby('model')['result'].value_counts()\n",
    "\n",
    "# Step 2: Create an empty list to collect one row of results per model\n",
    "model_top_results = []\n",
    "\n",
    "# Step 3: Loop through each group to get the three most common results per model\n",
    "for model, counts in model_result_count.groupby(level=0):\n",
    "    # Get the three most frequent results for this model\n",
    "    top_three = counts.nlargest(3)\n",
    "    # Collect the results and their counts in one row; missing entries become None\n",
    "    data = {\n",
    "        'Model': model,\n",
    "        'Top1 Result': top_three.index.get_level_values(1)[0],\n",
    "        'Top1 Count': top_three.iloc[0],\n",
    "        'Top2 Result': top_three.index.get_level_values(1)[1] if len(top_three) > 1 else None,\n",
    "        'Top2 Count': top_three.iloc[1] if len(top_three) > 1 else None,\n",
    "        'Top3 Result': top_three.index.get_level_values(1)[2] if len(top_three) > 2 else None,\n",
    "        'Top3 Count': top_three.iloc[2] if len(top_three) > 2 else None\n",
    "    }\n",
    "    # Append the row to the list\n",
    "    model_top_results.append(data)\n",
    "\n",
    "# Step 4: Convert the collected rows to a DataFrame and display it\n",
    "most_common_errors = pd.DataFrame(model_top_results)\n",
    "most_common_errors"
   ]
  },
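  {
   "cell_type": "markdown",
   "id": "sketch-top-results-per-model",
   "metadata": {},
   "source": [
    "As a more compact alternative sketch, the most frequent entries per model can be taken directly from the grouped value counts. This relies on `value_counts` sorting the counts in descending order within each group; the data and names below are hypothetical, and the result is a long-format Series rather than the wide table shown above.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical failure records for two made-up models\n",
    "toy = pd.DataFrame({\n",
    "    'model': ['model_a'] * 6 + ['model_b'] * 3,\n",
    "    'result': ['failed: '] * 3 + ['failed: x'] * 2 + ['failed: y'] + ['failed: z'] * 2 + ['failed: '],\n",
    "})\n",
    "\n",
    "# Count each distinct result per model (sorted by frequency within each model)\n",
    "counts = toy.groupby('model')['result'].value_counts()\n",
    "\n",
    "# Keep only the first two, i.e. most frequent, entries of each model\n",
    "top = counts.groupby(level=0).head(2)\n",
    "print(top)\n",
    "```"
   ]
  },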
  {
   "cell_type": "markdown",
   "id": "64d6faa6-f95f-42c5-af4c-06b1f99c208a",
   "metadata": {},
   "source": [
    "## Exercise\n",
    "Determine which LLM had the most tests passing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f06f3b7a-9c3b-4f72-a153-3563a6082da8",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "markdown",
   "id": "dbe63443-3c5d-4894-bee2-54560af022f3",
   "metadata": {},
   "source": [
    "Determine how often the LLMs produce code with missing import statements."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d6c372f3-c3d7-4bfb-abf8-afaf6efe14aa",
   "metadata": {},
   "outputs": [],
   "source": []
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c37ae901-45bf-4ef7-92e3-336a7084fa68",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.19"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}