Code generation using AI

When writing Python code, we often find ourselves writing similar code again and again. Luckily, large language models such as ChatGPT allow us to automate this task. This notebook was almost entirely written by ChatGPT through an open-source library called bia-bob.

import os
os.environ['OPENAI_API_KEY'] = 'sk-...'
import bia_bob
bia_bob.__version__
'0.15.0'
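Hard-coding the key as in the cell above puts it into the notebook file, where it is easily shared by accident. A minimal alternative sketch, assuming (as the cell above suggests) that bia-bob picks up the key from the `OPENAI_API_KEY` environment variable. `ensure_openai_key` is a hypothetical helper, not part of bia-bob:

```python
import os
from getpass import getpass


def ensure_openai_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, asking for it only if it is missing.

    Hypothetical helper; assumes bia-bob reads the key from this environment variable.
    """
    key = os.environ.get(var_name, "")
    if not key:
        # getpass does not echo the typed key, and the key never appears in a code cell
        key = getpass(f"Enter your {var_name}: ")
        os.environ[var_name] = key
    return key
```

Calling `ensure_openai_key()` before `import bia_bob` prompts for the key once per session and does nothing if the variable is already set, for example from the shell that started Jupyter.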

When we provide sufficiently detailed instructions, bob can generate code for solving specific tasks.

In the next cell, which starts with %%bob, a prompt asks ChatGPT under the hood to write code that creates near-realistic weather data. bob's response, including the generated code, follows the prompt.

%%bob Write Python code that generates a pandas DataFrame with three columns: 
* date with datetime values ranging from January 1st 2023 to December 31st 2023, 
* temperature in degrees Celsius, and 
* rain in mm.
2023 was not a leap year, so make sure that the generated values cover all 365 days.
The values both for temperature and rain shall have a distribution typical for the four seasons in Germany.
For rain, also create some outliers reflecting heavy rain events typical for Germany.
Save this dataframe to a csv file called 'german_weather_2023.csv'.

We will generate a pandas DataFrame with columns for the day (covering the entire year 2023), temperature (in degrees Celsius), and rain (in mm). The temperature and rain values will follow typical seasonal variations for Germany, with some outliers for heavy rain events. Finally, we will save the DataFrame to a CSV file named ‘german_weather_2023.csv’.

import pandas as pd
import numpy as np

# Step 1: Create date range
date_range = pd.date_range(start='2023-01-01', end='2023-12-31')

# Step 2-3: Generate seasonally typical temperature and rain data for Germany
np.random.seed(42)  # For reproducibility
temperatures = []
rain = []
for date in date_range:
    if date.month in [12, 1, 2]:  # Winter
        temperatures.append(np.random.uniform(-5, 5))
        rain_value = np.random.uniform(5, 15)
    elif date.month in [3, 4, 5]:  # Spring
        temperatures.append(np.random.uniform(5, 15))
        rain_value = np.random.uniform(5, 20)
    elif date.month in [6, 7, 8]:  # Summer
        temperatures.append(np.random.uniform(15, 30))
        rain_value = np.random.uniform(0, 10)
    elif date.month in [9, 10, 11]:  # Autumn
        temperatures.append(np.random.uniform(5, 15))
        rain_value = np.random.uniform(5, 20)
    
    # Introduce some outliers for heavy rain events
    if np.random.rand() < 0.05:  # 5% probability
        rain_value *= np.random.uniform(2, 5)
    
    rain.append(rain_value)
    
# Step 4: Combine into DataFrame
data = {
    'date': date_range,
    'temperature': temperatures,
    'rain': rain
}
df = pd.DataFrame(data)

# Step 5: Save to CSV
df.to_csv('german_weather_2023.csv', index=False)

# Display the first few rows of DataFrame for verification
display(df.head())
        date  temperature       rain
0 2023-01-01    -1.254599  14.507143
1 2023-01-02     0.986585   6.560186
2 2023-01-03    -4.419164  13.661761
3 2023-01-04     2.080726   5.205845
4 2023-01-05     3.324426   7.123391
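The generated code draws the values day by day in a Python loop. The same season-based uniform sampling can also be written without an explicit loop, by looking up each day's (low, high) range and drawing all values at once. The following is a hand-written sketch, not bob's output; it reuses the ranges from the loop above, and `make_weather` and `draw_uniform` are hypothetical names. It uses NumPy's `default_rng` generator, so even with seed 42 the numbers differ from the table above.

```python
import numpy as np
import pandas as pd

# Month groups and value ranges copied from the generated loop above
MONTH_TO_SEASON = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}
TEMP_RANGES = {"winter": (-5, 5), "spring": (5, 15), "summer": (15, 30), "autumn": (5, 15)}
RAIN_RANGES = {"winter": (5, 15), "spring": (5, 20), "summer": (0, 10), "autumn": (5, 20)}


def draw_uniform(ranges: dict, seasons: pd.Index, rng: np.random.Generator) -> np.ndarray:
    """Draw one uniform value per day from the (low, high) range of that day's season."""
    bounds = pd.DataFrame.from_dict(ranges, orient="index", columns=["low", "high"])
    per_day = bounds.loc[seasons]  # one row per day; repeated season labels are fine
    return rng.uniform(per_day["low"].to_numpy(), per_day["high"].to_numpy())


def make_weather(year: int = 2023, outlier_prob: float = 0.05, seed: int = 42) -> pd.DataFrame:
    """Vectorized variant of the generated loop (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range(f"{year}-01-01", f"{year}-12-31", freq="D")
    seasons = dates.month.map(MONTH_TO_SEASON)  # season label for every day
    temperature = draw_uniform(TEMP_RANGES, seasons, rng)
    rain = draw_uniform(RAIN_RANGES, seasons, rng)
    # Heavy-rain outliers: multiply roughly outlier_prob of the days by a factor in [2, 5)
    is_outlier = rng.random(len(dates)) < outlier_prob
    rain[is_outlier] *= rng.uniform(2, 5, size=int(is_outlier.sum()))
    return pd.DataFrame({"date": dates, "temperature": temperature, "rain": rain})
```

Because `pd.date_range` produces every calendar day of the requested year, `make_weather(2023)` yields 365 rows and `make_weather(2024)` yields 366 without any special leap-year handling.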

Generating Notebooks

We can also use the same mechanism to generate an entire Jupyter notebook for the analysis task.

If we specify that we are Python beginners, the resulting code and explanations tend to be more detailed and tailored to our needs.

%%bob assume you communicate with a Python beginner. Generate a Jupyter notebook named `python_weather_analysis` that covers the following:
* Load german_weather_2023.csv using pandas, 
* visualize the head of the table, 
* summarize the table and show the info for the DataFrame,
* compute the mean and maximum temperature (a column in the table)
* make use of pandas internal plotting methods to plot the rain over the days (scatter plot), omit the x-axis labels.
* group the data to the four seasons by associating the months of a year. 
* plot a boxplot of rain in the four seasons using seaborn.

A notebook has been saved as python_weather_analysis.ipynb.
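The generated notebook itself is not reproduced here. As a rough, hand-written sketch of what its steps could look like (not bob's output), assuming the CSV written above and a meteorological season split (December to February is winter, and so on); all function names are hypothetical:

```python
import pandas as pd

# Assumed month-to-season mapping, matching the month groups of the generator cell
MONTH_TO_SEASON = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "autumn", 10: "autumn", 11: "autumn"}
SEASON_ORDER = ["winter", "spring", "summer", "autumn"]


def load_weather(path: str = "german_weather_2023.csv") -> pd.DataFrame:
    """Read the CSV and parse the text dates back into datetimes."""
    return pd.read_csv(path, parse_dates=["date"])


def add_season(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with 'season' and 'day_of_year' columns derived from 'date'."""
    out = df.copy()
    out["season"] = out["date"].dt.month.map(MONTH_TO_SEASON)
    out["day_of_year"] = out["date"].dt.dayofyear
    return out


def temperature_stats(df: pd.DataFrame) -> dict:
    """Mean and maximum of the temperature column."""
    return {"mean": df["temperature"].mean(), "max": df["temperature"].max()}


def plot_rain(df: pd.DataFrame):
    """Scatter plot of rain per day and a seaborn box plot of rain per season."""
    # Plotting libraries are imported here so the other helpers work without them
    import matplotlib.pyplot as plt
    import seaborn as sns

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    # A numeric day index is used on the x axis instead of the datetime column
    df.plot.scatter(x="day_of_year", y="rain", ax=ax1)
    ax1.tick_params(labelbottom=False)  # omit the x tick labels, as the prompt asks
    ax1.set_xlabel("")
    sns.boxplot(data=df, x="season", y="rain", order=SEASON_ORDER, ax=ax2)
    return fig
```

In a notebook, `df = add_season(load_weather())` followed by `df.head()`, `df.describe()`, `df.info()`, `temperature_stats(df)` and `plot_rain(df)` covers the steps listed in the prompt; comparing this with the generated notebook is a quick way to check bob's output.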

Exercise

Ask `bob` to generate a dataset of students and exam grades. Then ask it to write another notebook that plots the grades, ranging from 1 to 5, in a box plot. What was the average grade of the made-up students?