Code generation

LLMs are often good at writing code, in part because writing code can be treated as a translation task, e.g. from English to Python.

Write Python code for printing out the numbers between 0 and 10.

Certainly! You can print out the numbers between 0 and 10 using a simple for loop in Python. Here’s the code to do that:

for number in range(11):
    print(number)

This code uses the range function to generate numbers from 0 to 10, and then iterates over each number to print it out.
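The phrase "between 0 and 10" in the prompt is ambiguous about the endpoints, and the response above chose to include both. A minimal sketch of the two readings follows; the variable names are illustrative and not part of the original response.

```python
# range(stop) starts at 0 and stops just before `stop`,
# so range(11) is needed to include 10.
inclusive = list(range(11))

# If "between" is read as excluding both endpoints,
# start at 1 and stop before 10.
exclusive = list(range(1, 10))

print(inclusive)
print(exclusive)
```

If the endpoint handling matters, stating it in the prompt ("including 0 and 10") avoids having to guess which reading the LLM picked.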

Write Python code for a function that returns "Hello World".

Here is a simple Python function that returns the string “Hello World”:

def get_hello_world():
    return "Hello World"

# Example usage
message = get_hello_world()
print(message)

This code defines a function named get_hello_world that, when called, returns the string “Hello World”. The example usage demonstrates how to call the function and print the result.

Explaining code

The translation also works in the other direction: you can ask the LLM what a specific code section does.

I have this Python code, what is it doing?

df['aspect_ratio'] = df['minor_axis_length'] / df['major_axis_length']
filtered_df = df[df['aspect_ratio'] > 2]

This Python code is manipulating a pandas DataFrame named df. Here’s a breakdown of what each line does:

  1. df['aspect_ratio'] = df['minor_axis_length'] / df['major_axis_length']:

    • This line creates a new column in the DataFrame called aspect_ratio.

    • It calculates the aspect ratio by dividing the values in the 'minor_axis_length' column by the corresponding values in the 'major_axis_length' column.

  2. filtered_df = df[df['aspect_ratio'] > 2]:

    • This line filters the DataFrame to create a new DataFrame called filtered_df.

    • It includes only the rows where the aspect_ratio is greater than 2.

    • The resulting filtered_df will therefore contain only those rows from the original DataFrame (df) that meet this condition.

In summary, this code calculates an aspect ratio for each row and then filters the DataFrame to include only those rows where the aspect ratio is greater than 2.
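An explanation like this describes what the code does, not whether it does something sensible, so it is worth running the code on a small example. The sketch below uses made-up measurements (an assumption, not data from the original). If the columns hold what their names suggest, the minor axis is never longer than the major axis, so the ratio cannot exceed 1 and the filter selects nothing.

```python
import pandas as pd

# Hypothetical measurements; only the column names come from the prompt above
df = pd.DataFrame({
    "minor_axis_length": [2.0, 3.0, 1.0],
    "major_axis_length": [4.0, 3.0, 5.0],
})

# Same two lines as in the prompt
df["aspect_ratio"] = df["minor_axis_length"] / df["major_axis_length"]
filtered_df = df[df["aspect_ratio"] > 2]

print(df["aspect_ratio"].tolist())
print(len(filtered_df))  # no row can pass, because minor / major <= 1

# One possible intent (an assumption): select elongated objects via major / minor
elongated_df = df[df["major_axis_length"] / df["minor_axis_length"] > 2]
print(len(elongated_df))
```

Which ratio and threshold were actually intended cannot be recovered from the code alone. Asking the LLM directly whether the filter condition can ever be true is a useful follow-up prompt.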

Code documentation

Assume a data scientist was in a rush and wrote code without any comments, which makes it hard to tell what the code does. LLMs can help by adding comments to such code.

I have this Python code. Can you add comments to it?

image = imread("data/blobs.tif")
blurred_image = filters.gaussian(image, sigma=1)
threshold = filters.threshold_otsu(blurred_image)
thresholded_image = blurred_image >= threshold
label_image = measure.label(thresholded_image)
imshow(label_image, labels=True)

Certainly! Here is the Python code with comments added to explain each step:

# Read the image from the specified file
image = imread("data/blobs.tif")

# Apply a Gaussian blur to the image with a sigma of 1 to smooth it
blurred_image = filters.gaussian(image, sigma=1)

# Calculate the Otsu's threshold value for the blurred image
threshold = filters.threshold_otsu(blurred_image)

# Create a binary image where pixels with values greater than or equal to the threshold are True
thresholded_image = blurred_image >= threshold

# Label the connected components in the binary image
label_image = measure.label(thresholded_image)

# Display the labeled image with the labels shown
imshow(label_image, labels=True)

This code processes an image by blurring it, applying a threshold to create a binary image, labeling the connected components, and displaying the labeled image.
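Inline comments are one form of documentation. The same steps can also be wrapped in a function with a docstring, which an LLM can be asked to write in the same way. The following is a sketch under stated assumptions: the function name `segment_blobs` and the synthetic test image are hypothetical, and it requires NumPy and scikit-image to be installed.

```python
import numpy as np
from skimage import filters, measure


def segment_blobs(image, sigma=1):
    """Label bright objects in a grayscale image.

    The image is smoothed with a Gaussian filter, binarized using
    Otsu's threshold, and the connected components are labeled.

    Parameters
    ----------
    image : ndarray
        2D grayscale image.
    sigma : float
        Standard deviation of the Gaussian filter.

    Returns
    -------
    ndarray
        Label image in which background pixels are 0.
    """
    blurred_image = filters.gaussian(image, sigma=sigma)
    threshold = filters.threshold_otsu(blurred_image)
    thresholded_image = blurred_image >= threshold
    return measure.label(thresholded_image)


# Synthetic image used instead of data/blobs.tif (an assumption):
# two bright squares on a dark background
image = np.zeros((64, 64))
image[10:20, 10:20] = 1.0
image[40:50, 40:50] = 1.0

label_image = segment_blobs(image)
print(label_image.max())  # highest label, i.e. the number of objects found
```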

Fixing errors in code

You can also ask LLMs to assist in fixing code. For example, executing the code above as it stands produces an error message. If we provide that error message to the LLM, it can help us solve the problem.

When I execute the code above in Python, there is an error message. Help me make the code work!

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[1], line 1
----> 1 image = imread("data/blobs.tif")
      2 blurred_image = filters.gaussian(image, sigma=1)
      3 threshold = filters.threshold_otsu(blurred_image)

NameError: name 'imread' is not defined

The error message you’re encountering indicates that the imread function is not recognized. This usually means that the necessary library for image reading hasn’t been imported or is not installed. The function imread is commonly used for reading images and is part of libraries such as OpenCV or skimage.

Here’s how you can modify the code to include the appropriate imports with the skimage library:

from skimage.io import imread, imshow
from skimage import filters, measure

# Read the image file
image = imread("data/blobs.tif")

# Apply Gaussian blur to the image with a given sigma value
blurred_image = filters.gaussian(image, sigma=1)

# Compute the Otsu threshold on the blurred image
threshold = filters.threshold_otsu(blurred_image)

# Create a boolean image where pixels are marked as True if they are at or above the threshold
thresholded_image = blurred_image >= threshold

# Label connected regions in the thresholded image
label_image = measure.label(thresholded_image)

# Display the labeled image
imshow(label_image, labels=True)

Please ensure you have the scikit-image library installed. You can install it via pip if necessary:

pip install scikit-image

After running the above code with the necessary imports and ensuring all libraries are installed, it should work without the described error.
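The mechanism behind this kind of error can be reproduced in isolation. The sketch below uses `sqrt` from the standard library `math` module as a stand-in for `imread` (an assumption made so the example runs without extra packages): calling a name before importing it raises a NameError, and the import resolves it.

```python
# Calling a function that has not been imported yet raises a NameError
try:
    result = sqrt(16)
except NameError as error:
    message = str(error)
    print("NameError:", message)

# Importing the name from the module that provides it fixes the error
from math import sqrt

result = sqrt(16)
print(result)
```

An import only fixes the NameError. Later lines can still fail for other reasons, so it helps to rerun the code and pass any new error message back to the LLM.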

Exercise

Run the code above and ask an LLM to assist you in making it work.

Hint: In case of trouble, the code was taken from here.