Code generation

Code generation is particularly useful in scientific data analysis because generated code can be executed again and again, producing the same results each time.

As an example, we count bright blobs in an image.

import openai
from IPython.display import Markdown

def prompt(message:str, model="gpt-4o-2024-08-06"):
    """A prompt helper function that sends a message to openAI
    and returns only the text response.
    """
    client = openai.OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}]
    )
    return response.choices[0].message.content

my_prompt = """
Write Python code that loads blobs.tif, 
counts bright blobs and prints the number. 
Return this code only.
"""

Reviewing generated code

Before executing generated code, it is recommended to print or otherwise display it and review it. Automatically executing code that nobody has reviewed may harm your computer.

code = prompt(my_prompt)
Markdown(code)

import cv2
import numpy as np

# Load the image
image = cv2.imread('blobs.tif', cv2.IMREAD_GRAYSCALE)

# Thresholding the image to binary
_, binary_image = cv2.threshold(image, 127, 255, cv2.THRESH_BINARY)

# Find connected components (blobs)
num_labels, labels_im = cv2.connectedComponents(binary_image)

# Print the number of bright blobs, subtracting one for the background label
print(num_labels - 1)

Once we are satisfied with the code, we can strip the Markdown code fences and execute it.

code = code.replace("```python", "").replace("```", "")
exec(code)
64
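
The chained `replace` calls above remove the fence markers, but any explanatory text the model writes around the code would still be passed to `exec`. The following is a minimal sketch of a stricter alternative, assuming the response contains at most one fenced block. The names `extract_code` and `list_imports` are hypothetical helpers introduced here for illustration; they are not part of the OpenAI library.

```python
import ast
import re

# Three backticks, built programmatically so this listing stays readable.
FENCE = "`" * 3
FENCED_BLOCK = re.compile(FENCE + r"[\w+-]*[ \t]*\n(.*?)" + FENCE, flags=re.DOTALL)

def extract_code(response: str) -> str:
    """Return the body of the first fenced block, or the whole
    response if it contains no fence (hypothetical helper)."""
    match = FENCED_BLOCK.search(response)
    code = match.group(1) if match else response
    return code.strip()

def list_imports(code: str) -> list[str]:
    """Parse the code without running it and return the top-level
    modules it imports. Raises SyntaxError if the code is not valid Python."""
    tree = ast.parse(code)
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module.split(".")[0])
    return sorted(modules)

# A made-up response with chatter around the code, for demonstration only.
example = (
    f"Here is the code:\n{FENCE}python\n"
    "import numpy as np\nfrom skimage import io\nprint(1 + 1)\n"
    f"{FENCE}\nHope this helps!"
)
clean = extract_code(example)
print(clean)
print(list_imports(clean))
```

Listing the imports is only an aid to manual review: it shows at a glance which libraries the generated code relies on, but it does not make executing that code safe.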

A comment on reproducibility

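Depending on which model you use, sending the same prompt again may or may not produce the same code. In the second run below, the model used scikit-image instead of OpenCV, and the reported count changed from 64 to 61.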

code = prompt(my_prompt)
Markdown(code)

import numpy as np
from skimage import io, measure, filters, morphology

# Load the image
image = io.imread('blobs.tif')

# Convert image to grayscale if it is not
if len(image.shape) > 2:
    image = rgb2gray(image)

# Apply a threshold to convert the image to binary
binary_image = image > filters.threshold_otsu(image)

# Remove small objects to isolate blobs better
cleaned_image = morphology.remove_small_objects(binary_image, min_size=20)

# Label connected components
labeled_image, num_blobs = measure.label(cleaned_image, return_num=True)

# Print the number of blobs found
print("Number of bright blobs:", num_blobs)

code = code.replace("```python", "").replace("```", "")
exec(code)
Number of bright blobs: 61
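
To see exactly where two generated scripts differ, the responses can be compared line by line. This is a minimal sketch using `difflib` from Python's standard library. The function name `compare_code` and the two short example strings are assumptions made for illustration; in practice you would pass the two strings returned by `prompt()`.

```python
import difflib

def compare_code(first: str, second: str) -> list[str]:
    """Return the unified diff between two code strings as a list of lines.
    An empty list means both strings are identical line by line."""
    return list(
        difflib.unified_diff(
            first.splitlines(),
            second.splitlines(),
            fromfile="first_run",
            tofile="second_run",
            lineterm="",
        )
    )

# Stand-ins for two model responses (hypothetical, shortened).
first_run = "import cv2\nprint(1)"
second_run = "from skimage import io\nprint(1)"

diff = compare_code(first_run, second_run)
print("\n".join(diff) if diff else "identical")
```

An empty diff shows only that the code text is unchanged. Two different scripts can still print the same count, so the output should be compared separately.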

Exercise

Rerun the notebook, waiting a minute between the two prompt() calls above, and check whether the generated code is identical to what is saved here. Also check whether the number of blobs counted is the same.

Advanced, optional exercise: Modify the prompt function to use Anthropic’s Claude.