Generating Jupyter books

In this notebook we will generate a Jupyter book using a large language model. The book is available online and was created using Claude 3.5 Sonnet.

import anthropic
import openai
import datetime
import os
from pathlib import Path
from functools import partial
from IPython.display import Markdown, display
openai.__version__, anthropic.__version__
('1.30.1', '0.29.0')

Defining the content of the book

We specify the topic of the book, its table of contents and some extra hints:

topic = "Python basics"
# The table of contents must be a markdown list with * at the beginning of every line.
toc = """
* Introduction to Jupyter notebooks
* Mathematical operations
* Data Types: Lists, Tuples, Dictionaries
* For-loops
* Conditional statements
* Custom functions
* Image Processing with sckit-image
* Tabular data wrangling with pandas
* Plotting with seaborn
* Random forest classifiers in scikit-learn
"""
extra_hints = """
If you need an example image for image processing, use skimage.data.cells3d. In case you use it, add `pooch` to the list requirements.
"""

We will also specify the location where to store the book:

base_dir = ""
repository_url = "https://github.com/generated-books/python-basics"

We will use this language model to generate the book:

model = "claude-3-5-sonnet-20240620"

Helper functions

Here we create some helper functions for prompting and for file format handling.

def prompt_chatGPT(message:str, model="gpt-4o-2024-05-13"):
    """
    A prompt helper function that sends a message to openAI
    and returns only the text response.
    """
    import os
    import openai
    
    # convert message in the right format if necessary
    if isinstance(message, str):
        message = [{"role": "user", "content": message}]
        
    # setup connection to the LLM
    client = openai.OpenAI()
    
    # submit prompt
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    
    # extract answer
    return response.choices[0].message.content
def prompt_claude(message:str, model="claude-3-5-sonnet-20240620"):
    """
    A prompt helper function that sends a message to anthropic
    and returns only the text response.

    Example models: claude-3-5-sonnet-20240620 or claude-3-opus-20240229
    """
    import os
    from anthropic import Anthropic
    
    # convert message in the right format if necessary
    if isinstance(message, str):
        message = [{"role": "user", "content": message}]
        
    # setup connection to the LLM
    client = Anthropic()
    
    message = client.messages.create(
        max_tokens=4096,
        messages=message,
        model=model,
    )

    # extract answer
    return message.content[0].text
if "gpt" in model:
    # the OpenAI helper defined above is named prompt_chatGPT
    prompt = partial(prompt_chatGPT, model=model)
else:
    prompt = partial(prompt_claude, model=model)    
def prompt_with_memory(message:str):
    """
    This function allows using an LLM in chat mode.
    The LLM is equipped with some memory,
    so that we can refer back to former conversation steps.
    """
    
    # convert message in the right format and store it in memory
    question = {"role": "user", "content": message}
    chat_history.append(question)
    
    # receive answer
    response = prompt(chat_history)
    
    # convert answer in the right format and store it in memory
    answer = {"role": "assistant", "content": response}
    chat_history.append(answer)
    
    return response
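def is_valid_json(test_string):
    """Return True if the given string is valid JSON, False otherwise."""
    import json
    try:
        json.loads(test_string)
        return True
    except (ValueError, TypeError):
        # ValueError covers json.JSONDecodeError; TypeError covers non-string input
        return False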

def ensure_json(notebook):
    """This function makes sure that the passed notebook is indeed a json-formatted ipynb file."""
    if is_valid_json(notebook):
        return notebook
        
    return prompt(f"""
Take the following text and extract the Jupyter 
notebook ipynb/json from it:

{notebook}

Make sure the output is in ipynb/json format. 
Respond only the JSON content.
""").strip("```json").strip("```python").strip("```")

Context

Here we provide some context to the language model. As the OpenAI and Anthropic APIs handle system messages differently, we instead pass this message as the first user turn of the conversation.

system_message = f"""
You are data scientist and statistician. 
You have didactic skills and you can explain data analysis very well.
You are about to write a Jupyter book consisting of multiple Jupyter notebooks about a given topic.

In front of every code-cell, add a markdown cell with an explanation of the next code cell. 
Write 1-3 sentences in these markdown cells.
When writing a notebook, always keep the code in the code cells concise. 
Do only one thing and let the user see the intermediate result.
Then, continue with the next thing in a new code cell.

{extra_hints}

Confirm this with "ok".
"""

chat_history = [{"role": "user", "content": system_message}, {"role": "assistant", "content": "ok"}]
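For comparison, the difference between the two APIs mentioned above can be sketched as follows. This is only an illustration, not what this notebook does: `build_request` is a hypothetical helper that assembles keyword arguments without sending anything. It assumes that the OpenAI chat completions endpoint takes the system prompt as a message with the role `system`, while the Anthropic Messages API takes it as a top-level `system` parameter.

```python
def build_request(provider, system_text, messages, model):
    """Return keyword arguments for a chat request (hypothetical helper, no API call)."""
    if provider == "openai":
        # OpenAI chat completions: the system prompt is the first message in the list
        return {
            "model": model,
            "messages": [{"role": "system", "content": system_text}, *messages],
        }
    if provider == "anthropic":
        # Anthropic Messages API: the system prompt is a separate top-level parameter
        return {
            "model": model,
            "max_tokens": 4096,
            "system": system_text,
            "messages": list(messages),
        }
    raise ValueError(f"Unknown provider: {provider}")


user_turn = [{"role": "user", "content": "Hello"}]
openai_kwargs = build_request("openai", "You are a teacher.", user_turn, "gpt-4o-2024-05-13")
claude_kwargs = build_request("anthropic", "You are a teacher.", user_turn, "claude-3-5-sonnet-20240620")
```

Under these assumptions, the resulting dictionaries could be passed on as `client.chat.completions.create(**openai_kwargs)` and `client.messages.create(**claude_kwargs)`. Starting the conversation with a user message that is confirmed with "ok", as done here, avoids this branching at the cost of two extra turns in the history.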

Let us quickly test whether the chat mode works:

prompt_with_memory("Hi, my name is Robert Haase.")
"Hello Robert Haase! It's nice to meet you. I'm ready to assist you with creating a Jupyter book consisting of multiple Jupyter notebooks. As requested, I'll make sure to add explanatory markdown cells before each code cell, keep the code concise, and break down operations into separate cells for clarity. I'll also use the skimage.data.cells3d example for image processing tasks if needed. How can I help you get started with your Jupyter book project?"
prompt_with_memory("What is my name?")
'Your name is Robert Haase.'
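The memory mechanism can also be tried without an API key. The following minimal sketch mirrors the logic of `prompt_with_memory`, but replaces the real model with a stand-in function; `make_chat` and `counting_llm` are hypothetical names introduced only for this illustration.

```python
def make_chat(llm):
    """Create a chat function that keeps its own history (hypothetical helper)."""
    history = []

    def chat(message):
        # store the question, ask the model with the full history, store the answer
        history.append({"role": "user", "content": message})
        response = llm(history)
        history.append({"role": "assistant", "content": response})
        return response

    return chat, history


def counting_llm(messages):
    """Stand-in for a real model: reports how many user turns it has seen."""
    n_user = sum(1 for m in messages if m["role"] == "user")
    return f"I have seen {n_user} user message(s)."


chat, history = make_chat(counting_llm)
first = chat("Hi, my name is Robert Haase.")
second = chat("What is my name?")
```

Because the full history is sent with every call, the stand-in sees one user message on the first call and two on the second. A real model can answer follow-up questions for the same reason.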

Chatting about book content

We start chatting with the LLM about the book’s content. It is key that the LLM knows about all the content of the book before it starts generating the first notebook.

Markdown(prompt_with_memory(f"""
I would like to teach others in {topic} and cover these aspects:
{toc}

Therefore, it would be great to have training material in the form of a Jupyter book.

Which Python libraries are relevant in this context? Do not write any Python code yet.
"""))

For the topics you’ve mentioned, the following Python libraries are relevant:

  1. Jupyter: For creating and running interactive notebooks.

  2. NumPy: For mathematical operations and working with arrays.

  3. scikit-image: For image processing tasks.

  4. pandas: For tabular data manipulation and analysis.

  5. seaborn: For statistical data visualization.

  6. matplotlib: As the underlying library for seaborn and general plotting.

  7. scikit-learn: For machine learning tasks, including random forest classifiers.

Additionally, these standard Python libraries will be useful:

  1. random: For generating random numbers (useful in various examples).

  2. math: For additional mathematical functions.

  3. itertools: For advanced iteration tools.

While not libraries, it’s also worth mentioning that we’ll be using Python’s built-in data types (lists, tuples, dictionaries) and control structures (for-loops, conditional statements) extensively.

For the image processing section using scikit-image, we’ll also need:

  1. pooch: For downloading and caching data files, as it’s required for the skimage.data.cells3d example.

These libraries will cover all the aspects you want to teach in your Python basics course.

Generating the book

Here we start generating the notebooks for the content listed in the table of contents.

contents = toc.strip("\n").strip("* ").split("\n* ")

for i, subtopic in enumerate(contents):
    notebook = ensure_json(prompt(
        [{"role": "user", "content": system_message},
         {"role": "assistant", "content": "ok"},
         {"role": "user", "content": f"""
    Please write a Jupyter notebook in json format about "{subtopic}" as part of a course about {topic}.
    Respond only the JSON content.
    """}])).strip("```json").strip("```python").strip("```")

    # f"{i:02}_" + 
    filename = Path(base_dir) / "docs" / prompt_with_memory(f"What would be a good filename for the '{subtopic}' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only.")

    directory = Path(filename).parent
    os.makedirs(directory, exist_ok=True)
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(notebook)

    print(subtopic, ":", filename)
Introduction to Jupyter notebooks : docs\01_introduction_to_jupyter_notebooks.ipynb
Mathematical operations : docs\02_mathematical_operations.ipynb
Data Types: Lists, Tuples, Dictionaries : docs\03_data_types_lists_tuples_dictionaries.ipynb
For-loops : docs\04_for_loops.ipynb
Conditional statements : docs\05_conditional_statements.ipynb
Custom functions : docs\06_custom_functions.ipynb
Image Processing with sckit-image : docs\07_image_processing_with_scikit_image.ipynb
Tabular data wrangling with pandas : docs\08_tabular_data_wrangling_with_pandas.ipynb
Plotting with seaborn : docs\09_plotting_with_seaborn.ipynb
Random forest classifiers in scikit-learn : docs\10_random_forest_classifiers_scikit_learn.ipynb
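The loop above removes Markdown code fences with chained `str.strip` calls. Note that `str.strip` treats its argument as a set of characters rather than a literal prefix or suffix, so it only works here because notebook JSON starts with `{` and ends with `}`. A more explicit alternative is sketched below. `strip_code_fence` and `looks_like_notebook` are hypothetical helpers, and the structural check is deliberately minimal: it only tests for the top-level `cells` and `nbformat` keys of the ipynb format.

```python
import json
import re


def strip_code_fence(text):
    """Remove one surrounding Markdown code fence, if present (hypothetical helper)."""
    text = text.strip()
    # optional language tag after the opening fence, e.g. json or python
    match = re.fullmatch(r"```[a-zA-Z]*[ \t]*\n(.*?)\n?```", text, flags=re.DOTALL)
    return match.group(1).strip() if match else text


def looks_like_notebook(text):
    """Minimal structural check: valid JSON with a 'cells' list and an 'nbformat' key."""
    try:
        data = json.loads(text)
    except (ValueError, TypeError):
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("cells"), list)
        and "nbformat" in data
    )


# str.strip removes any of the given characters, not the literal string
pitfall = "json_data".strip("```json")

fenced = '```json\n{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}\n```'
cleaned = strip_code_fence(fenced)
```

Running such a check before writing each file would make it easier to spot a response that is valid JSON but not a notebook.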

Generating additional text and config files

We would like to build the book automatically, and we also need some introduction texts and documentation. Now that the individual notebooks have been built, we can generate those additional files as well.

docs_folder = Path(base_dir) / "docs"
today = datetime.date.today().strftime("%B %d, %Y")

more_files = {
    Path(base_dir) / "docs" / "intro.md": 
f"""
Create a intro.md file for a jupyter book that contains all Jupyter notebooks we just created. 
The introduction should give an overview in text form and with bullet points linking to the notebooks.
Mention that the entire book is AI-generated.
The repository url of the book is `{repository_url}`.
Mention that the `generator.ipynb` file in the github repository contains all the code used for generating the book. Add a link to this file.
Respond the content of this file only.
""",
    
    Path(base_dir) / "docs" / "_toc.yml": 
"""
Build a table of contents in Jupytyer book yml format.
First, mention the intro.md file.
Please give me the list of all notebook filenames we just created. 
Put them in a _yml file for a Jupyter book.
Respond the content of this file only.
""",

    Path(base_dir) / "docs" / "requirements.txt":
f"""
A requirements.txt file in the `docs` folder containing all python libraries used in this Jupyter book.
Respond the content of this file only.
""",
    
    Path(base_dir) / "docs" / "_config.yml": 
f"""
Create a minimal config.yml file for the jupyter book.
The book will be uploaded to this github repository: {repository_url}
Make sure the notebooks will be executed when the book is built.
The icon for the book is saved in ../icon.png
Note that today is {today}.
Respond the content of this file only.
""",
    
    Path(base_dir) / ".github" / "workflows" / "book.yml": 
f"""
Write a Github workflow file that builds the book and uploads the content to the gh_pages branch.
The book is stored in the `{docs_folder}` folder of the respository.
Respond the content of this file only.
""",

    Path(base_dir) / "readme.md": 
f"""
Create a readme.md file for the jupyter book. 
Give instructions how to build the book.
Mention that the entire book is AI-generated. 
Mention that the `generator.ipynb` file in the github repository contains all the code used for generating the book.
Respond the content of this file only.
""",

}

for filename, task in more_files.items():
    file_content = prompt_with_memory(task)

    directory = Path(filename).parent
    os.makedirs(directory, exist_ok=True)
    
    with open(filename, 'w', encoding='utf-8') as file:
        file.write(file_content)

    print(filename)
docs\intro.md
docs\_toc.yml
docs\requirements.txt
docs\_config.yml
.github\workflows\book.yml
readme.md
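Since the table of contents is written by the LLM from memory, it may be worth checking that every entry points to a file that actually exists. The sketch below does this with a simple regular expression instead of a YAML parser. `missing_toc_files` is a hypothetical helper, and it assumes the flat `root:` / `- file:` layout of the `_toc.yml` generated here.

```python
import re
from pathlib import Path


def missing_toc_files(docs_folder):
    """Return the _toc.yml entries without a matching file (hypothetical helper)."""
    docs_folder = Path(docs_folder)
    toc_text = (docs_folder / "_toc.yml").read_text(encoding="utf-8")

    # collect the names after "root:" and "- file:"; comments and other keys are ignored
    names = re.findall(
        r"^[ \t]*(?:-[ \t]*file|root):[ \t]*(\S+)[ \t]*$", toc_text, flags=re.MULTILINE
    )

    missing = []
    for name in names:
        # entries may be given with or without a file extension
        candidates = [docs_folder / f"{name}{ext}" for ext in ("", ".ipynb", ".md")]
        if not any(path.exists() for path in candidates):
            missing.append(name)
    return missing
```

Calling `missing_toc_files(docs_folder)` after the loop above should return an empty list if all filenames in the table of contents match the generated notebooks.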

Chat history

For documentation purposes, we output the entire chat with the LLM. Note: The notebooks themselves were generated without storing them in the chat history, because that would make the history grow too long too quickly.

chat_history
[{'role': 'user',
  'content': '\nYou are data scientist and statistician. \nYou have didactic skills and you can explain data analysis very well.\nYou are about to write a Jupyter book consisting of multiple Jupyter notebooks about a given topic.\n\nIn front of every code-cell, add a markdown cell with an explanation of the next code cell. Write 1-3 sentences in these markdown cells.\nWhen writing a notebook, always keep the code in the code cells concise. \nDo only one thing and let the user see the intermediate result.\nThen, continue with the next thing in a new code cell.\n\n\nIf you need an example image for image processing, use skimage.data.cells3d. In case you use it, add `pooch` to the list requirements.\n\n\nConfirm this with "ok".\n'},
 {'role': 'assistant', 'content': 'ok'},
 {'role': 'user', 'content': 'Hi, my name is Robert Haase.'},
 {'role': 'assistant',
  'content': "Hello Robert Haase! It's nice to meet you. I'm ready to assist you with creating a Jupyter book consisting of multiple Jupyter notebooks. As requested, I'll make sure to add explanatory markdown cells before each code cell, keep the code concise, and break down operations into separate cells for clarity. I'll also use the skimage.data.cells3d example for image processing tasks if needed. How can I help you get started with your Jupyter book project?"},
 {'role': 'user', 'content': 'What is my name?'},
 {'role': 'assistant', 'content': 'Your name is Robert Haase.'},
 {'role': 'user',
  'content': '\nI would like to teach others in Python basics and cover these aspects:\n\n* Introduction to Jupyter notebooks\n* Mathematical operations\n* Data Types: Lists, Tuples, Dictionaries\n* For-loops\n* Conditional statements\n* Custom functions\n* Image Processing with sckit-image\n* Tabular data wrangling with pandas\n* Plotting with seaborn\n* Random forest classifiers in scikit-learn\n\n\nTherefore, it would be great to have training material in the form of a Jupyter book.\n\nWhich Python libraries are relevant in this context? Do not write any Python code yet.\n'},
 {'role': 'assistant',
  'content': "For the topics you've mentioned, the following Python libraries are relevant:\n\n1. Jupyter: For creating and running interactive notebooks.\n2. NumPy: For mathematical operations and working with arrays.\n3. scikit-image: For image processing tasks.\n4. pandas: For tabular data manipulation and analysis.\n5. seaborn: For statistical data visualization.\n6. matplotlib: As the underlying library for seaborn and general plotting.\n7. scikit-learn: For machine learning tasks, including random forest classifiers.\n\nAdditionally, these standard Python libraries will be useful:\n\n8. random: For generating random numbers (useful in various examples).\n9. math: For additional mathematical functions.\n10. itertools: For advanced iteration tools.\n\nWhile not libraries, it's also worth mentioning that we'll be using Python's built-in data types (lists, tuples, dictionaries) and control structures (for-loops, conditional statements) extensively.\n\nFor the image processing section using scikit-image, we'll also need:\n\n11. pooch: For downloading and caching data files, as it's required for the skimage.data.cells3d example.\n\nThese libraries will cover all the aspects you want to teach in your Python basics course."},
 {'role': 'user',
  'content': "What would be a good filename for the 'Introduction to Jupyter notebooks' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant',
  'content': '01_introduction_to_jupyter_notebooks.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Mathematical operations' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant', 'content': '02_mathematical_operations.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Data Types: Lists, Tuples, Dictionaries' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant',
  'content': '03_data_types_lists_tuples_dictionaries.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'For-loops' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant', 'content': '04_for_loops.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Conditional statements' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant', 'content': '05_conditional_statements.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Custom functions' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant', 'content': '06_custom_functions.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Image Processing with sckit-image' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant',
  'content': '07_image_processing_with_scikit_image.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Tabular data wrangling with pandas' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant',
  'content': '08_tabular_data_wrangling_with_pandas.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Plotting with seaborn' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant', 'content': '09_plotting_with_seaborn.ipynb'},
 {'role': 'user',
  'content': "What would be a good filename for the 'Random forest classifiers in scikit-learn' notebook? Make sure it contains no spaces and ends with .ipynb . Respond with the filename only."},
 {'role': 'assistant',
  'content': '10_random_forest_classifiers_scikit_learn.ipynb'},
 {'role': 'user',
  'content': '\nCreate a intro.md file for a jupyter book that contains all Jupyter notebooks we just created. \nThe introduction should give an overview in text form and with bullet points linking to the notebooks.\nMention that the entire book is AI-generated.\nThe repository url of the book is `https://github.com/generated-books/python-basics`.\nMention that the `generator.ipynb` file in the github repository contains all the code used for generating the book. Add a link to this file.\nRespond the content of this file only.\n'},
 {'role': 'assistant',
  'content': '# Introduction to Python Basics\n\nWelcome to this comprehensive guide on Python basics! This Jupyter book is designed to take you through the fundamental concepts of Python programming, from the very basics to more advanced topics like data analysis and machine learning.\n\nThis entire book is AI-generated, providing a structured and comprehensive approach to learning Python. The content has been created to offer clear explanations and practical examples for each topic.\n\n## Contents\n\nThis book covers the following topics:\n\n* [Introduction to Jupyter notebooks](01_introduction_to_jupyter_notebooks.ipynb)\n* [Mathematical operations](02_mathematical_operations.ipynb)\n* [Data Types: Lists, Tuples, Dictionaries](03_data_types_lists_tuples_dictionaries.ipynb)\n* [For-loops](04_for_loops.ipynb)\n* [Conditional statements](05_conditional_statements.ipynb)\n* [Custom functions](06_custom_functions.ipynb)\n* [Image Processing with scikit-image](07_image_processing_with_scikit_image.ipynb)\n* [Tabular data wrangling with pandas](08_tabular_data_wrangling_with_pandas.ipynb)\n* [Plotting with seaborn](09_plotting_with_seaborn.ipynb)\n* [Random forest classifiers in scikit-learn](10_random_forest_classifiers_scikit_learn.ipynb)\n\nEach notebook is designed to be self-contained, allowing you to focus on specific topics or work through the entire book sequentially.\n\n## About this Book\n\nThis book is part of an AI-generated series, created to provide accessible and comprehensive learning materials for various topics. 
The content is hosted on GitHub, and you can find the repository at:\n\n[https://github.com/generated-books/python-basics](https://github.com/generated-books/python-basics)\n\nFor those interested in the process behind this AI-generated book, you can view the code used for generation in the `generator.ipynb` file:\n\n[View Generator Notebook](https://github.com/generated-books/python-basics/blob/main/generator.ipynb)\n\nWe hope you find this resource helpful in your journey to learn Python. Happy coding!'},
 {'role': 'user',
  'content': '\nBuild a table of contents in Jupytyer book yml format.\nFirst, mention the intro.md file.\nPlease give me the list of all notebook filenames we just created. \nPut them in a _yml file for a Jupyter book.\nRespond the content of this file only.\n'},
 {'role': 'assistant',
  'content': '# Table of contents\n# Learn more at https://jupyterbook.org/customize/toc.html\n\nformat: jb-book\nroot: intro\nchapters:\n- file: 01_introduction_to_jupyter_notebooks\n- file: 02_mathematical_operations\n- file: 03_data_types_lists_tuples_dictionaries\n- file: 04_for_loops\n- file: 05_conditional_statements\n- file: 06_custom_functions\n- file: 07_image_processing_with_scikit_image\n- file: 08_tabular_data_wrangling_with_pandas\n- file: 09_plotting_with_seaborn\n- file: 10_random_forest_classifiers_scikit_learn'},
 {'role': 'user',
  'content': '\nA requirements.txt file in the `docs` folder containing all python libraries used in this Jupyter book.\nRespond the content of this file only.\n'},
 {'role': 'assistant',
  'content': 'jupyter-book\nnumpy\nscikit-image\npandas\nseaborn\nmatplotlib\nscikit-learn\npooch'},
 {'role': 'user',
  'content': '\nCreate a minimal config.yml file for the jupyter book.\nThe book will be uploaded to this github repository: https://github.com/generated-books/python-basics\nMake sure the notebooks will be executed when the book is built.\nThe icon for the book is saved in ../icon.png\nNote that today is June 23, 2024.\nRespond the content of this file only.\n'},
 {'role': 'assistant',
  'content': '# Book settings\ntitle: Python Basics\nauthor: AI-Generated\ncopyright: "2024"\nlogo: ../icon.png\n\n# Force re-execution of notebooks on each build.\n# See https://jupyterbook.org/content/execute.html\nexecute:\n  execute_notebooks: force\n\n# Define the name of the latex output file for PDF builds\nlatex:\n  latex_documents:\n    targetname: book.tex\n\n# Add a bibtex file so that we can create citations\nbibtex_bibfiles:\n  - references.bib\n\n# Information about where the book exists on the web\nrepository:\n  url: https://github.com/generated-books/python-basics\n  path_to_book: docs\n\n# Add GitHub buttons to your book\n# See https://jupyterbook.org/customize/config.html#add-a-link-to-your-repository\nhtml:\n  use_issues_button: true\n  use_repository_button: true\n\n# Launch button settings\nlaunch_buttons:\n  notebook_interface: classic\n  binderhub_url: ""\n\n# Date of last update\nlast_updated: "June 23, 2024"'},
 {'role': 'user',
  'content': '\nWrite a Github workflow file that builds the book and uploads the content to the gh_pages branch.\nThe book is stored in the `docs` folder of the respository.\nRespond the content of this file only.\n'},
 {'role': 'assistant',
  'content': "name: deploy-book\n\n# Only run this when the main branch changes\non:\n  push:\n    branches:\n    - main\n\n# This job installs dependencies, builds the book, and pushes it to `gh-pages`\njobs:\n  deploy-book:\n    runs-on: ubuntu-latest\n    steps:\n    - uses: actions/checkout@v2\n\n    # Install dependencies\n    - name: Set up Python 3.8\n      uses: actions/setup-python@v2\n      with:\n        python-version: 3.8\n\n    - name: Install dependencies\n      run: |\n        pip install -r docs/requirements.txt\n\n    # Build the book\n    - name: Build the book\n      run: |\n        jupyter-book build docs/\n\n    # Push the book's HTML to github-pages\n    - name: GitHub Pages action\n      uses: peaceiris/actions-gh-pages@v3.6.1\n      with:\n        github_token: ${{ secrets.GITHUB_TOKEN }}\n        publish_dir: docs/_build/html"},
 {'role': 'user',
  'content': '\nCreate a readme.md file for the jupyter book. \nGive instructions how to build the book.\nMention that the entire book is AI-generated. \nMention that the `generator.ipynb` file in the github repository contains all the code used for generating the book.\nRespond the content of this file only.\n'},
 {'role': 'assistant',
  'content': "# Python Basics: An AI-Generated Jupyter Book\n\nThis repository contains an AI-generated Jupyter Book on Python Basics. The book covers fundamental concepts of Python programming, from introductory topics to more advanced subjects like data analysis and machine learning.\n\n## About the Book\n\nThis Jupyter Book is entirely AI-generated, providing a structured and comprehensive approach to learning Python. The content has been created to offer clear explanations and practical examples for each topic.\n\nThe `generator.ipynb` file in this repository contains all the code used for generating this book. You can review this file to understand the process behind the AI-generated content.\n\n## Building the Book\n\nTo build the book locally, follow these steps:\n\n1. Clone this repository:\n   ```\n   git clone https://github.com/generated-books/python-basics.git\n   cd python-basics\n   ```\n\n2. Install the required dependencies:\n   ```\n   pip install -r docs/requirements.txt\n   ```\n\n3. Build the book:\n   ```\n   jupyter-book build docs/\n   ```\n\n4. The built book will be in the `docs/_build/html/` directory. You can open the `index.html` file in your web browser to view the book.\n\n## Online Version\n\nYou can also view the online version of this book at: https://generated-books.github.io/python-basics\n\n## Contributing\n\nAs this is an AI-generated book, we're not accepting direct contributions to the content. However, if you notice any issues or have suggestions for improvements, please open an issue in this repository.\n\n## License\n\nThis book is released under the [MIT License](LICENSE)."}]

This is only a rough approximation of the number of tokens in the chat history, obtained by counting space-separated words:

len(str(chat_history).split(" "))
1859
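Word counts tend to underestimate token counts, because tokenizers split many words into several pieces. As a second rough estimate, one can divide the number of characters by a fixed ratio. The sketch below assumes about four characters per token, a common rule of thumb for English text and not a property of any specific tokenizer; `approximate_tokens` is a hypothetical helper.

```python
def approximate_tokens(history, chars_per_token=4):
    """Return two rough token estimates for a chat history (hypothetical helper)."""
    # only the message contents are counted, not the role labels
    text = " ".join(message["content"] for message in history)
    return {
        "words": len(text.split()),
        # assumption: roughly chars_per_token characters per token
        "chars_based": len(text) // chars_per_token,
    }


example_history = [
    {"role": "user", "content": "one two three"},
    {"role": "assistant", "content": "four"},
]
estimate = approximate_tokens(example_history)
```

For an exact count, the tokenizer or the usage information reported by the respective API would be needed.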