Text embeddings#

A text embedding maps a word or phrase to a vector in a high-dimensional latent space, where texts with similar meaning end up close to each other. In this notebook we will compute the embedding vectors for a handful of words. For visualization purposes, we will apply principal component analysis (PCA) to these vectors and display the relationships between the words in two-dimensional space.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

The following helper function will serve to determine the embedding vector for a given word or text.

from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Load the embedding model from the Hugging Face hub
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")

def embed(text):
    """Return the embedding vector for the given text."""
    return embed_model.get_text_embedding(text)
vector = embed("Hello world")
vector[:3]
[0.010724308900535107, 0.05578264966607094, 0.027084074914455414]
len(vector)
768
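As a quick sanity check that related words indeed receive similar vectors, we can compare two embeddings using cosine similarity. This is a minimal sketch based on numpy only; the chosen word pairs are just illustrative.

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a)
    b = np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A related pair should score higher than an unrelated pair.
print(cosine_similarity(embed("dog"), embed("cat")))
print(cosine_similarity(embed("dog"), embed("microscope")))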
words = ["dog", "cat", "hamster", "guinea pig", "pet", "microscope"]

# Dictionary mapping each word to its embedding vector
object_coords = {word: embed(word) for word in words}
# Extract the words and stack their vectors into a matrix
names = list(object_coords.keys())
data_matrix = np.array(list(object_coords.values()))

# Apply PCA
pca = PCA(n_components=2)  # Reduce to 2 components for visualization
transformed_data = pca.fit_transform(data_matrix)

# Create scatter plot
plt.figure(figsize=(3, 3))
plt.scatter(transformed_data[:, 0], transformed_data[:, 1])

# Annotate data points with names
for i, name in enumerate(names):
    plt.annotate(name, (transformed_data[i, 0], transformed_data[i, 1]))

plt.title('PCA of word embeddings')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()
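Since PCA compresses the 768-dimensional vectors down to two components, it is worth checking how much of the total variance these two components actually capture. The fitted PCA object from above exposes this via its explained_variance_ratio_ attribute.

# Fraction of the total variance captured by each principal component
print(pca.explained_variance_ratio_)
print("total:", pca.explained_variance_ratio_.sum())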
(Figure: scatter plot of the six words projected onto the first two principal components, each point annotated with its word.)

Exercise#

Draw an embedding for words such as “segmentation”, “thresholding”, “filtering”, “convolution”, “denoising”, “measuring”, “plotting” (a reusable plotting helper is sketched below this list).

  • Could you predict how the words would be placed in this space?

  • Draw the same embedding again. Is the visualization repeatable?

  • Change the order of the words in the list. Would you expect the visualization to change?

  • Add words such as “banana”, “apple”, “orange”. Could you predict the resulting view?
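For the exercise, it can be convenient to wrap the steps above into a small helper function. The sketch below is one possible way to do this; it reuses the embed function defined earlier, and the name plot_word_embeddings is only a suggestion, not part of any library.

def plot_word_embeddings(words):
    """Embed the given words, reduce them to 2D with PCA and show a labeled scatter plot."""
    vectors = np.array([embed(word) for word in words])
    transformed = PCA(n_components=2).fit_transform(vectors)

    plt.figure(figsize=(3, 3))
    plt.scatter(transformed[:, 0], transformed[:, 1])
    for i, word in enumerate(words):
        plt.annotate(word, (transformed[i, 0], transformed[i, 1]))
    plt.title('PCA of word embeddings')
    plt.xlabel('Principal Component 1')
    plt.ylabel('Principal Component 2')
    plt.show()

plot_word_embeddings(["segmentation", "thresholding", "filtering",
                      "convolution", "denoising", "measuring", "plotting"])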