Kiara LLM endpoint#
In this notebook we will use a still experimental LLM infrastructure. To use it, you must set two environment variables: KIARA_API_KEY and KIARA_LLM_SERVER. This approach uses the OpenAI API; we only change the base_url.
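If the two variables are not already set in your shell, you can define them from Python before connecting. This is a minimal sketch; the values below are placeholders that you must replace with your own credentials and server URL. Since the code further down appends "api/" to the server URL, it should end with a trailing slash.
import os

# Placeholder values - replace with your actual credentials and server URL.
# The server URL should end with "/" because "api/" is appended to it below.
os.environ["KIARA_API_KEY"] = "your-api-key"
os.environ["KIARA_LLM_SERVER"] = "https://your-kiara-server.example.com/"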
import os
import openai
openai.__version__
'1.90.0'
def prompt_kiara(message: str, model="ollama-llama3-3-70b"):
    """A prompt helper function that sends a message to the kiara LLM server
    and returns only the text response. The message can be a plain string or
    a list of messages in the OpenAI chat format.
    """
    import os
    # convert the message into the OpenAI chat format if necessary
    if isinstance(message, str):
        message = [{"role": "user", "content": message}]
    # set up the connection to the LLM server
    client = openai.OpenAI(base_url=os.environ.get('KIARA_LLM_SERVER') + "api/",
                           api_key=os.environ.get('KIARA_API_KEY'))
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    # extract and return the answer text
    return response.choices[0].message.content
prompt_kiara("Hi!")
"It's nice to meet you. Is there something I can help you with, or would you like to chat?"
Exercise#
List the models available at the endpoint and try them out by specifying them when calling prompt_kiara().
client = openai.OpenAI(base_url=os.environ.get('KIARA_LLM_SERVER') + "api/",
                       api_key=os.environ.get('KIARA_API_KEY'))
print("\n".join([model.id for model in client.models.list().data]))
ollama-llama3-3-70b
vllm-baai-bge-m3
vllm-deepseek-coder-33b-instruct
vllm-deepseek-r1-distill-llama-70b
vllm-llama-3-3-nemotron-super-49b-v1
vllm-llama-4-scout-17b-16e-instruct
vllm-meta-llama-llama-3-3-70b-instruct
vllm-mistral-small-24b-instruct-2501
vllm-multilingual-e5-large-instruct
vllm-nvidia-llama-3-3-70b-instruct-fp8
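For example, you can try one of the listed models by passing its ID via the model parameter. This is a sketch: the question is arbitrary, and the response will vary from model to model.
prompt_kiara("What is the capital of France?", model="vllm-mistral-small-24b-instruct-2501")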