Kiara LLM endpoint#

In this notebook we will use our still experimental LLM infrastructure. To use it, you must set two environment variables, KIARA_API_KEY and KIARA_LLM_SERVER. The endpoint is compatible with the OpenAI API; we just change the base_url.

import os
import openai
openai.__version__
'1.90.0'
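
Before calling the endpoint, it can be useful to confirm that both environment variables are actually set. This is just a small sanity-check sketch using the variable names introduced above:

for variable in ("KIARA_API_KEY", "KIARA_LLM_SERVER"):
    if variable not in os.environ:
        print(f"Please set the {variable} environment variable.")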
def prompt_kiara(message: str, model="ollama-llama3-3-70b"):
    """A prompt helper function that sends a message to the Kiara LLM server
    and returns only the text response.
    """
    # convert a plain string into the OpenAI messages format if necessary
    if isinstance(message, str):
        message = [{"role": "user", "content": message}]
    
    # set up the connection to the LLM; KIARA_LLM_SERVER is expected to end with a slash
    client = openai.OpenAI(base_url=os.environ.get('KIARA_LLM_SERVER') + "api/",
                           api_key=os.environ.get('KIARA_API_KEY'))
    
    response = client.chat.completions.create(
        model=model,
        messages=message
    )
    
    # extract the answer text
    return response.choices[0].message.content
prompt_kiara("Hi!")
"It's nice to meet you. Is there something I can help you with, or would you like to chat?"

Exercise#

List the models available in the endpoint and try them out by specifying them when calling prompt_kiara().

client = openai.OpenAI(base_url=os.environ.get('KIARA_LLM_SERVER') + "api/",
                       api_key=os.environ.get('KIARA_API_KEY'))

print("\n".join([model.id for model in client.models.list().data]))
ollama-llama3-3-70b
vllm-baai-bge-m3
vllm-deepseek-coder-33b-instruct
vllm-deepseek-r1-distill-llama-70b
vllm-llama-3-3-nemotron-super-49b-v1
vllm-llama-4-scout-17b-16e-instruct
vllm-meta-llama-llama-3-3-70b-instruct
vllm-mistral-small-24b-instruct-2501
vllm-multilingual-e5-large-instruct
vllm-nvidia-llama-3-3-70b-instruct-fp8
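
To try out one of these models, pass its id via the model parameter. For example (model id taken from the list above; the response will vary):

prompt_kiara("Hi!", model="vllm-mistral-small-24b-instruct-2501")

Note that entries such as vllm-baai-bge-m3 and vllm-multilingual-e5-large-instruct are presumably embedding models and will not respond to chat-completion requests.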