## Hugging Face Serverless Inference
You can also access LLMs through Hugging Face's Serverless Inference Endpoints. You will need a Hugging Face access token for this. You can search the Hub for available models; note that serverless inference enforces rate limits.
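You can also browse candidate models programmatically. Here is a minimal sketch using `huggingface_hub.list_models`; the task filter, sort order, and limit are illustrative choices, and not every listed model is enabled for serverless inference:

```python
from huggingface_hub import list_models

# List a few popular text-generation models hosted on the Hub.
for m in list_models(task="text-generation", sort="downloads", limit=5):
    print(m.id)
```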
To generate text, we can define a small helper function:
```python
import os
from huggingface_hub import InferenceClient

def prompt_hf_inf(prompt, model="google/gemma-7b"):
    # Authenticate with the token stored in the HF_TOKEN environment variable.
    client = InferenceClient(token=os.environ["HF_TOKEN"])
    # Sample with moderate randomness and stop at the first blank line.
    return client.text_generation(prompt,
                                  model=model,
                                  temperature=0.7,
                                  top_p=0.95,
                                  stop_sequences=["\n\n"])
```
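The `InferenceClient` can also stream output as it is generated. A minimal sketch, assuming the same `HF_TOKEN` environment variable; with `stream=True` (and the default `details=False`), `text_generation` yields the output token by token:

```python
import os
from huggingface_hub import InferenceClient

client = InferenceClient(token=os.environ["HF_TOKEN"])
# Print each token as soon as the endpoint returns it.
for token in client.text_generation("Once upon a time,",
                                    model="google/gemma-7b",
                                    max_new_tokens=30,
                                    stream=True):
    print(token, end="", flush=True)
```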
We can then call the helper like this:
```python
print(prompt_hf_inf("""
# A story about the rabbit in the forest
Once upon a time, the rabbit decided to
"""))
```
```
go on a journey. The rabbit decided to go
on a journey in order to see how the forest
was. The rabbit decided to go on the journey
because he was bored. The rabbit went on the
journey and saw a bunch of trees. The rabbit
saw a bunch of trees and was very scared. The
rabbit saw the trees and was scared because the
trees could eat him. The rabbit saw the trees
and was scared because they were big and scary
```
```python
print(prompt_hf_inf("1+3="))
```

```
4
```
```python
print(prompt_hf_inf("The purpose of life is"))
```

```
to be happy. In our quest for happiness, we often get caught up in the external world. We try to keep up with the Joneses, buying the latest gadgets, traveling to exotic destinations, and trying to keep up with the latest fashions.
```
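Because the serverless endpoints are rate-limited, calls can fail under load (typically with an HTTP 429 error). Here is a minimal retry sketch around the `prompt_hf_inf` helper above; the retry count and backoff schedule are arbitrary choices:

```python
import time
from huggingface_hub.utils import HfHubHTTPError

def prompt_with_retry(prompt, retries=3, delay=2.0):
    # Retry on transient HTTP errors (e.g. 429 rate limits),
    # doubling the wait time after each failed attempt.
    for attempt in range(retries):
        try:
            return prompt_hf_inf(prompt)
        except HfHubHTTPError:
            if attempt == retries - 1:
                raise
            time.sleep(delay * 2 ** attempt)
```

Exponential backoff keeps retries cheap for transient hiccups while waiting progressively longer when the rate limit is sustained.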