In this guide, you’ll learn how to set up and use Llama Stack—a standardized framework that simplifies AI application development. We’ll walk you through building the SambaNova distribution server, installing the client, and running your first model inference. Whether you’re prototyping or scaling up, this guide will help you get started quickly with best practices from the Llama ecosystem integrated into a modular, efficient architecture.
The following Python code demonstrates basic usage:
from llama_stack_client import LlamaStackClient

LLAMA_STACK_PORT = 8321

client = LlamaStackClient(base_url=f"http://localhost:{LLAMA_STACK_PORT}")

# List all available models
models = client.models.list()

print("--- Available models: ---")
for m in models:
    print(f"- {m.identifier}")
print()

# Choose a model from the list
model = "sambanova/sambanova/Meta-Llama-3.3-70B-Instruct"

# Run chat completion
response = client.inference.chat_completion(
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a two-sentence poem about llama."},
    ],
    model_id=model,
)
print(response.completion_message.content)
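If you prefer token-by-token output, the same chat completion can be streamed. The sketch below is a minimal example under the assumption that the installed llama-stack-client version supports stream=True on chat_completion and emits text deltas on each chunk; attribute names may differ slightly between releases.

# Stream the response instead of waiting for the full completion.
# Assumes the same client, port, and model as the example above.
stream = client.inference.chat_completion(
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a two-sentence poem about llama."},
    ],
    model_id=model,
    stream=True,
)

for chunk in stream:
    # Each chunk carries an event with an incremental content delta;
    # print only the text deltas as they arrive.
    delta = chunk.event.delta
    if getattr(delta, "type", None) == "text":
        print(delta.text, end="", flush=True)
print()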
This demonstrates the full client-server loop: connecting to the server, listing the available models, and running inference. Explore the SambaNova Llama Stack integration repo for additional use cases covering SambaNova distribution LLMs, embeddings, tools, and agent adapters.
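As a pointer toward those other use cases, here is a minimal, hedged sketch of generating embeddings through the same client. It assumes the running distribution registers an embedding model (the all-MiniLM-L6-v2 identifier below is a placeholder; use whatever client.models.list() reports as an embedding model) and that your client version exposes inference.embeddings.

# Minimal embeddings sketch: the embedding model identifier below is an
# assumption; pick one reported by client.models.list() on your server.
EMBEDDING_MODEL = "all-MiniLM-L6-v2"  # placeholder identifier

embeddings_response = client.inference.embeddings(
    model_id=EMBEDDING_MODEL,
    contents=["Llama Stack standardizes AI application development."],
)

# The response carries one embedding vector per input string.
vector = embeddings_response.embeddings[0]
print(f"Embedding dimension: {len(vector)}")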