In this guide, you’ll learn how to set up and use Llama Stack—a standardized framework that simplifies AI application development. We’ll walk you through building the SambaNova distribution server, installing the client, and running your first model inference. Whether you’re prototyping or scaling up, this guide will help you get started quickly with best practices from the Llama ecosystem integrated into a modular, efficient architecture.

Components of Llama Stack

Llama Stack includes two main components:

  • Server – A running distribution of Llama Stack that hosts various adaptors.
  • Client – A consumer of the server’s API, interacting with the hosted adaptors.

Get your SambaNova Cloud API key

  1. Create a SambaNova Cloud account.
  2. Navigate to the API key section.
  3. Generate a new key (if you don’t already have one).
  4. Copy and store the key securely.
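
To keep the key out of your source code, read it from an environment variable at runtime. Below is a minimal Python sketch; the variable name matches the SAMBANOVA_API_KEY exported later in this guide.

import os

# Read the API key from the environment rather than hardcoding it in source files
api_key = os.environ.get("SAMBANOVA_API_KEY")
if not api_key:
    raise RuntimeError("SAMBANOVA_API_KEY is not set; export it before running this script")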

Build the SambaNova Llama Stack server

  • Set up a Python virtual environment
python -m venv .venv
source .venv/bin/activate
  • Install required dependencies
pip install uv
pip install llama-stack
  • Build the SambaNova distribution image
mkdir -p ~/.llama
llama stack build --template sambanova --image-type container
  • Verify Docker image creation
docker image list
  • Example output
REPOSITORY                        TAG       IMAGE ID       CREATED          SIZE
distribution-sambanova            0.2.6     4f70c8f71a21   5 minutes ago   2.4GB

Run the SambaNova distribution server

  • Export required environment variables
export LLAMA_STACK_PORT=8321
export SAMBANOVA_API_KEY="your-api-key-here"
  • Run the server with Docker (the image tag must match the one shown by docker image list)
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  distribution-sambanova:0.2.6 \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
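
Once the container is running, you can confirm that the server is accepting connections. The following sketch uses only the Python standard library and assumes the default port 8321 exported above.

import socket

# Confirm the Llama Stack server is listening before pointing the client at it
with socket.create_connection(("localhost", 8321), timeout=5):
    print("Llama Stack server is reachable on port 8321")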

Install the Llama Stack client

In the same or another environment, run:

  pip install llama-stack-client

Use the client to interact with the server

The following Python code demonstrates basic usage:

from llama_stack_client import LlamaStackClient

LLAMA_STACK_PORT = 8321
client = LlamaStackClient(base_url=f"http://localhost:{LLAMA_STACK_PORT}")

# List all available models
models = client.models.list()
print("--- Available models: ---")
for m in models:
    print(f"- {m.identifier}")
print()

# Choose a model from the list
model = "sambanova/Meta-Llama-3.3-70B-Instruct"

# Run chat completion
response = client.inference.chat_completion(
    messages=[
        {"role": "system", "content": "You are a friendly assistant."},
        {"role": "user", "content": "Write a two-sentence poem about llama."},
    ],
    model_id=model,
)

print(response.completion_message.content)

This demonstrates the full client-server loop: connecting, listing models, and running inference.
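
If you prefer not to hardcode the model name, you can also pick one from the list returned by the server. The snippet below is a small variation on the example above; it uses only the identifier field already shown and falls back to the first registered model if no Llama 3.3 model is found.

# Select a model dynamically from the server's model list instead of hardcoding it
models = list(client.models.list())
model = next(
    (m.identifier for m in models if "Llama-3.3" in m.identifier),
    models[0].identifier,  # fall back to the first registered model
)
print(f"Using model: {model}")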

Explore the SambaNova Llama Stack integration repo for several use cases built on the SambaNova distribution's LLM, embedding, tool, and agent adaptors.

Llama Stack documentation

Refer to the Llama Stack docs to:

  • Understand core concepts
  • Dive into sample apps
  • Learn how to extend and customize the framework