Llama Stack
In this guide, you’ll learn how to set up and use Llama Stack, a standardized framework that simplifies AI application development. We’ll walk you through building the SambaNova distribution server, installing the client, and running your first model inference. Whether you’re prototyping or scaling up, this guide will help you get started quickly with best practices from the Llama ecosystem integrated into a modular, efficient architecture.
Components of Llama Stack
Llama Stack includes two main components:
- Server – A running distribution of Llama Stack that hosts various adaptors.
- Client – A consumer of the server’s API, interacting with the hosted adaptors.
Get your SambaNova Cloud API key
- Create a SambaNova Cloud account.
- Navigate to the API key section.
- Generate a new key (if you don’t already have one).
- Copy and store the key securely, as shown below.
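One simple way to store the key is as an environment variable rather than hard-coding it in scripts. A minimal sketch, assuming a bash shell (the profile file will differ for other shells):

```bash
# Persist the key as an environment variable; replace the placeholder with your key
echo 'export SAMBANOVA_API_KEY=<your-api-key>' >> ~/.bashrc
source ~/.bashrc
```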
Build the SambaNova Llama Stack server
- Set up a Python virtual environment
- Install the required dependencies
- Build the SambaNova distribution image
- Verify the Docker image was created (example output is shown in the sketch below)
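The steps above map to a handful of shell commands. A minimal sketch, assuming the `llama` CLI ships a `sambanova` build template and that Docker is the target image type; flag names can vary between Llama Stack releases:

```bash
# Create and activate an isolated Python environment
python3 -m venv .venv
source .venv/bin/activate

# Install Llama Stack, which provides the `llama` CLI
pip install llama-stack

# Build the SambaNova distribution as a container image
llama stack build --template sambanova --image-type container

# Verify the image was created; the output should contain a line
# resembling `distribution-sambanova   latest   ...` (illustrative)
docker images | grep sambanova
```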
Run the SambaNova distribution server
- Export the required environment variables
- Run the server with Docker, as sketched below
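Together, the two steps might look like the following. A minimal sketch, assuming the image built above is tagged `distribution-sambanova` and that the server listens on Llama Stack's default port 8321; adjust both to match your setup:

```bash
# The API key obtained earlier, plus the port the server should listen on
export SAMBANOVA_API_KEY=<your-api-key>
export LLAMA_STACK_PORT=8321

# Start the distribution server, forwarding the port and passing the key through
docker run -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  distribution-sambanova \
  --port $LLAMA_STACK_PORT \
  --env SAMBANOVA_API_KEY=$SAMBANOVA_API_KEY
```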
Install the Llama Stack client
In the same or another environment, run:
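The client is published on PyPI, so a standard pip install is enough:

```bash
pip install llama-stack-client
```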
Use the client to interact with the server
The following Python code demonstrates basic usage:
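Below is a minimal sketch using the `llama-stack-client` SDK. The base URL assumes the server from the previous section is running locally on port 8321, and `Meta-Llama-3.1-8B-Instruct` is an assumed model identifier; list the server's models first and substitute one of those. Exact method names can differ between SDK versions:

```python
from llama_stack_client import LlamaStackClient

# Connect to the locally running distribution server
client = LlamaStackClient(base_url="http://localhost:8321")

# List the models the server exposes through its adaptors
for model in client.models.list():
    print(model.identifier)

# Run a chat completion against one of the hosted models
response = client.inference.chat_completion(
    model_id="Meta-Llama-3.1-8B-Instruct",  # assumed identifier; pick one from the list above
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Write a haiku about distributed systems."},
    ],
)
print(response.completion_message.content)
```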
This demonstrates the full client-server loop: connecting, listing models, and running inference.
Explore the SambaNova Llama Stack integration repo for additional use cases built on the SambaNova distribution's LLM, embedding, tool, and agent adaptors.
Llama Stack documentation
Refer to the Llama Stack docs to:
- Understand core concepts
- Dive into sample apps
- Learn how to extend and customize the framework