

SambaNova inference APIs are designed to be compatible with the OpenAI client libraries, simplifying the adoption of SambaNova inference in your AI applications.

Install the library

Run the command below to install the library.
pip install openai

Use SambaNova APIs with OpenAI client libraries

Configure your OpenAI client libraries to use SambaNova inference APIs by setting two values: the base_url and your api_key.
Don’t have a SambaNova API key? Get yours from the API keys and URLs page.
from openai import OpenAI

client = OpenAI(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key"
)
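Hard-coding credentials works for quick tests, but a common pattern is to read the key from an environment variable instead. A minimal sketch, assuming a variable named SAMBANOVA_API_KEY (the variable name is illustrative, not mandated by the API):

```python
import os

# Read the key from the environment so it never appears in source code.
# SAMBANOVA_API_KEY is an assumed variable name, not one mandated by the API.
api_key = os.environ.get("SAMBANOVA_API_KEY", "your-sambanova-api-key")

# client = OpenAI(base_url="your-sambanova-base-url", api_key=api_key)
```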
Now you can make an API request to a model and choose how to receive your output.

Non-streaming example

The following code demonstrates using the OpenAI Python client for non-streaming completions.
completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ]
)

print(completion.choices[0].message.content)

Streaming example

The following code demonstrates using the OpenAI Python client for streaming completions.
completion = client.chat.completions.create(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"}
    ],
    stream=True
)

for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
In streaming mode, the API returns chunks that contain multiple tokens. When calculating metrics like tokens per second or time per output token, count all tokens in each chunk.
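Because a single chunk can carry several tokens, a throughput measurement has to sum token counts per chunk rather than counting chunks. A minimal sketch, where count_tokens stands in for whatever tokenizer you use (the whitespace split below is only a crude approximation):

```python
import time

def stream_tokens_per_second(chunks, count_tokens):
    """Sum the tokens in every chunk, then divide by elapsed wall-clock time.

    Counting one token per chunk would understate throughput, because the
    API may pack multiple tokens into a single chunk.
    """
    start = time.monotonic()
    total_tokens = 0
    for text in chunks:
        total_tokens += count_tokens(text)
    elapsed = time.monotonic() - start
    return total_tokens / elapsed if elapsed > 0 else float("inf")

# Illustrative use with fake chunk texts and a crude whitespace tokenizer:
fake_chunks = ["Once upon", " a time,", " a small dog found a friend."]
tps = stream_tokens_per_second(fake_chunks, lambda t: len(t.split()))
```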

Responses API

In addition to the Chat Completions endpoint, SambaNova exposes a Responses API endpoint POST /v1/responses that is compatible with the OpenAI Responses API standard. If you already use the OpenAI SDK with the Responses API, you can point it to SambaNova with the same configuration.
response = client.responses.create(
    model="gpt-oss-120b",
    input="Explain the difference between supervised and unsupervised learning in two sentences."
)

print(response.output[0].content[0].text)
For full details on supported parameters, tool calling, streaming, and structured output with the Responses API, see Responses API.

Currently unsupported OpenAI features

The following features are not yet supported and will be ignored:
  • presence_penalty
  • frequency_penalty
  • logit_bias

Feature differences

  • n: The SambaNova API supports integer values 1–8 (default 1); OpenAI has no documented hard cap. Note that n greater than 1 is not supported when using function calling or tools; combining them returns a 400 error.
  • seed: Supported on both SambaStack and SambaCloud for text generation models. Passing the same seed with the same inputs produces deterministic outputs. Accepts any integer, including negative values. Unlike OpenAI, the SambaNova API does not return a system_fingerprint field in the response. The seed parameter is not supported for multi-modality models or continuous batching (CB) models.
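To illustrate, seed and n are ordinary keyword arguments on the OpenAI client, so a request combining them might be built like this (the values are illustrative, and the create call is shown commented out):

```python
# Build the request arguments; seed and n pass straight through the
# OpenAI client to the SambaNova API.
request_kwargs = dict(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Share a happy story with me"}],
    n=2,      # SambaNova accepts integers 1-8; not combinable with tools
    seed=42,  # same seed + same inputs -> deterministic outputs
)

# completion = client.chat.completions.create(**request_kwargs)
```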

SambaNova API features not supported by OpenAI clients

The SambaNova API supports the top_k parameter, which is not supported by the OpenAI client libraries.
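The OpenAI Python client has no top_k keyword, but its extra_body parameter merges additional fields into the request JSON, which is one way to forward top_k to the SambaNova API. A sketch (the top_k value is illustrative, and the create call is shown commented out):

```python
# extra_body fields are merged into the JSON request body, letting
# SambaNova-specific parameters ride along with the standard ones.
request_kwargs = dict(
    model="Meta-Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Share a happy story with me"}],
    extra_body={"top_k": 40},  # sample only from the 40 most likely tokens
)

# completion = client.chat.completions.create(**request_kwargs)
```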