Documentation Index

Fetch the complete documentation index at: https://sambanova-systems.mintlify.dev/docs/llms.txt

Use this file to discover all available pages before exploring further.

Responses API

The SambaNova Responses API (POST /v1/responses) is compatible with the OpenAI Responses API standard. Existing clients and SDKs built against the OpenAI Responses API can point to SambaNova’s endpoint with no code changes. This endpoint is designed for agentic, tool-capable, and coding-oriented integrations. The Responses API complements the existing Chat Completions API and does not replace it.

Supported models

  • gpt-oss-120b
Additional models will be added in future releases.
For better quality on tool-calling requests with gpt-oss-120b, set reasoning_effort to high.

How it works

The Responses API structures model output as typed output items — message, function_call, and reasoning — rather than a single assistant text field. Each request returns a response object containing one or more of these items, depending on the model’s behavior.
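The item-typed structure can be sketched with plain dicts. The payload below is illustrative only: the field names follow the item types named above (message, function_call, reasoning), but the exact shapes may differ from the live API.

```python
# Illustrative shape of a Responses API result, as plain dicts.
response = {
    "output": [
        {"type": "reasoning",
         "content": [{"type": "reasoning_text", "text": "..."}]},
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Hello!"}]},
    ]
}

def collect_text(response: dict) -> str:
    """Concatenate the text of all message items, skipping other item types."""
    parts = []
    for item in response["output"]:
        if item["type"] == "message":
            for part in item["content"]:
                if part["type"] == "output_text":
                    parts.append(part["text"])
    return "".join(parts)
```

Dispatching on item type, as above, is the core consumption pattern for this API: reasoning and function_call items are handled separately from user-visible message text.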

Key characteristics

  • Stateless: SambaNova does not store conversation state. To continue a multi-turn conversation, include the relevant prior output items in the input array of each subsequent request.
  • Client-executed tools only: When a tool is needed, the model returns a function_call item. Your application executes the function and returns the result in a follow-up request. Built-in and server-executed tools are not supported.
  • Structured streaming: Streaming responses use typed Server-Sent Events (SSE) with an event → item → content → delta event hierarchy.
  • Structured output: The text.format field supports the json_schema and json_object format types.
  • Reasoning support: Reasoning-capable models expose reasoning content via a reasoning output item.

Usage

Simple generation

```python
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key"
)

response = client.responses.create(
    model="gpt-oss-120b",
    input="Explain the difference between supervised and unsupervised learning in two sentences."
)

print(response.output[0].content[0].text)
# or
print(response.output_text)
```

Streaming response

Set stream=True to receive typed SSE events as the response is generated.
```python
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key"
)

stream = client.responses.create(
    model="gpt-oss-120b",
    input="Write a short poem about speed.",
    stream=True
)

for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```
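The delta-accumulation pattern can be exercised without a live stream. The Event class below is a stand-in for the SDK's event objects; its fields are assumptions based on the response.output_text.delta event type shown above.

```python
from dataclasses import dataclass

@dataclass
class Event:
    """Stand-in for an SDK streaming event (fields are illustrative)."""
    type: str
    delta: str = ""

def accumulate(events) -> str:
    """Collect text deltas, ignoring lifecycle events in the stream."""
    text = []
    for event in events:
        if event.type == "response.output_text.delta":
            text.append(event.delta)
    return "".join(text)

fake_stream = [
    Event("response.created"),
    Event("response.output_text.delta", "Need for "),
    Event("response.output_text.delta", "speed."),
    Event("response.completed"),
]
```

Filtering on event type, rather than assuming every event carries text, keeps the consumer robust to lifecycle events such as response.created and response.completed.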

Function calling

When tools are provided, the model may return a function_call item instead of a message. Your application is responsible for executing the function and returning the result.

Step 1: Send a request with tools defined
```python
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key"
)

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name, e.g. San Francisco"
                }
            },
            "required": ["city"]
        }
    }
]

response = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What is the weather in Bogotá?"}],
    tools=tools
)

for item in response.output:
    if item.type == "function_call":
        print(f"Tool: {item.name}, Arguments: {item.arguments}")
```
Step 2: Execute the tool and return the result (uses client, response, and tools from Step 1)
```python
import json

def get_weather(city: str) -> dict:
    # Replace with a real weather API call
    return {"city": city, "temperature_celsius": 18, "condition": "Cloudy"}

# Find the function_call item in the response
tool_call = next(item for item in response.output if item.type == "function_call")
args = json.loads(tool_call.arguments)
result = get_weather(args["city"])

# Submit the tool result in a follow-up request
follow_up = client.responses.create(
    model="gpt-oss-120b",
    input=[
        {"role": "user", "content": "What is the weather in Bogotá?"},
        tool_call,
        {
            "type": "function_call_output",
            "call_id": tool_call.call_id,
            "output": json.dumps(result)
        }
    ],
    tools=tools
)
print(follow_up.output[0].content[0].text)
```
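For applications with several tools, the execute-and-return step generalizes to a small dispatch table. A minimal sketch over plain dicts; the sample call item is illustrative, while a live SDK would return objects carrying the same fields (name, call_id, arguments).

```python
import json

def get_weather(city: str) -> dict:
    # Replace with a real weather API call
    return {"city": city, "temperature_celsius": 18, "condition": "Cloudy"}

# Map tool names (as declared in the tools array) to local functions.
HANDLERS = {"get_weather": get_weather}

def run_tool_calls(output_items):
    """Execute each function_call item and build function_call_output items."""
    results = []
    for item in output_items:
        if item["type"] != "function_call":
            continue
        handler = HANDLERS[item["name"]]
        args = json.loads(item["arguments"])
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(handler(**args)),
        })
    return results

calls = [{"type": "function_call", "name": "get_weather",
          "call_id": "call_1", "arguments": '{"city": "Bogotá"}'}]
```

The returned function_call_output items can be appended to the next request's input array alongside the original function_call items, as in Step 2 above.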

Structured output

Use the text.format field to constrain the response to a JSON schema.
```python
import json
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key"
)

response = client.responses.create(
    model="gpt-oss-120b",
    input="Extract the event details: SambaNova launch on May 1, 2026 at 10am in San Francisco.",
    text={
        "format": {
            "type": "json_schema",
            "name": "event_extraction",
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "date": {"type": "string"},
                    "time": {"type": "string"},
                    "location": {"type": "string"}
                },
                "required": ["title", "date", "time", "location"],
                "additionalProperties": False
            }
        }
    }
)
print(json.loads(response.output[0].content[0].text))
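Even with server-side schema enforcement, a light client-side check helps catch truncated or otherwise malformed outputs before they propagate. A minimal sketch that re-validates the required keys; the sample payload is illustrative, not real model output.

```python
import json

# Mirrors the "required" list in the schema above.
REQUIRED = ["title", "date", "time", "location"]

def validate_event(payload: str) -> dict:
    """Parse the model's JSON output and check that required keys exist."""
    data = json.loads(payload)
    missing = [key for key in REQUIRED if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data

sample = ('{"title": "SambaNova launch", "date": "2026-05-01", '
          '"time": "10:00", "location": "San Francisco"}')
```

Raising on missing keys, rather than returning a partial dict, makes downstream code fail fast at the API boundary instead of deep inside business logic.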

Multi-turn conversations

The Responses API is stateless. To maintain context across multiple turns, include all prior output items alongside the new user message in the input array of each subsequent request.
```python
from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key"
)

# First turn
response_1 = client.responses.create(
    model="gpt-oss-120b",
    input=[{"role": "user", "content": "What is the capital of Colombia?"}]
)

# Second turn: include all prior output items (reasoning, message, etc.)
response_2 = client.responses.create(
    model="gpt-oss-120b",
    input=[
        {"role": "user", "content": "What is the capital of Colombia?"},
        *response_1.output,
        {"role": "user", "content": "What is its approximate population?"}
    ]
)
print(response_2.output_text)
```
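Because the server stores nothing, a small helper that appends each turn to a running input list keeps the bookkeeping in one place. A minimal sketch over plain dicts; the assistant item shown is illustrative stand-in data, not real model output.

```python
# Client-side history management for the stateless Responses API.
history = []

def add_user_turn(history: list, text: str) -> list:
    """Append a user message to the running input list."""
    history.append({"role": "user", "content": text})
    return history

def add_assistant_items(history: list, output_items: list) -> list:
    """Append the model's output items verbatim so the next request
    carries full context (the API stores nothing server-side)."""
    history.extend(output_items)
    return history

add_user_turn(history, "What is the capital of Colombia?")
add_assistant_items(history, [
    {"type": "message", "role": "assistant",
     "content": [{"type": "output_text", "text": "Bogotá."}]}
])
add_user_turn(history, "What is its approximate population?")
```

After each call, pass the whole list as the input array of the next request; trimming old turns to stay within the context window is the client's responsibility.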

Limitations

  • The previous_response_id parameter is not supported. The SambaNova Responses API is stateless — manage conversation history by including prior output items in the input array.
  • Built-in and server-executed tools are not supported. Only client-executed function tools are available.