Creates a model response for the given input. Only type: "function" tools are supported; other tool types are filtered server-side. SambaNova is stateless, so conversation history must be supplied in full via input[] on each request.
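Because the endpoint is stateless, a follow-up request must resend every prior turn in input[], including the assistant's own previous output. A minimal sketch of such a payload (the model name and message texts are illustrative assumptions):

```python
# Sketch of a follow-up request body for POST /responses.
# The full conversation history, including the assistant's prior
# turn, is resent in input[] because the API keeps no server state.
payload = {
    "model": "Example-Model",  # hypothetical model name
    "input": [
        {"type": "message", "role": "user",
         "content": "What is the weather in San Francisco?"},
        # A prior model turn uses role "assistant" with content
        # type "output_text", per the message item description.
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text",
                      "text": "It is currently 18 C and foggy."}]},
        {"type": "message", "role": "user",
         "content": "And in Los Angeles?"},
    ],
}
```

User, system, and developer turns may also use plain string content, as noted in the message item description below.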
SambaNova API Key
Response creation parameters
responses request object
A plain text input equivalent to a user-role message.
Inserts a system (or developer) message as the first item in the model's context. Equivalent to a system-role message prepended to input[].
If true, the response is delivered as server-sent events (SSE).
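When stream: true, events arrive as SSE data: lines. A minimal client-side parser might look like the following sketch; the terminal response.completed event type is documented below, while the delta event name used in the synthetic sample is an assumption:

```python
import json

def parse_sse_events(lines):
    """Yield parsed JSON events from SSE 'data:' lines,
    stopping after the terminal response.completed event."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, keep-alives
        event = json.loads(line[len("data:"):].strip())
        yield event
        if event.get("type") == "response.completed":
            break

# Synthetic two-event stream (event names other than
# response.completed are hypothetical):
stream = [
    'data: {"type": "response.output_text.delta", "delta": "Hi"}',
    '',
    'data: {"type": "response.completed"}',
]
events = list(parse_sse_events(stream))
```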
Upper bound on the number of tokens the model may generate, including visible output tokens and reasoning tokens.
1024
Controls randomness in generation. Range: 0–2. It is recommended to alter this, top_p, or top_k but not more than one at a time.
0 <= x <= 2
0.7
Nucleus sampling cutoff. Range: 0–1. It is recommended to alter this, temperature, or top_k but not more than one at a time.
0 <= x <= 1
1
Limits sampling to the top K most probable tokens. It is recommended to alter this, top_p, or temperature but not more than one at a time.
1 <= x <= 100
5
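The guidance above suggests adjusting only one of temperature, top_p, or top_k per request. A small client-side check can enforce that (purely illustrative; not part of the API):

```python
def count_sampling_overrides(params):
    """Count how many of the mutually-cautioned sampling knobs
    (temperature, top_p, top_k) a request explicitly sets."""
    return sum(k in params for k in ("temperature", "top_p", "top_k"))

# Hypothetical request setting exactly one sampling knob:
request = {"model": "Example-Model", "input": "Hello", "temperature": 0.7}
assert count_sampling_overrides(request) <= 1  # follows the guidance
```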
Number of top log-probability entries to return per output token. Null means log probabilities are not returned.
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Not currently implemented; accepted for API compatibility and echoed in the response.
-2 <= x <= 2
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics. Not currently implemented; accepted for API compatibility and echoed in the response.
-2 <= x <= 2
Tools available to the model. Only type: "function" is supported; all other tool types are filtered server-side.
128
Whether the model may issue multiple tool calls in parallel within one turn.
Maximum number of tool calls the model may make in a single response turn. Not currently implemented; accepted for API compatibility.
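Since only type: "function" tools survive server-side filtering, a tools array might be built as below. The get_weather function is a hypothetical example, and the exact field layout follows the common function-tool shape, which is an assumption here:

```python
tools = [
    {
        "type": "function",  # the only tool type the server keeps
        "name": "get_weather",  # hypothetical function
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {"type": "web_search"},  # non-function: filtered server-side
]

# Only the function tool is expected to reach the model.
kept = [t for t in tools if t["type"] == "function"]
```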
String shorthand for tool selection behavior.
none, auto, required
Response format configuration. Supports plain text, json_object, and json_schema.
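For structured output via json_schema, a request fragment might look like this sketch. The nesting of the format object and the name/schema fields mirrors the common Responses-style text.format shape and is an assumption here; the schema itself is a hypothetical example:

```python
# Sketch of a json_schema response-format configuration.
text_config = {
    "format": {
        "type": "json_schema",
        "name": "city_weather",  # hypothetical schema name
        "schema": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "temp_c": {"type": "number"},
            },
            "required": ["city", "temp_c"],
        },
    }
}
```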
Reasoning configuration for models that support it. Ignored on non-reasoning models.
Included for API compatibility, but only echoed back in the response.
Included for API compatibility, but not supported
SambaNova is stateless; this field is accepted for API compatibility but has no effect. Always echoed back as false.
Accepted for API compatibility and echoed in the response. Context truncation behavior is not currently configurable via this field in SambaNova.
auto, disabled
Not supported. SambaNova is stateless and does not maintain server-side conversation state. Accepted for API compatibility but ignored; clients must supply the full conversation history in input[].
Accepted for API compatibility and echoed back in the response. Has no effect on server behavior.
Accepted for API compatibility and echoed back in the response. Has no effect on server behavior.
Successful response. Returns a ResponseResponse object (non-streaming), or a stream of server-sent ResponseStreamEvent events ending with a response.completed event (when stream: true).
A response object returned by POST /responses (non-streaming). Contains the model's output items, echoed input parameters, lifecycle metadata, and token usage.
Unique identifier for this response.
The object type. Always "response".
response
Lifecycle status of the response. "completed" means the model finished successfully. "failed" means an error occurred during generation. "incomplete" means generation was cut short (e.g. max_output_tokens reached). "in_progress" means generation is still underway.
completed, failed, in_progress, incomplete
Unix timestamp (seconds) when the response was created.
The model ID used to generate this response.
Ordered array of output items generated by the model. Items may be of type "message", "reasoning", or "function_call".
A message item. When used as input, id and status are optional. When present in output[], id and status are always set by the server. Role "assistant" with content type "output_text" represents a prior model turn; user/system/developer turns use content type "input_text". Plain string content is accepted in all roles on input.
{
"type": "message",
"role": "user",
"content": "What is the weather in San Francisco?"
}
In-band error object present when status is "failed". Null when the response completed successfully.
Present when status is "incomplete". Describes why generation stopped before completion (e.g. max_output_tokens reached).
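The status, error, and incomplete_details fields together describe the response lifecycle, so a caller might branch on them as in the sketch below. The {"reason": ...} shape inside incomplete_details is an assumption; only the field's purpose is documented above:

```python
def summarize_outcome(resp):
    """Return a short human-readable outcome from the
    lifecycle fields of a response object."""
    status = resp["status"]
    if status == "completed":
        return "ok"
    if status == "failed":
        return f"failed: {resp['error']}"
    if status == "incomplete":
        return f"incomplete: {resp['incomplete_details']}"
    return status  # e.g. "in_progress" while streaming

# Synthetic response cut short by the token limit:
resp = {"status": "incomplete",
        "incomplete_details": {"reason": "max_output_tokens"},
        "error": None}
```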
The temperature value used for this response.
The top_p value used for this response.
The frequency_penalty value echoed from the request. Not currently implemented; accepted for API compatibility.
The presence_penalty value echoed from the request. Not currently implemented; accepted for API compatibility.
Tool definitions available to the model for this response.
Whether parallel tool calls were enabled.
String shorthand for tool selection behavior.
none, auto, required
The truncation value echoed from the request.
auto, disabled
Whether background generation was requested.
The metadata echoed from the request.
Whether the response was stored server-side. SambaNova is stateless, so this is always false.
The service tier used to process this request, as reported by the server.
The user value echoed back from the request.
Unix timestamp (seconds) when the response finished generating.
Token usage statistics for this response.
{
"input_tokens": 248,
"output_tokens": 72,
"total_tokens": 320,
"input_tokens_details": { "cached_tokens": 0 },
"output_tokens_details": { "reasoning_tokens": 18 },
"start_time": 1737642515.445,
"end_time": 1737642515.904,
"time_to_first_token": 0.084,
"total_latency": 0.459,
"output_tokens_per_sec": 156.8,
"output_tokens_after_first_per_sec": 161.2,
"total_tokens_per_sec": 311.6,
"acceptance_rate": 4.06,
"is_last_response": true
}
The system instructions echoed from the request, or null if none were provided.
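Some throughput fields in the usage object can be cross-checked against the raw timing fields. For instance, output_tokens_per_sec appears to equal output_tokens divided by total_latency; this is an inference from the example values above, not a documented formula:

```python
# Values taken from the usage example above.
usage = {
    "output_tokens": 72,
    "total_latency": 0.459,
    "output_tokens_per_sec": 156.8,
}

# 72 / 0.459 is roughly 156.9, matching the reported
# figure to rounding.
derived = usage["output_tokens"] / usage["total_latency"]
```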
The top_k value used for this response.
The max_tool_calls value echoed from the request.
The text format configuration (structured output mode) used for this response.
The reasoning configuration used for this response.
The max_output_tokens limit echoed from the request.
The top_logprobs value echoed from the request.
Not supported. Always null. SambaNova is stateless; use input[] to supply full conversation history.