Chat completion
The Chat completion API generates responses based on a given conversation. It supports both text-based and multimodal inputs.
Please see the Text generation capabilities document for additional usage information.
Endpoint
`POST https://api.sambanova.ai/v1/chat/completions`
Request parameters
This section outlines the request parameters, including required and optional fields, along with structured examples for clarity.
Required parameters
Parameter | Type | Description |
---|---|---|
model | String | The name of the model to query. Refer to the Supported models list. |
messages | Array | The conversation history. Each message has a `role` and `content`. See the message object structure below for more details. |
Message object structure
Each message object within the `messages` array consists of:
Field | Type | Description |
---|---|---|
role | String | The role of the message author. Choices: `system`, `user`, or `assistant`. |
content | Mixed | The message content: a string for text-only messages, or an array for multimodal content. See the examples of string content and multimodal content below. |
Example string content
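A minimal sketch of a `messages` array using plain string content. The conversation itself is illustrative:

```python
import json

# Text-only messages: each "content" field is a plain string.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
]

print(json.dumps(messages, indent=2))
```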
Example multimodal content
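For multimodal input, `content` becomes an array of typed parts rather than a single string. This sketch follows the OpenAI-compatible part shape (`type`, `text`, `image_url`); the base64 image data is a placeholder:

```python
import json

# Multimodal content: a list of typed parts (text + image) in one user message.
# Part field names follow the OpenAI-compatible convention; the URL is a placeholder.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this image?"},
            {
                "type": "image_url",
                "image_url": {"url": "data:image/jpeg;base64,<BASE64_IMAGE>"},
            },
        ],
    }
]

print(json.dumps(messages, indent=2))
```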
Optional parameters
Parameter | Type | Description | Values |
---|---|---|---|
max_tokens | Integer | Maximum number of tokens to generate. Limited by the model's context length. | None |
temperature | Float | Controls randomness in the response. Higher values increase randomness. | 0 to 1 |
top_p | Float | Nucleus sampling: limits token selection to the smallest set whose cumulative probability exceeds this value. | 0 to 1 |
top_k | Integer | Limits the number of token choices considered at each step. | 1 to 100 |
stop | String, array, null | Up to four sequences at which the API stops generating further tokens. This helps control output length. | Default: null |
stream | Boolean, null | Enables streaming responses when set to `true`. If `false`, the full response is returned after completion. | Default: false |
stream_options | Object, null | Additional streaming options (only valid when `stream: true`). Available option: `include_usage: boolean`. | Default: null |
tools | Array | Defines external tools the model can call (currently supports only functions). See the example usage of the tools parameter below. | None |
response_format | Object | Ensures the output is valid JSON. Use `{ "type": "json_object" }` for structured responses. | None |
tool_choice | String, object | Controls tool usage (`auto`, `required`, or a specific function). See the accepted values for tool choice table below. | Default: auto |
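The optional parameters above slot directly into the request body. A sketch combining several of them, with illustrative values (the model name is a placeholder; refer to the Supported models list):

```python
import json

# Request body combining several optional parameters.
# "<MODEL_NAME>" is a placeholder, not a real model identifier.
payload = {
    "model": "<MODEL_NAME>",
    "messages": [{"role": "user", "content": "Write a haiku about the sea."}],
    "max_tokens": 200,       # cap on generated tokens
    "temperature": 0.7,      # 0 to 1
    "top_p": 0.9,            # 0 to 1
    "top_k": 40,             # 1 to 100
    "stop": ["\n\n"],        # up to four stop sequences
    "stream": False,
}

print(json.dumps(payload, indent=2))
```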
Example usage of tools parameter
Type | Object fields | Description |
---|---|---|
Function | `name` (string) | The name of the function to call. |
| `description` (string) | A short description of what the function does. |
| `parameters` (object) | Defines the function parameters. |
| `parameters.type` (string) | The data type of the parameters object (always `"object"`). |
| `parameters.properties` (object) | Defines the function parameters and their properties. |
| `parameters.properties.<param_name>` (object) | Each function parameter is defined as an object with a `type` (data type) and a `description` (description of the parameter). |
| `parameters.required` (array) | A list of required parameters for the function. |
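Putting the fields above together, a `tools` array with a single function might look like this. The `get_weather` function is hypothetical, for illustration only:

```python
import json

# A tools array defining one function. The function itself (get_weather)
# is hypothetical; only the structure mirrors the field table above.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "parameters": {
                "type": "object",  # always "object"
                "properties": {
                    "city": {"type": "string", "description": "City name."},
                    "unit": {"type": "string", "description": "celsius or fahrenheit."},
                },
                "required": ["city"],
            },
        },
    }
]

print(json.dumps(tools, indent=2))
```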
Accepted values for tool choice
Value | Description |
---|---|
auto | The model chooses between generating a message or calling a function. This is the default behavior when `tool_choice` is not specified. |
required | Forces the model to generate a function call. The model will always select one or more functions to call. |
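Besides the string values above, `tool_choice` can name a specific function to force that exact call. A sketch of the three forms (the function name is hypothetical):

```python
# The three forms tool_choice can take. "get_weather" is a hypothetical
# function name used only to show the object form.
tool_choice_auto = "auto"          # model decides: message or function call
tool_choice_required = "required"  # model must call at least one function
tool_choice_specific = {
    "type": "function",
    "function": {"name": "get_weather"},  # force this particular function
}

print(tool_choice_auto, tool_choice_required, tool_choice_specific)
```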
Example requests
Below is a sample request body for a streaming response for a text model.
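A sketch of such a body, with `stream` enabled and usage reporting requested; the model name is a placeholder:

```python
import json

# Sample request body for a streaming text-model request.
# "<MODEL_NAME>" is a placeholder -- see the Supported models list.
payload = {
    "model": "<MODEL_NAME>",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."},
    ],
    "stream": True,
    "stream_options": {"include_usage": True},
}

# Send with, e.g.:
#   curl https://api.sambanova.ai/v1/chat/completions \
#     -H "Authorization: Bearer $SAMBANOVA_API_KEY" \
#     -H "Content-Type: application/json" \
#     -d @payload.json

print(json.dumps(payload, indent=2))
```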
Example response format
The API returns a chat completion object, or, if the request is streamed, a sequence of chat completion chunk objects.
Chat completion response
Represents a chat completion response returned by the model, based on the provided input.
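An illustrative (not verbatim) completion object; the values are invented, but the shape matches the response fields documented below:

```python
import json

# Illustrative chat completion object. All values are invented; only the
# field names and types follow the response fields table below.
completion = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1714000000,  # Unix timestamp in seconds
    "model": "<MODEL_NAME>",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Hello! How can I help?"},
            "finish_reason": "stop",
        }
    ],
}

print(json.dumps(completion, indent=2))
```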
Streaming response (chunked)
Represents a streaming (chunked) response returned by the model, based on the provided input.
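An illustrative chunk from such a stream. Following the OpenAI-compatible convention, the chunk's object type is `chat.completion.chunk` and the incremental text arrives in a `delta` field; values here are invented:

```python
import json

# Illustrative streamed chunk. Each chunk carries a small "delta" of content;
# values are invented, and the shape follows the OpenAI-compatible convention.
chunk = {
    "id": "chatcmpl-123",
    "object": "chat.completion.chunk",
    "created": 1714000000,  # same timestamp on every chunk of a response
    "model": "<MODEL_NAME>",
    "choices": [
        {"index": 0, "delta": {"content": "Hel"}, "finish_reason": None}
    ],
}

print(json.dumps(chunk, indent=2))
```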
Response fields
The following table lists the key response properties, their types, and descriptions.
If a request fails, the response body provides a JSON object with details about the error.
For more information on errors, please see the API error codes page.
Property | Type | Description |
---|---|---|
id | String | A unique identifier for the chat completion. |
choices | Array | A list containing a single chat completion. |
created | Integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
model | String | The model used to generate the completion. |
object | String | The object type, which is always `chat.completion`. |
usage | Object | An optional field present when `stream_options: {"include_usage": true}` is set. When present, it is `null` for every chunk except the last, which includes token usage statistics for the entire request. |
throughput_after_first_token | Float | The rate (tokens per second) at which output tokens are generated after the first token has been delivered. |
time_to_first_token | Float | The time (in seconds) the model takes to generate the first token. |
model_execution_time | Float | The time (in seconds) required to generate a complete response or all tokens. |
output_tokens_count | Integer | Number of tokens generated in the response. |
input_tokens_count | Integer | Number of tokens in the input prompt. |
total_tokens_count | Integer | The sum of input and output tokens. |
queue_time | Float | The time (in seconds) a request spends waiting in the queue before being processed by the model. |