The Chat completion API generates responses based on a given conversation. It supports both text-based and multimodal inputs.

Please see the Text generation capabilities document for additional usage information.

Endpoint

POST https://api.sambanova.ai/v1/chat/completions
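
A minimal call to this endpoint can be made with any HTTP client. The sketch below uses Python's requests library; the SAMBANOVA_API_KEY environment variable and the bearer-token Authorization header follow common API conventions and are assumptions, not confirmed by this page.

```python
import os
import requests

# Minimal sketch: POST a chat completion request.
# Bearer-token auth and the SAMBANOVA_API_KEY env var are assumptions.
url = "https://api.sambanova.ai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```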

Request parameters

This section outlines the request parameters, including required and optional fields, along with structured examples for clarity.

Required parameters

| Parameter | Type | Description |
|---|---|---|
| model | String | The name of the model to query. Refer to the Supported models list. |
| messages | Array | The conversation history. Each message has a role and content. See message object structure for more details. |

Message object structure

Each message object within the messages array consists of:

| Field | Type | Description |
|---|---|---|
| role | String | The role of the message author. Choices: system, user, or assistant. |
| content | Mixed | The message content. A string for text-only messages, or an array for multimodal content. See examples of string content and multimodal content. |

Example string content

"content": "Answer the question in a couple sentences."

Example multimodal content

[
  { "type": "text", "text": "What's in this image?" },
  { "type": "image_url", "image_url": { "url": "base64 encoded string of image" } }
]
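
To build the multimodal form programmatically, an image can be base64-encoded and placed in the content array. A minimal sketch, assuming a local image file (the path is illustrative):

```python
import base64

# Read a local image and base64-encode it (the file path is illustrative).
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Per the example above, the url field carries the raw base64 string.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_b64}},
    ],
}
```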

Optional parameters

| Parameter | Type | Description | Values |
|---|---|---|---|
| max_tokens | Integer | The maximum number of tokens to generate. Limited by the model's context length. | None |
| temperature | Float | Controls randomness in the response. Higher values increase randomness. | 0 to 1 |
| top_p | Float | Nucleus sampling: restricts token selection to the smallest set of tokens whose cumulative probability exceeds top_p. | 0 to 1 |
| top_k | Integer | Limits sampling to the k most likely next tokens. | 1 to 100 |
| stop | String, array, null | Specifies up to four sequences at which the API stops generating further tokens. This helps control output length. | Default: null |
| stream | Boolean, null | Enables streaming responses when set to true. If false, the full response is returned after completion. | Default: false |
| stream_options | Object, null | Specifies additional streaming options (only when stream: true). Available option: include_usage (Boolean). | Default: null |
| tools | Array | Defines external tools the model can call (currently supports only functions). See the tools parameter usage table. | None |
| response_format | Object | Ensures output is valid JSON. Use { "type": "json_object" } for structured responses. | None |
| tool_choice | String, object | Controls tool usage (auto, required, or a specific function). See the accepted values for tool choice table. | Default: auto |
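
As an illustration, the request body below combines several of these optional parameters; the specific values are arbitrary and only show where each field goes.

```python
# Illustrative request body using several optional parameters.
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Suggest a book title."}],
    "max_tokens": 200,   # cap on generated tokens
    "temperature": 0.7,  # 0 to 1; higher is more random
    "top_p": 0.9,        # nucleus sampling threshold
    "top_k": 40,         # sample from the 40 most likely tokens
    "stop": ["\n\n"],    # stop generating at a blank line
}
```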

Example usage of tools parameter

| Type | Object fields | Description |
|---|---|---|
| Function | name (string) | The name of the function to call. |
| | description (string) | A short description of what the function does. |
| | parameters (object) | Defines the function parameters. |
| | parameters.type (string) | The data type of the parameters object (always "object"). |
| | parameters.properties (object) | Defines the function parameters and their properties. |
| | parameters.properties.<param_name> (object) | Each function parameter is defined as an object with: type (data type) and description (description of the parameter). |
| | parameters.required (array) | A list of required parameters for the function. |

Accepted values for tool choice

| Value | Description |
|---|---|
| auto | The model chooses between generating a message or calling a function. This is the default behavior when tool_choice is not specified. |
| required | Forces the model to generate a function call. The model will always select one or more functions to call. |
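
Combining the two tables above, a sketch of a tools definition with an explicit tool_choice follows. The get_weather function and its parameters are hypothetical, and the outer { "type": "function", "function": { ... } } wrapping follows the common OpenAI-compatible convention, which this page does not spell out.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": [
        {
            "type": "function",  # wrapping per OpenAI-compatible convention (assumed)
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",  # always "object"
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "City name, e.g. Lisbon",
                        }
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "required",  # force a function call; "auto" is the default
}
```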

Example requests

Below is a sample request body for a streaming response from a text model.

Example text model request
{
   "messages": [
      {"role": "system", "content": "Answer the question in a couple sentences."},
      {"role": "user", "content": "Share a happy story with me"}
   ],
   "max_tokens": 800,
   "stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
   "model": "Meta-Llama-3.1-8B-Instruct",
   "stream": true, 
   "stream_options": {"include_usage": true}
}
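
One way to send this body and consume the stream, sketched with Python's requests library. The bearer-token auth header, and the server-sent-event framing with "data: "-prefixed lines and a [DONE] sentinel, are common conventions assumed here rather than documented above.

```python
import json
import os
import requests

url = "https://api.sambanova.ai/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}"}  # auth scheme assumed
body = {
    "messages": [
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
    "max_tokens": 800,
    "model": "Meta-Llama-3.1-8B-Instruct",
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(url, headers=headers, json=body, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE framing ("data: ..." lines, "[DONE]" sentinel) is assumed.
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        # With include_usage, the final chunk carries usage and no choices.
        if chunk["choices"]:
            print(chunk["choices"][0]["delta"].get("content", ""), end="")
```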

Example response format

The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

Chat completion response

Represents a chat completion response returned by the model, based on the provided input.

Chat completion response
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3-8b-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "logprobs": null,
    "finish_reason": "stop"
  }]
}

Streaming response (chunked)

Represents a streaming response (chunked) returned by the model, based on the provided input.

Streaming chat response (chunked)
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "Llama-3-8b-chat",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}

Response fields

The following table lists the key response properties, their types, and descriptions.

If a request fails, the response body provides a JSON object with details about the error.

For more information on errors, please see the API error codes page.

| Property | Type | Description |
|---|---|---|
| id | String | A unique identifier for the chat completion. |
| choices | Array | A list containing a single chat completion. |
| created | Integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
| model | String | The model used to generate the completion. |
| object | String | The object type: chat.completion, or chat.completion.chunk for streamed responses. |
| usage | Object | An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which includes token usage statistics for the entire request. |
| throughput_after_first_token | Float | The rate (tokens per second) at which output tokens are generated after the first token has been delivered. |
| time_to_first_token | Float | The time (in seconds) the model takes to generate the first token. |
| model_execution_time | Float | The time (in seconds) required to generate a complete response or all tokens. |
| output_tokens_count | Integer | Number of tokens generated in the response. |
| input_tokens_count | Integer | Number of tokens in the input prompt. |
| total_tokens_count | Integer | The sum of input and output tokens. |
| queue_time | Float | The time (in seconds) a request spends waiting in the queue before being processed by the model. |
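
Reading these fields from a parsed non-streaming response might look like the sketch below (response is the variable from the earlier request sketch). Whether the token counts and timing metrics sit inside usage or at the top level is not spelled out by the table, so their placement under usage here is an assumption.

```python
data = response.json()  # "response" from the earlier request sketch
usage = data.get("usage", {})

print("completion id:", data["id"])
print("output tokens:", usage.get("output_tokens_count"))
print("total tokens:", usage.get("total_tokens_count"))
# Placement of the timing metrics under "usage" is an assumption.
print("time to first token (s):", usage.get("time_to_first_token"))
print("throughput after first token (tok/s):", usage.get("throughput_after_first_token"))
```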