The Chat completion API generates responses based on a given conversation. It supports both text-based and multimodal inputs.

Please see the Text generation capabilities document for additional usage information.

Endpoint

POST https://api.sambanova.ai/v1/chat/completions
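
A minimal call to this endpoint can be made with any HTTP client. The sketch below uses Python's requests library; the SAMBANOVA_API_KEY environment variable and the bearer-token Authorization header follow common API conventions and are assumptions, not confirmed by this page.

```python
import os
import requests

# Minimal sketch: POST a chat completion request.
# Bearer-token auth and the SAMBANOVA_API_KEY env var are assumptions.
url = "https://api.sambanova.ai/v1/chat/completions"
headers = {
    "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
    "Content-Type": "application/json",
}
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Hello!"}],
}

response = requests.post(url, headers=headers, json=payload)
print(response.json()["choices"][0]["message"]["content"])
```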

Request parameters

This section outlines the request parameters, including required and optional fields, along with structured examples for clarity.

Required parameters

| Parameter | Type | Description |
|---|---|---|
| model | String | The name of the model to query. Refer to the Supported models list. |
| messages | Array | The conversation history. Each message has a role and content. See message object structure for more details. |

Message object structure

Each message object within the messages array consists of:

| Field | Type | Description |
|---|---|---|
| role | String | The role of the message author. Choices: system, user, or assistant. |
| content | Mixed | The message content. A string for text-only messages, or an array for multimodal content. See examples of string content and multimodal content. |

Example string content

"content": "Answer the question in a couple sentences."

Example multimodal content

[
  { "type": "text", "text": "What's in this image?" },
  { "type": "image_url", "image_url": { "url": "base64 encoded string of image" } }
]
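
To build the multimodal form programmatically, an image can be base64-encoded and placed in the content array. A minimal sketch, assuming a local image file (the path is illustrative):

```python
import base64

# Read a local image and base64-encode it (the file path is illustrative).
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

# Per the example above, the url field carries the raw base64 string.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": image_b64}},
    ],
}
```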

Optional parameters

| Parameter | Type | Description | Values |
|---|---|---|---|
| max_tokens | Integer | The maximum number of tokens to generate. Limited by the model's context length. | None |
| temperature | Float | Controls randomness in the response. Higher values increase randomness. | 0 to 1 |
| top_p | Float | Nucleus sampling: restricts token selection to the smallest set of tokens whose cumulative probability exceeds top_p. | 0 to 1 |
| top_k | Integer | Limits sampling to the k most likely next tokens. | 1 to 100 |
| stop | String, array, null | Specifies up to four sequences at which the API stops generating further tokens. This helps control output length. | Default: null |
| stream | Boolean, null | Enables streaming responses when set to true. If false, the full response is returned after completion. | Default: false |
| stream_options | Object, null | Specifies additional streaming options (only when stream: true). Available option: include_usage (Boolean). | Default: null |
| tools | Array | Defines external tools the model can call (currently supports only functions). See the tools parameter usage table. | None |
| response_format | Object | Ensures output is valid JSON. Use { "type": "json_object" } for structured responses. | None |
| tool_choice | String, object | Controls tool usage (auto, required, or a specific function). See the accepted values for tool choice table. | Default: auto |
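
As an illustration, the request body below combines several of these optional parameters; the specific values are arbitrary and only show where each field goes.

```python
# Illustrative request body using several optional parameters.
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Suggest a book title."}],
    "max_tokens": 200,   # cap on generated tokens
    "temperature": 0.7,  # 0 to 1; higher is more random
    "top_p": 0.9,        # nucleus sampling threshold
    "top_k": 40,         # sample from the 40 most likely tokens
    "stop": ["\n\n"],    # stop generating at a blank line
}
```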

Example usage of tools parameter

| Type | Object fields | Description |
|---|---|---|
| Function | name (string) | The name of the function to call. |
| | description (string) | A short description of what the function does. |
| | parameters (object) | Defines the function parameters. |
| | parameters.type (string) | The data type of the parameters object (always "object"). |
| | parameters.properties (object) | Defines the function parameters and their properties. |
| | parameters.properties.<param_name> (object) | Each function parameter is defined as an object with: type (data type) and description (description of the parameter). |
| | parameters.required (array) | A list of required parameters for the function. |

Accepted values for tool choice

| Value | Description |
|---|---|
| auto | The model chooses between generating a message or calling a function. This is the default behavior when tool_choice is not specified. |
| required | Forces the model to generate a function call. The model will always select one or more functions to call. |
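
Combining the two tables above, a sketch of a tools definition with an explicit tool_choice follows. The get_weather function and its parameters are hypothetical, and the outer { "type": "function", "function": { ... } } wrapping follows the common OpenAI-compatible convention, which this page does not spell out.

```python
payload = {
    "model": "Meta-Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Lisbon?"}],
    "tools": [
        {
            "type": "function",  # wrapping per OpenAI-compatible convention (assumed)
            "function": {
                "name": "get_weather",  # hypothetical function
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "object",  # always "object"
                    "properties": {
                        "city": {
                            "type": "string",
                            "description": "City name, e.g. Lisbon",
                        }
                    },
                    "required": ["city"],
                },
            },
        }
    ],
    "tool_choice": "required",  # force a function call; "auto" is the default
}
```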

Example requests

Below is a sample request body for a streaming response from a text model.

Example text model request
{
   "messages": [
      {"role": "system", "content": "Answer the question in a couple sentences."},
      {"role": "user", "content": "Share a happy story with me"}
   ],
   "max_tokens": 800,
   "stop": ["[INST", "[INST]", "[/INST]", "[/INST]"],
   "model": "Meta-Llama-3.1-8B-Instruct",
   "stream": true, 
   "stream_options": {"include_usage": true}
}
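
One way to send this body and consume the stream, sketched with Python's requests library. The bearer-token auth header, and the server-sent-event framing with "data: "-prefixed lines and a [DONE] sentinel, are common conventions assumed here rather than documented above.

```python
import json
import os
import requests

url = "https://api.sambanova.ai/v1/chat/completions"
headers = {"Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}"}  # auth scheme assumed
body = {
    "messages": [
        {"role": "system", "content": "Answer the question in a couple sentences."},
        {"role": "user", "content": "Share a happy story with me"},
    ],
    "max_tokens": 800,
    "model": "Meta-Llama-3.1-8B-Instruct",
    "stream": True,
    "stream_options": {"include_usage": True},
}

with requests.post(url, headers=headers, json=body, stream=True) as resp:
    for line in resp.iter_lines():
        # SSE framing ("data: ..." lines, "[DONE]" sentinel) is assumed.
        if not line or not line.startswith(b"data: "):
            continue
        data = line[len(b"data: "):]
        if data == b"[DONE]":
            break
        chunk = json.loads(data)
        # With include_usage, the final chunk carries usage and no choices.
        if chunk["choices"]:
            print(chunk["choices"][0]["delta"].get("content", ""), end="")
```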

Example response format

The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.

Chat completion response

Represents a chat completion response returned by the model, based on the provided input.

Chat completion response
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3-8b-chat",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "\n\nHello there, how may I assist you today?",
    },
    "logprobs": null,
    "finish_reason": "stop"
  }]
}

Streaming response (chunked)

Represents a streaming response (chunked) returned by the model, based on the provided input.

Streaming chat response (chunked)
{
  "id": "chatcmpl-123",
  "object": "chat.completion.chunk",
  "created": 1694268190,
  "model": "Llama-3-8b-chat",
  "system_fingerprint": "fp_44709d6fcb",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "logprobs": null,
      "finish_reason": "stop"
    }
  ]
}

Response fields

The following table lists the key response properties, their types, and descriptions.

If a request fails, the response body provides a JSON object with details about the error.

For more information on errors, please see the API error codes page.

| Property | Type | Description |
|---|---|---|
| id | String | A unique identifier for the chat completion. |
| choices | Array | A list containing a single chat completion. |
| created | Integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
| model | String | The model used to generate the completion. |
| object | String | The object type: chat.completion, or chat.completion.chunk for streamed responses. |
| usage | Object | An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which includes token usage statistics for the entire request. |
| throughput_after_first_token | Float | The rate (tokens per second) at which output tokens are generated after the first token has been delivered. |
| time_to_first_token | Float | The time (in seconds) the model takes to generate the first token. |
| model_execution_time | Float | The time (in seconds) required to generate a complete response or all tokens. |
| output_tokens_count | Integer | Number of tokens generated in the response. |
| input_tokens_count | Integer | Number of tokens in the input prompt. |
| total_tokens_count | Integer | The sum of input and output tokens. |
| queue_time | Float | The time (in seconds) a request spends waiting in the queue before being processed by the model. |
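
Reading these fields from a parsed non-streaming response might look like the sketch below (response is the variable from the earlier request sketch). Whether the token counts and timing metrics sit inside usage or at the top level is not spelled out by the table, so their placement under usage here is an assumption.

```python
data = response.json()  # "response" from the earlier request sketch
usage = data.get("usage", {})

print("completion id:", data["id"])
print("output tokens:", usage.get("output_tokens_count"))
print("total tokens:", usage.get("total_tokens_count"))
# Placement of the timing metrics under "usage" is an assumption.
print("time to first token (s):", usage.get("time_to_first_token"))
print("throughput after first token (tok/s):", usage.get("throughput_after_first_token"))
```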