POST /v1/chat/completions
curl --request POST \
  --url https://api.sambanovacloud.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {
      "role": "system",
      "content": "Answer the question in a couple sentences."
    },
    {
      "role": "user",
      "content": "Share a happy story with me"
    }
  ],
  "max_tokens": 800,
  "stop": [
    "[INST",
    "[INST]",
    "[/INST]",
    "[/INST]"
  ],
  "model": "Meta-Llama-3.1-8B-Instruct",
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}'

If a request fails, the response body provides a JSON object with details about the error.
For more information on errors, refer to the API Error Codes page.
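
For illustration only, an error body might look like the sketch below. This assumes the common OpenAI-compatible error envelope; the authoritative field names and codes are listed on the API Error Codes page.

{
  "error": {
    "message": "Invalid API key provided.",
    "type": "authentication_error",
    "code": 401
  }
}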

Authorizations

Authorization
string
header
required

Use the format Authorization: Bearer <API_KEY> to authenticate requests.

Body

application/json
max_tokens
integer

Maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model’s context length. Default value is the context length of the model.

messages
object[]

A list of messages comprising the conversation so far.

model
string

The name of the model to query. Refer to the supported models page for a list of available models.

response_format
object

Set the response_format parameter to json_object in your request to ensure that the model outputs valid JSON. If the model is not able to generate valid JSON, an error is returned. Usage: response_format = { "type": "json_object" }
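
A minimal sketch of a request using response_format, reusing the endpoint and model from the example above; the prompt text is illustrative.

curl --request POST \
  --url https://api.sambanovacloud.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {"role": "user", "content": "List three primary colors as a JSON object."}
  ],
  "model": "Meta-Llama-3.1-8B-Instruct",
  "response_format": { "type": "json_object" }
}'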

stop
string[]

Up to 4 sequences where the API will stop generating further tokens. Default is null.

stream
boolean
default:
false

If set, partial message deltas will be sent. Default is false.

stream_options
object

Options for streaming responses, effective only when stream is true. Setting {"include_usage": true} adds token usage statistics for the entire request to the final chunk.

temperature
number

Determines the degree of randomness in the response. The temperature value can be between 0 and 1.

tool_choice

Controls which (if any) tool is called by the model. When sending a request to SN Cloud, include the function definitions in the tools parameter and set tool_choice to one of the following (see the sketch after the tools parameter below):

  • auto: Allows the model to choose between generating a message or calling a function. This is the default when the field is not specified.
  • required: Forces the model to generate a function call; the model will always select one or more functions to call. To enforce a specific function call, set tool_choice = {"type": "function", "function": {"name": "solve_quadratic"}}. In this case, the model will only use the specified function.
Available options:
auto,
required
tools
object[]

A list of tools the model may call. Currently, only functions are supported as a tool.
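
A minimal sketch combining tools and tool_choice, reusing the endpoint and model from the example above. The solve_quadratic function name comes from the tool_choice description; its parameter schema here is illustrative and assumes the common OpenAI-compatible function-definition shape. Check the supported models page to confirm which models support function calling.

curl --request POST \
  --url https://api.sambanovacloud.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {"role": "user", "content": "Solve x^2 - 5x + 6 = 0"}
  ],
  "model": "Meta-Llama-3.1-8B-Instruct",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "solve_quadratic",
        "description": "Solve a quadratic equation ax^2 + bx + c = 0",
        "parameters": {
          "type": "object",
          "properties": {
            "a": {"type": "number"},
            "b": {"type": "number"},
            "c": {"type": "number"}
          },
          "required": ["a", "b", "c"]
        }
      }
    }
  ],
  "tool_choice": {"type": "function", "function": {"name": "solve_quadratic"}}
}'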

top_k
integer

The top_k parameter is used to limit the number of choices for the next predicted word or token. The value can be between 1 and 100.

top_p
number

The top_p (nucleus) parameter is used to dynamically adjust the number of choices for each predicted token based on the cumulative probabilities. The value can be between 0 and 1.
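
A minimal sketch combining the sampling parameters above in one request; the specific values are illustrative, not recommendations.

curl --request POST \
  --url https://api.sambanovacloud.com/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "messages": [
    {"role": "user", "content": "Write a one-line poem about the sea."}
  ],
  "model": "Meta-Llama-3.1-8B-Instruct",
  "temperature": 0.7,
  "top_k": 40,
  "top_p": 0.9
}'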

Response

200 - application/json
choices
object[]

A list containing a single chat completion.

created
integer

The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.

id
string

A unique identifier for the chat completion.

model
string

The model used to generate the completion.

object
string

The object type, which is always chat.completion.

usage
object

An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request.
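
Putting the fields above together, a final chunk with include_usage enabled might look like the sketch below. All values are illustrative, the choices array is left empty as is typical for the usage-only final chunk, and the usage subfield names assume the common OpenAI-compatible layout rather than anything stated in this section.

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1719947520,
  "model": "Meta-Llama-3.1-8B-Instruct",
  "choices": [],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 120,
    "total_tokens": 145
  }
}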