Chat Completion
Creates a model response for the given chat conversation.
If a request fails, the response body provides a JSON object with details about the error.
For more information on errors, refer to the API Error Codes page.
Authorizations
Use the format Authorization: Bearer <API_KEY> to authenticate requests.
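As a minimal sketch of the header format above — the endpoint URL and environment-variable name here are placeholders for illustration, not values defined by this API:

```python
import os

# Hypothetical endpoint URL for illustration; substitute your actual
# chat-completions URL.
API_URL = "https://api.example.com/v1/chat/completions"

def build_headers(api_key: str) -> dict:
    # The Authorization header uses the Bearer scheme described above.
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

headers = build_headers(os.environ.get("API_KEY", "my-key"))
```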
Body
Maximum number of tokens to generate. The total length of input tokens and generated tokens is limited by the model’s context length. Default value is the context length of the model.
A list of messages comprising the conversation so far.
The name of the model to query. Refer to the supported models page for a list of available models.
You can set the response_format parameter to json_object in your request to ensure that the model outputs valid JSON. If the model is not able to generate valid JSON, an error is returned.
Usage: response_format = { "type": "json_object" }
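A minimal request-body sketch using the response_format usage above. The model name and message contents are placeholders, not identifiers defined by this API:

```python
import json

# Request body asking the model to reply in JSON. "example-model" is a
# placeholder; use a model from the supported models page.
payload = {
    "model": "example-model",
    "messages": [
        {"role": "system", "content": "Reply in JSON."},
        {"role": "user", "content": "List three primary colors."},
    ],
    # Ensures the output is valid JSON, or the API returns an error.
    "response_format": {"type": "json_object"},
}

body = json.dumps(payload)  # serialized request body
```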
Up to 4 sequences where the API will stop generating further tokens. Default is null.
If set, partial message deltas will be sent. Default is false.
Determines the degree of randomness in the response. The temperature value can be between 0 and 1.
Controls which (if any) tool is called by the model. When sending a request to SN Cloud, include the function definitions in the tools parameter and set tool_choice to one of the following:
auto: allows the model to choose between generating a message or calling a function. This is the default tool choice when the field is not specified.
required: forces the model to generate a function call. The model will then always select one or more function(s) to call. To enforce a specific function call, set tool_choice = {"type": "function", "function": {"name": "solve_quadratic"}}. In this case, the model will only use the specified function.
Available options: auto, required
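The tool_choice options above can be sketched in a request body as follows. The solve_quadratic function schema is invented for illustration, reusing the example name above; the schema shape assumes the common JSON-Schema-style tools format:

```python
# Example function definition for the tools parameter (illustrative only).
tools = [
    {
        "type": "function",
        "function": {
            "name": "solve_quadratic",
            "description": "Solve ax^2 + bx + c = 0 for x.",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {"type": "number"},
                    "b": {"type": "number"},
                    "c": {"type": "number"},
                },
                "required": ["a", "b", "c"],
            },
        },
    }
]

payload = {
    "model": "example-model",  # placeholder model name
    "messages": [{"role": "user", "content": "Solve x^2 - 5x + 6 = 0."}],
    "tools": tools,
    # Force a call to solve_quadratic specifically, rather than letting
    # the model choose ("auto") or merely requiring some call ("required").
    "tool_choice": {"type": "function", "function": {"name": "solve_quadratic"}},
}
```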
A list of tools the model may call. Currently, only functions are supported as a tool.
The top_k parameter limits the number of choices considered for the next predicted token. The value can be between 1 and 100.
The top_p (nucleus) parameter dynamically adjusts the number of choices for each predicted token based on the cumulative probabilities. The value can be between 0 and 1.
Response
A list containing a single chat completion.
The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp.
A unique identifier for the chat completion.
The model used to generate the completion.
The object type, which is always chat.completion.
An optional field present when stream_options: {"include_usage": true} is set. When present, it contains a null value except for the last chunk, which contains the token usage statistics for the entire request.
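The behavior described above can be illustrated with a small helper over mock stream chunks — the chunk dicts here are fabricated stand-ins, not real API responses:

```python
def final_usage(chunks):
    """Return the usage statistics carried by the last chunk, if any.

    With stream_options: {"include_usage": true}, every chunk has a
    "usage" field that is null (None) except in the final chunk.
    """
    usage = None
    for chunk in chunks:
        if chunk.get("usage") is not None:
            usage = chunk["usage"]
    return usage

# Mock chunks for illustration: usage is None until the final chunk.
mock_chunks = [
    {"object": "chat.completion.chunk", "usage": None},
    {"object": "chat.completion.chunk", "usage": None},
    {"object": "chat.completion.chunk",
     "usage": {"prompt_tokens": 12, "completion_tokens": 34, "total_tokens": 46}},
]
```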