# Chat completion
The Chat completion API generates responses based on a given conversation. It supports both text-based and multimodal inputs.
Please see the Text generation capabilities document for additional usage information.
## Endpoint
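The endpoint listing below is a sketch assuming the OpenAI-compatible chat completions path; the base URL is a placeholder for your provider's API host.

```
POST https://<api-base-url>/v1/chat/completions
```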
## Request parameters

The following tables list the parameters for a chat completion request, along with each parameter's type and description.

### Required parameters
| Parameter | Type | Description |
|---|---|---|
| `model` | String | The name of the model to query. Refer to the Supported models list. |
| `messages` | Array | The conversation history. Each message has a `role` and `content`. See message object structure for more details. |
### Message object structure

Each message object within the `messages` array consists of a `role` and `content`.
| Field | Type | Description |
|---|---|---|
| `role` | String | The role of the message author. Choices: `system`, `user`, or `assistant`. |
| `content` | Mixed | The message content. A string for text-only messages, or an array for multimodal content. See the examples of string content and multimodal content below. |
#### Example string content
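A minimal sketch of a text-only message, where `content` is a plain string; the values are illustrative.

```json
{
  "role": "user",
  "content": "What is the capital of France?"
}
```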
#### Example multimodal content
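A sketch of a message with multimodal content, assuming the OpenAI-style content-part format with one text part and one image part; the base64 payload is a placeholder.

```json
{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Describe this image."
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,<base64_image>"
      }
    }
  ]
}
```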
### Optional parameters

The following table lists the optional parameters that can be used to fine-tune the model's behavior, along with each parameter's type, description, and accepted or default values.
| Parameter | Type | Description | Values |
|---|---|---|---|
| `max_tokens` | Integer | The maximum number of tokens to generate. Limited by the model's context length. | None |
| `temperature` | Float | Controls randomness in the response. Higher values increase randomness. | 0 to 1 |
| `top_p` | Float | Adjusts the token selection probability, ensuring dynamic response generation. | 0 to 1 |
| `top_k` | Integer | Limits the number of token choices. | 1 to 100 |
| `stop` | String, array, null | Specifies up to four sequences at which the API stops generating responses. This helps control output length. | Default: null |
| `stream` | Boolean, null | Enables streaming responses when set to `true`. If `false`, the full response is returned after completion. | Default: false |
| `stream_options` | Object, null | Specifies additional streaming options (only when `stream: true`). Available option: `include_usage: boolean`. | Default: null |
### Function calling parameters

Models that support function calling accept the following three parameters. You can find detailed information about these parameters and supported models on the function calling page.
| Parameter | Type | Description | Values |
|---|---|---|---|
| `tools` | Array | Defines external tools the model can call (currently supports only functions). See the tools parameter usage table. | None |
| `response_format` | Object | Ensures the output is valid JSON. Use `{ "type": "json_object" }` for structured responses. | None |
| `tool_choice` | String, object | Controls tool usage (`auto`, `required`, or a specific function). See the tool_choice value table. | Default: auto |
#### Example usage of the tools parameter

The following table outlines the structure of the `tools` parameter; a sketch of a request fragment using these fields follows the table.
| Type | Object fields | Description |
|---|---|---|
| Function | `name` (string) | The name of the function to call. |
| | `description` (string) | A short description of what the function does. |
| | `parameters` (object) | Defines the function parameters. |
| | `parameters.type` (string) | The data type of the parameters object (always `"object"`). |
| | `parameters.properties` (object) | Defines the function parameters and their properties. |
| | `parameters.properties.<param_name>` (object) | Each function parameter is defined as an object with: `type` (data type) and `description` (description of the parameter). |
| | `parameters.required` (array) | A list of required parameters for the function. |
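Putting the fields above together, here is a sketch of a `tools` array for a hypothetical `get_weather` function, assuming the OpenAI-compatible shape in which the function definition is nested under a `function` key.

```json
"tools": [
  {
    "type": "function",
    "function": {
      "name": "get_weather",
      "description": "Get the current weather for a given city.",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {
            "type": "string",
            "description": "The city name, e.g. Paris."
          }
        },
        "required": ["city"]
      }
    }
  }
]
```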
#### Accepted values for tool_choice

The following table illustrates how the `tool_choice` parameter controls the model's interaction with external functions.
| Value | Description |
|---|---|
| `auto` | The model chooses between generating a message or calling a function. This is the default behavior when `tool_choice` is not specified. |
| `required` | Forces the model to generate a function call. The model will always select one or more functions to call. |
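For example, to force a call to the hypothetical `get_weather` function defined above, `tool_choice` can name a specific function; this sketch assumes the OpenAI-compatible object form.

```json
"tool_choice": { "type": "function", "function": { "name": "get_weather" } }
```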
## Example requests

Below is a sample request body for a streaming response from a text model.
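The body is a minimal sketch assuming an OpenAI-compatible request schema; the model name is a placeholder.

```json
{
  "model": "<model-name>",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Write a haiku about the ocean." }
  ],
  "max_tokens": 200,
  "temperature": 0.7,
  "top_p": 0.9,
  "stream": true,
  "stream_options": { "include_usage": true }
}
```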
## Example response format

The API returns a chat completion object, or a streamed sequence of chat completion chunk objects if the request is streamed.
### Chat completion response

Represents a chat completion response returned by the model, based on the provided input.
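A sketch of the response shape with illustrative values; nesting the token and timing statistics (listed under Response fields below) inside `usage` is an assumption.

```json
{
  "id": "<completion-id>",
  "object": "chat.completion",
  "created": 1715000000,
  "model": "<model-name>",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "input_tokens_count": 14,
    "output_tokens_count": 8,
    "total_tokens_count": 22,
    "time_to_first_token": 0.12,
    "model_execution_time": 0.45,
    "throughput_after_first_token": 420.0,
    "queue_time": 0.01
  }
}
```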
### Streaming response (chunked)

Represents a streaming response (chunked) returned by the model, based on the provided input.
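A sketch of the streamed format, assuming the server-sent events framing used by OpenAI-compatible APIs: each line carries a chunk whose `object` is `chat.completion.chunk`, `usage` stays null until the final chunk (when `include_usage` is set), and the stream ends with `data: [DONE]`. All identifiers and values are placeholders.

```
data: {"id": "<completion-id>", "object": "chat.completion.chunk", "created": 1715000000, "model": "<model-name>", "choices": [{"index": 0, "delta": {"role": "assistant", "content": "Salt"}, "finish_reason": null}], "usage": null}

data: {"id": "<completion-id>", "object": "chat.completion.chunk", "created": 1715000000, "model": "<model-name>", "choices": [{"index": 0, "delta": {"content": " spray"}, "finish_reason": null}], "usage": null}

data: {"id": "<completion-id>", "object": "chat.completion.chunk", "created": 1715000000, "model": "<model-name>", "choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": {"input_tokens_count": 25, "output_tokens_count": 12, "total_tokens_count": 37}}

data: [DONE]
```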
## Response fields

The following table lists the key response properties, along with each property's type and description.

If a request fails, the response body provides a JSON object with details about the error. For more information on errors, see the API error codes page.
| Property | Type | Description |
|---|---|---|
| `id` | String | A unique identifier for the chat completion. |
| `choices` | Array | A list containing a single chat completion. |
| `created` | Integer | The Unix timestamp (in seconds) of when the chat completion was created. Each chunk has the same timestamp. |
| `model` | String | The model used to generate the completion. |
| `object` | String | The object type, which is always `chat.completion`. |
| `usage` | Object | An optional field present when `stream_options: {"include_usage": true}` is set. When present, it contains a null value except for the last chunk, which includes the token usage statistics for the entire request. |
| `throughput_after_first_token` | Float | The rate (tokens per second) at which output tokens are generated after the first token has been delivered. |
| `time_to_first_token` | Float | The time (in seconds) the model takes to generate the first token. |
| `model_execution_time` | Float | The time (in seconds) required to generate a complete response, or all tokens. |
| `output_tokens_count` | Integer | The number of tokens generated in the response. |
| `input_tokens_count` | Integer | The number of tokens in the input prompt. |
| `total_tokens_count` | Integer | The sum of input and output tokens. |
| `queue_time` | Float | The time (in seconds) a request spends waiting in the queue before being processed by the model. |