Endpoint path

The endpoint path determines which action the server performs. For example, https://api.sambanova.ai/v1/chat/completions refers to the chat completions endpoint and is described further in our API Reference.
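As a sketch, the full request URL is just the base URL joined with the endpoint path. The base URL below comes from the example above; the build_url helper is illustrative, not part of any SDK:

```python
# Base URL for the API, from the example above.
BASE_URL = "https://api.sambanova.ai/v1"

def build_url(path: str) -> str:
    """Join the base URL with an endpoint path (illustrative helper)."""
    return f"{BASE_URL}/{path.lstrip('/')}"

# The path selects the action: here, the chat completions endpoint.
chat_url = build_url("/chat/completions")
```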

API Key

An API key authenticates your requests. It is a unique string of letters and numbers that you must keep secret. For example, where code samples show <YOUR API KEY>, substitute your actual key value for the placeholder.
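A common pattern is to read the key from an environment variable instead of hard-coding it, then send it as a bearer token. SAMBANOVA_API_KEY is an illustrative variable name (an assumption, not mandated by the API), and <YOUR API KEY> is the placeholder you would replace:

```python
import os

# Read the key from the environment so it never appears in source code.
# SAMBANOVA_API_KEY is an illustrative name; the fallback is the placeholder.
api_key = os.environ.get("SAMBANOVA_API_KEY", "<YOUR API KEY>")

# The key is typically sent as a bearer token in the Authorization header.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```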

Model name

An API request must include the complete model name to connect properly. A complete name, like Meta-Llama-3.1-405b-Instruct, may be shortened to Llama 3.1 405B in casual reference, but when making a request, use the complete names listed on the Supported models page.
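For instance, a minimal request body would carry the complete name from the example above in its model field, never the casual short form:

```python
# The request body must use the complete model name, not "Llama 3.1 405B".
payload = {
    "model": "Meta-Llama-3.1-405b-Instruct",  # complete name from Supported models
    "messages": [{"role": "user", "content": "Hello"}],
}
```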

Prompt input

Prompts are distinguished by two roles: system and user. The system prompt configures the model's behavior when responding to your request, while the user prompt carries the instructions for the task or query the model should complete.
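The two roles appear together in the messages list of a request; the wording of each prompt below is purely illustrative:

```python
# The system prompt configures behavior; the user prompt carries the task.
messages = [
    {"role": "system", "content": "You are a helpful assistant that answers concisely."},
    {"role": "user", "content": "Explain what a stop sequence is."},
]
```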

Stop sequence

A list of sequences at which the API stops generating further tokens. This can be a single string or an array of strings.
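Both forms are sketched below; the specific sequences chosen are illustrative:

```python
# stop as a single string: generation halts at the first blank line.
payload_single = {"stop": "\n\n"}

# stop as an array: generation halts when any listed sequence is produced.
payload_multi = {"stop": ["END", "###"]}
```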

Model Parameters

max_tokens

The maximum number of tokens to generate.

temperature

Determines the degree of randomness in the response.

top_p

The top_p (nucleus sampling) parameter dynamically adjusts the number of candidates for each predicted token based on their cumulative probability.

top_k

The top_k (type: number) parameter limits the number of candidates for the next predicted token. The default value is MAX_INT.
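The four parameters above can be combined in a single request body. The values below are illustrative choices, not recommended defaults:

```python
# A request body combining the sampling parameters described above.
payload = {
    "model": "Meta-Llama-3.1-405b-Instruct",
    "messages": [{"role": "user", "content": "Write a haiku about rivers."}],
    "max_tokens": 256,   # cap on the number of generated tokens
    "temperature": 0.7,  # higher values increase randomness
    "top_p": 0.9,        # keep tokens whose cumulative probability reaches 0.9
    "top_k": 40,         # consider only the 40 most likely next tokens
}
```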

Stream

Determines whether the response is returned as a stream of chunks or as a single response.
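Streaming is requested with a boolean flag in the request body; when it is omitted or false, the server returns one complete response instead:

```python
# Setting "stream" to True asks the server to return the response
# incrementally as a stream of chunks rather than a single object.
payload = {
    "model": "Meta-Llama-3.1-405b-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
}
```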

Stream options

If set, an additional chunk will be streamed before the data: [DONE] message. The usage field on this chunk shows the token usage statistics for the entire request, and its choices field will always be an empty array. All other chunks will also include a usage field, but with a null value. The usage metrics include OpenAI metrics and some additional metrics supported by SambaNova Cloud.
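Assuming the OpenAI-style include_usage flag (the field name is an assumption; the section above does not name it), the option is set alongside stream in the request body:

```python
# Request streaming plus a final usage chunk before "data: [DONE]".
# "include_usage" follows the OpenAI-compatible convention (an assumption here).
payload = {
    "model": "Meta-Llama-3.1-405b-Instruct",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,
    "stream_options": {"include_usage": True},
}
```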