The SambaNova Cloud Vision API enables models to process image inputs alongside text. View our Vision Capability guide for an introduction.

Endpoint

Creates a model response for the given input, which can include both text and image data.

POST https://api.sambanova.ai/v1/chat/completions

Request body

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| model | string | The ID of the selected model to query. For vision tasks, use models like Llama-3.2-11B-Vision-Instruct. | Yes |
| messages | array of objects | A list of messages forming the conversation. Each message can include both text and image inputs. See the Image Input Format below for details. | Yes |
| max_tokens | integer | Maximum number of tokens to generate. The total length of input and generated tokens is limited by the model’s context length. Default is 1000. | No |
| temperature | float | Controls randomness in responses. Value can be between 0 and 1. Default is 0. | No |
| top_p | float | Adjusts the number of choices for each predicted token based on cumulative probabilities. Value can be between 0 and 1. Default is 0.9. | No |
| top_k | integer | Limits the number of choices for the next predicted word or token. Value can be between 1 and 100. Default is 50. | No |
| stop | string or array | Up to 4 sequences where the API will stop generating further tokens. Default is null. | No |
| stream | boolean | If true, partial message deltas will be sent. Default is false. | No |
| stream_options | object | Options for streaming response. Only set this when stream: true. Available option: include_usage (boolean). Default is null. | No |
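
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the function name `validate_sampling_params` is hypothetical, and treating the ranges as inclusive is an assumption, since the table does not say whether the bounds themselves are allowed.

```python
from typing import Any


def validate_sampling_params(params: dict[str, Any]) -> list[str]:
    """Return a list of problems found in the optional request parameters.

    Hypothetical helper. Ranges come from the parameter table; treating
    them as inclusive is an assumption.
    """
    problems: list[str] = []

    temperature = params.get("temperature")
    if temperature is not None and not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")

    top_p = params.get("top_p")
    if top_p is not None and not 0 <= top_p <= 1:
        problems.append("top_p must be between 0 and 1")

    top_k = params.get("top_k")
    if top_k is not None and not 1 <= top_k <= 100:
        problems.append("top_k must be between 1 and 100")

    # stop may be a single string or an array of up to 4 sequences.
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    # stream_options is only meaningful when stream is true.
    if params.get("stream_options") is not None and not params.get("stream"):
        problems.append("stream_options requires stream to be true")

    return problems
```

An empty list means no problem was detected by these checks; the server remains the authority on what it accepts.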

Messages format for image input

  • Single image per request
    • Each request supports only one image input. For multiple images, send separate requests.
  • Encoding requirements
    • Ensure the image is base64-encoded and within size limits. Invalid encoding will result in errors. View more information on our API Error page.

| Parameter | Type | Description | Required |
| --- | --- | --- | --- |
| type | string | Indicates the type of content. For images, set this to image_url. | Yes |
| image_url.url | string | The base64-encoded image string. Must follow the format: data:<image_format>;base64,<data>. | Yes |
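
The data:<image_format>;base64,<data> string can be produced with the Python standard library. The sketch below is illustrative: the helper names `to_data_url` and `build_user_message` are hypothetical, and the default `image/jpeg` format is an assumption taken from the example request in this document. It does not check the size limits, which are not specified here.

```python
import base64


def to_data_url(image_bytes: bytes, image_format: str = "image/jpeg") -> str:
    """Encode raw image bytes as data:<image_format>;base64,<data>."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{image_format};base64,{encoded}"


def build_user_message(prompt: str, data_url: str) -> dict:
    """Build one user message with a text part and exactly one image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }
```

In practice the bytes would come from a file opened in binary mode, for example `open(path, "rb").read()`. Because each request supports only one image, a caller with several images would build one message, and one request, per image.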

Example request

{
  "model": "Llama-3.2-11B-Vision-Instruct",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is happening in this image?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "data:image/jpeg;base64,<base64_encoded_image>"
          }
        }
      ]
    }
  ],
  "max_tokens": 300,
  "temperature": 0.7,
  "top_p": 0.9,
  "top_k": 50
}
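
A request body like the one above could be sent with the Python standard library roughly as follows. This is a sketch under stated assumptions: this section does not describe authentication, so the `Authorization: Bearer` header is an assumption, and the function names `build_http_request` and `send` are hypothetical.

```python
import json
import urllib.request

API_URL = "https://api.sambanova.ai/v1/chat/completions"


def build_http_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare a POST request carrying the JSON payload (nothing is sent yet)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: bearer-token authentication; not stated in this section.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(payload: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_http_request(payload, api_key)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers, and body without any network access. `urllib.request.urlopen` raises `urllib.error.HTTPError` on error status codes, so a caller would typically wrap `send` in a try/except.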

Response

The API returns a chat completion object containing the model’s response to the provided input.

In this sample, the input image was a nature scene. Your response will describe whichever image you submit.

Sample response

{
  "id": "chatcmpl-456",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "Llama-3.2-11B-Vision-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "This image shows a sunset over a mountain range with a lake in the foreground. The scene is serene and filled with vibrant colors."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 50,
    "completion_tokens": 32,
    "total_tokens": 82
  }
}
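
Once the response has been parsed into a dictionary, the generated text and token usage can be read from the fields shown in the sample. The sketch below assumes a non-streaming response shaped like that sample; the function name `summarize_response` is hypothetical.

```python
def summarize_response(response: dict) -> tuple[str, str, int]:
    """Return (content, finish_reason, total_tokens) for the first choice.

    Assumes a non-streaming response shaped like the sample response.
    """
    choice = response["choices"][0]
    return (
        choice["message"]["content"],
        choice["finish_reason"],
        response["usage"]["total_tokens"],
    )


# Trimmed copy of the sample response, used for illustration only.
sample = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "This image shows a sunset."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 50, "completion_tokens": 32, "total_tokens": 82},
}

content, finish_reason, total_tokens = summarize_response(sample)
```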