Using vision models on SambaNova Cloud, users can process multimodal inputs consisting of text and images. These models understand and analyze images to then generate text based on the context. Learn how to query SambaNova Cloud’s vision models using OpenAI’s Python client.

Make a query with an image

With the Cloud API’s OpenAI compatibility, vision model requests follow OpenAI’s multimodal input format, accepting both text and image inputs in a structured payload. The call is similar to a text generation request, but it additionally includes an encoded image file, referenced via the image_path variable. A helper function converts this image into a base64 string, allowing it to be passed alongside the text in the request.


Step 1

Make a new Python file and copy in the code below.

This example uses the Llama-4-Maverick-17B-128E-Instruct model.

import openai
import base64

client = openai.OpenAI(
    base_url="https://api.sambanova.ai/v1",
    api_key="SAMBANOVA_API_KEY",
)

# Helper function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# The path to your image
image_path = "sample.JPEG"

# The base64 string of the image
image_base64 = encode_image(image_path)

# print(image_base64)  # Uncomment to inspect the encoded string (it can be very long)

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is happening in this image?"},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_base64}"}}
            ]
        }
    ]
)

print(response.choices[0].message.content)

Step 2

Replace "SAMBANOVA_API_KEY" in the client constructor with your SambaNova API key.
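Rather than hardcoding the key in the script, you can read it from an environment variable. Below is a minimal sketch, assuming you have exported SAMBANOVA_API_KEY in your shell; the get_api_key helper name is illustrative, not part of the SambaNova API.

```python
import os

def get_api_key() -> str:
    """Return the SambaNova API key from the environment, failing loudly if unset."""
    key = os.environ.get("SAMBANOVA_API_KEY")
    if not key:
        raise RuntimeError("Set the SAMBANOVA_API_KEY environment variable first.")
    return key

# The client would then be constructed as:
# client = openai.OpenAI(base_url="https://api.sambanova.ai/v1", api_key=get_api_key())
```

This keeps the key out of source control and lets you rotate it without editing code.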


Step 3

Select an image and move it to a path that you can reference in the lines below.

# The path to your image
image_path = "sample.JPEG"
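Note that the example hardcodes a data:image/jpeg;base64 prefix in the request; if your image is a PNG or another format, the prefix should match. A small sketch of a helper that guesses the MIME type from the file extension (the image_to_data_url name is illustrative):

```python
import base64
import mimetypes

def image_to_data_url(image_path: str) -> str:
    """Encode an image file as a data URL, guessing the MIME type
    from the file extension and falling back to JPEG."""
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "image/jpeg"
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed directly as the image_url value in the request payload.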

Step 4

Adjust the text prompt paired with the image in the content array of the user message.
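The content array can hold one text part followed by one or more image_url parts, so the same structure also supports attaching several images. A minimal sketch of assembling that payload (the build_vision_messages helper is a hypothetical name, not part of either API):

```python
def build_vision_messages(prompt: str, image_data_urls: list[str]) -> list[dict]:
    """Assemble an OpenAI-style multimodal user message: one text part
    followed by one image_url part per image."""
    content = [{"type": "text", "text": prompt}]
    for url in image_data_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return [{"role": "user", "content": content}]
```

The result can be passed as the messages argument to client.chat.completions.create.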


Step 5

Run the Python file to receive the text output.