Vision
Using vision models on SambaNova Cloud, users can process multimodal inputs consisting of text and images. These models understand and analyze images and then generate text based on that context. Learn how to query SambaNova Cloud’s vision models using OpenAI’s Python client.
Make a query with an image
With the Cloud API’s OpenAI Compatibility, the vision model request follows OpenAI’s multimodal input format, accepting both text and image inputs in a structured payload. While the call is similar to Text Generation, it differs by including an encoded image file, referenced via the image_path variable. A helper function is used to convert this image into a base64 string, allowing it to be passed alongside the text in the request.
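Conceptually, such a helper can be a few lines of Python using the standard base64 module. The sketch below is illustrative; the function name encode_image is an assumption rather than part of the official sample.

import base64

def encode_image(image_path: str) -> str:
    # Read the image file and return its contents as a base64-encoded string.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")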
Step 1
Create a new Python file and copy the code below.
This example uses the Llama-4-Maverick-17B-128E-Instruct model.
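A minimal sketch of such a file is shown below. The base URL, file path, and prompt text are placeholders and assumptions rather than values from the official sample; the request format itself follows OpenAI’s standard multimodal schema.

import base64
import openai

def encode_image(image_path: str) -> str:
    # Read the image file and return its contents as a base64-encoded string.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Construct the client against the SambaNova Cloud endpoint (the base URL here is an assumption).
client = openai.OpenAI(
    api_key="SAMBANOVA_API_KEY",
    base_url="https://api.sambanova.ai/v1",
)

# Placeholder path to the image the model should analyze.
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="Llama-4-Maverick-17B-128E-Instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe what you see in this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)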
Step 2
Replace "SAMBANOVA_API_KEY" with your SambaNova API key in the construction of the client.
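As one common pattern (illustrative, not required by the API), the key can be read from an environment variable instead of being hard-coded:

import os
import openai

client = openai.OpenAI(
    # Read the API key from an environment variable rather than hard-coding it.
    api_key=os.environ.get("SAMBANOVA_API_KEY"),
    base_url="https://api.sambanova.ai/v1",  # assumed SambaNova Cloud endpoint
)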
Step 3
Select an image and place it at a path that you can reference via the image_path variable in the code.
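For example (the path below is a placeholder), you can set the path and confirm the file exists before encoding it:

import base64
from pathlib import Path

image_path = "path/to/your/image.jpg"  # placeholder; point this at your own image
if not Path(image_path).is_file():
    raise FileNotFoundError(f"No image found at {image_path}")

with open(image_path, "rb") as image_file:
    base64_image = base64.b64encode(image_file.read()).decode("utf-8")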
Step 4
Verify the prompt that you want to pair with the image in the content portion of the user message.
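The prompt lives in the text entry of the user message’s content list, next to the image entry. The sketch below shows that structure; the prompt text is a placeholder, and the base64 value is stubbed out so the snippet stands alone.

# base64_image would normally come from the encoding step shown earlier;
# a dummy value is used here so the snippet stands alone.
base64_image = "<base64-encoded image data>"

messages = [
    {
        "role": "user",
        "content": [
            # Edit this text entry to change the question asked about the image.
            {"type": "text", "text": "What objects appear in this image?"},
            # The image is passed as a base64 data URL alongside the prompt text.
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
        ],
    }
]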
Step 5
Run the Python file to receive the text output.