Vision
Using vision models on SambaNova Cloud, users can process multimodal inputs consisting of text and images. These models understand and analyze images, then generate text based on that context. Learn how to query SambaNova Cloud's vision models using OpenAI's Python client.
Make a query with an image
With the Cloud API's OpenAI Compatibility, the vision model request follows OpenAI's multimodal input format: the model receives both text and an image in a structured request. Making a call with a vision model is similar to the call for Text Generation, but differs by passing in the image via its file path, represented by the variable image_path. A helper function converts the image at that path into a base64 string so that it can be passed to the model along with the text.
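A minimal sketch of such a helper, assuming it is named encode_image (the name is illustrative and not specified in the original), could look like this:

```python
import base64

def encode_image(image_path: str) -> str:
    # Read the image file and return its contents as a base64-encoded string
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
```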
Step 1
Make a new Python file and copy the code below.
Step 2
Use your SambaNova API key to replace "YOUR API KEY" in the construction of the client.
Step 3
Select an image and move it to a suitable path that you can specify in the image_path variable.
Step 4
Verify the prompt that is paired with the image in the content portion of the user message (see the example after these steps).
Step 5
Run the Python file to receive the text output.
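The following is a sketch of what the full script might look like. The base URL, model name, and prompt shown here are illustrative assumptions; substitute your own API key, a vision model listed in the SambaNova Cloud documentation, and your image path.

```python
import base64
from openai import OpenAI

def encode_image(image_path: str) -> str:
    # Convert the image at image_path into a base64 string for the request
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Construct the client; replace "YOUR API KEY" with your SambaNova API key.
# The base URL below is an assumption; use the endpoint given in your account documentation.
client = OpenAI(
    api_key="YOUR API KEY",
    base_url="https://api.sambanova.ai/v1",
)

# Path to the image you want the model to analyze
image_path = "sample.JPEG"
base64_image = encode_image(image_path)

response = client.chat.completions.create(
    model="Llama-3.2-11B-Vision-Instruct",  # example model name; pick any available vision model
    messages=[
        {
            "role": "user",
            "content": [
                # Text prompt paired with the image
                {"type": "text", "text": "What do you see in this image?"},
                # Base64-encoded image passed as a data URL
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"},
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```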
In this example, "sample.JPEG" contained an image of hiking; your response will reflect your selected image.