For developers requiring audio support, SambaNova provides OpenAI’s Whisper large-v3 model, which enables real-time transcriptions and translations.
This feature is available for SambaStack users only. Whisper-Large-v3 is not available on SambaCloud.
Whisper-Large-v3
Model: Whisper-Large-v3
Description: State-of-the-art automatic speech recognition (ASR) and translation model. Developed by OpenAI and trained on 5M+ hours of labeled audio. Excels in multilingual and zero-shot speech tasks across diverse domains.
Model ID: Whisper-Large-v3
Supported languages: Multilingual
Core capabilities
Transcribes and translates extended audio inputs (up to 25 MB).
Demonstrates high accuracy in speech recognition and translation tasks.
Provides OpenAI-compatible endpoints for transcriptions and translations.
Request parameters
| Parameter | Type | Description | Default | Endpoints |
| --- | --- | --- | --- | --- |
| model | String | The ID of the model to use. | Required | transcriptions, translations |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25 MB. | Required | transcriptions, translations |
| prompt | String | Prompt to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.” | Optional | transcriptions, translations |
| response_format | String | Output format: either json or text. | json | transcriptions, translations |
| language | String | The language of the input audio. Using ISO-639-1 format (e.g., en) improves accuracy and latency. | Optional | transcriptions, translations |
| stream | Boolean | Enables streaming responses. | false | transcriptions, translations |
| stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional | transcriptions, translations |
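The constraints on the file parameter listed above (accepted formats and the 25 MB limit) can be checked locally before a request is sent. The following is a minimal sketch and not part of the SambaNova SDK: check_audio_file, SUPPORTED_EXTENSIONS, and MAX_FILE_BYTES are hypothetical names, and it assumes that "25 MB" means 25 × 1024 × 1024 bytes and that the file extension reflects the actual audio format.

```python
from pathlib import Path

# Extensions derived from the formats listed for the `file` parameter.
SUPPORTED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}
# Assumption: the documented 25 MB limit is interpreted as 25 MiB.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_audio_file(path, max_bytes=MAX_FILE_BYTES):
    """Return a list of problems; an empty list means the file looks uploadable."""
    audio = Path(path)
    if not audio.is_file():
        return [f"{path}: file not found"]

    problems = []
    # Only the extension is inspected, not the file contents.
    if audio.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"{audio.suffix or '(no extension)'}: unsupported format")

    size = audio.stat().st_size
    if size > max_bytes:
        problems.append(f"{size} bytes exceeds the {max_bytes}-byte limit")
    return problems
```

Running check_audio_file on the audio path before calling the endpoint lets a script report an oversized or unsupported file without spending a request on it.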
Example usage
Python (SambaNova)
from sambanova import SambaNova

# Replace the placeholders with your SambaStack endpoint and API key.
client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key",
)

audio_path = "audio_path"  # path to a local audio file

# Read the audio file as raw bytes.
with open(audio_path, "rb") as audio_file:
    bin_audio = audio_file.read()

# Send the audio as a (filename, bytes) tuple.
response = client.audio.transcriptions.create(
    model="Whisper-Large-v3",
    file=(audio_path, bin_audio),
)
print(str(response))
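The optional request parameters (language, prompt, response_format) can be collected and sanity-checked before they are passed to the call above. This is a rough sketch under stated assumptions: build_audio_params and ALLOWED_RESPONSE_FORMATS are hypothetical names rather than SDK features, and the language check only verifies the two-letter shape of an ISO 639-1 code, not that the code is a registered language.

```python
import re

# The two output formats listed for `response_format`.
ALLOWED_RESPONSE_FORMATS = {"json", "text"}


def build_audio_params(model="Whisper-Large-v3", language=None, prompt=None,
                       response_format="json"):
    """Collect request parameters, leaving out any optional ones that are unset."""
    if response_format not in ALLOWED_RESPONSE_FORMATS:
        raise ValueError(
            f"response_format must be one of {sorted(ALLOWED_RESPONSE_FORMATS)}"
        )

    params = {"model": model, "response_format": response_format}

    if language is not None:
        # Shape check only: ISO 639-1 codes are two lowercase letters ("en", "es").
        if not re.fullmatch(r"[a-z]{2}", language):
            raise ValueError(
                f"expected a two-letter ISO 639-1 code, got {language!r}"
            )
        params["language"] = language

    if prompt:
        params["prompt"] = prompt
    return params
```

The resulting dictionary can be unpacked into the request, for example client.audio.transcriptions.create(file=(audio_path, bin_audio), **build_audio_params(language="en")).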
Translations
The translations endpoint transcribes audio in any supported language and returns the output in English. Use the language parameter to specify the language of the input audio in ISO 639-1 format (for example, "es" for Spanish) to improve accuracy and reduce latency.
Example usage
Python (SambaNova)
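from sambanova import SambaNova

# Replace the placeholders with your SambaStack endpoint and API key.
client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key",
)

audio_path = "audio_path"  # path to a local audio file

# Read the audio file as raw bytes.
with open(audio_path, "rb") as audio_file:
    bin_audio = audio_file.read()

# Translate Spanish audio into English text.
response = client.audio.translations.create(
    model="Whisper-Large-v3",
    file=(audio_path, bin_audio),
    language="es",  # ISO 639-1 code of the input audio
)
print(str(response))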
Example response
{
  "text": "It is the sound effect of a bell ringing, specifically a church bell."
}
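When the endpoint is called over raw HTTP rather than through the SDK, the transcript has to be pulled out of the response body. The sketch below shows one way to do that for both values of response_format. extract_text is a hypothetical helper, and it assumes that the json format returns an object with a "text" field as in the example response, and that the text format returns the transcript as the plain body.

```python
import json


def extract_text(body, response_format="json"):
    """Return the transcript string from a raw response body."""
    if response_format == "text":
        # Assumption: the body is the transcript itself.
        return body.strip()
    # json format: the transcript is stored under the "text" key.
    return json.loads(body)["text"]
```

For the example response shown here, extract_text returns the sentence stored under "text".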