For developers requiring audio support, SambaNova provides OpenAI’s Whisper large-v3 model, which enables real-time transcriptions and translations.

Whisper-Large-v3

  • Model: Whisper-Large-v3
  • Description: State-of-the-art automatic speech recognition (ASR) and speech translation model. Developed by OpenAI and trained on more than 5 million hours of labeled and pseudo-labeled audio. Excels at multilingual and zero-shot speech tasks across diverse domains.
  • Model ID: Whisper-Large-v3
  • Supported languages: Multilingual

Core capabilities

  • Transcribes and translates extended audio inputs (up to 25 MB).
  • Demonstrates high accuracy in speech recognition and translation tasks.
  • Provides OpenAI-compatible endpoints for transcriptions and translations.

Request parameters

| Parameter | Type | Description | Default | Endpoints |
| --- | --- | --- | --- | --- |
| model | String | The ID of the model to use. | Required | transcriptions, translations |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25 MB. | Required | transcriptions, translations |
| prompt | String | Prompt to influence transcription style or vocabulary. Example: "Please transcribe carefully, including pauses and hesitations." | Optional | transcriptions, translations |
| response_format | String | Output format: either json or text. | json | transcriptions, translations |
| language | String | The language of the input audio. Using ISO-639-1 format (e.g., en) improves accuracy and latency. | Optional | transcriptions, translations |
| stream | Boolean | Enables streaming responses. | false | transcriptions, translations |
| stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional | transcriptions, translations |
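The supported formats and the 25 MB limit in the table above can be checked client-side before uploading, which gives faster feedback than a rejected request. A minimal pre-flight sketch (the helper name and error messages are illustrative, not part of the SDK):

```python
import os

# Formats and size limit taken from the request parameter table above.
SUPPORTED_EXTENSIONS = {".flac", ".mp3", ".mp4", ".mpeg", ".mpga",
                        ".m4a", ".ogg", ".wav", ".webm"}
MAX_BYTES = 25 * 1024 * 1024  # 25 MB

def validate_audio_file(path: str) -> None:
    """Raise ValueError if the file cannot be accepted by the API."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported audio format: {ext}")
    if os.path.getsize(path) > MAX_BYTES:
        raise ValueError("Audio file exceeds the 25 MB limit")
```

Run this check before opening the file for upload; it catches the two request-rejection causes the table documents (format and size).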

Example usage

from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key",
)

# Read the audio file as binary data.
audio_path = "audio_path"
with open(audio_path, "rb") as audio_file:
    bin_audio = audio_file.read()

# Send the audio to the transcriptions endpoint.
response = client.audio.transcriptions.create(
    model="Whisper-Large-v3",
    file=(audio_path, bin_audio),
)
print(response)
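When stream is set to true (see the request parameters above), the response arrives as a sequence of chunks rather than a single object. A minimal sketch for accumulating streamed text, assuming each chunk exposes an incremental text attribute (the exact chunk schema is an assumption and may differ):

```python
def collect_stream(chunks) -> str:
    """Concatenate incremental text from a streamed transcription response."""
    parts = []
    for chunk in chunks:
        # Chunks without a text payload (e.g., usage-only chunks when
        # stream_options includes {"include_usage": true}) are skipped.
        text = getattr(chunk, "text", None)
        if text:
            parts.append(text)
    return "".join(parts)
```

You would pass the iterable returned by a create(..., stream=True) call to this helper, or print each chunk as it arrives for live output.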

Translations

The translations endpoint transcribes audio in any supported language and returns the output in English. Use the language parameter to specify the language of the input audio in ISO 639-1 format (for example, "es" for Spanish) to improve accuracy and reduce latency.

Example usage

from sambanova import SambaNova

client = SambaNova(
    base_url="your-sambanova-base-url",
    api_key="your-sambanova-api-key",
)

audio_path = "audio_path"
with open(audio_path, "rb") as audio_file:
    bin_audio = audio_file.read()

response = client.audio.translations.create(
    model="Whisper-Large-v3",
    file=(audio_path, bin_audio),
    language="es",
)
print(response)

Example response

{
  "text": "It is the sound effect of a bell ringing, specifically a church bell."
}
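Depending on response_format, the raw body is either a JSON object with a text field (the default, as in the example response above) or the plain transcript. A small helper for handling both cases (the function name is illustrative, not part of the SDK):

```python
import json

def extract_text(raw: str, response_format: str = "json") -> str:
    """Return the transcript from a raw response body.

    With response_format="json" the body is an object like
    {"text": "..."}; with "text" it is the transcript itself.
    """
    if response_format == "json":
        return json.loads(raw)["text"]
    return raw
```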