SambaNova’s first speech model on SambaCloud extends our multimodal AI capabilities beyond vision to include advanced audio processing. The model offers OpenAI-compatible endpoints for real-time transcriptions and translations.

The Whisper-Large-v3 model

  • Model: Whisper-Large-v3
  • Description: State-of-the-art automatic speech recognition (ASR) and translation model. Developed by OpenAI and trained on 5M+ hours of labeled and pseudo-labeled audio. Excels at multilingual and zero-shot speech tasks across diverse domains.
  • Model ID: Whisper-Large-v3
  • Supported languages: Multilingual

Core capabilities

  • Transcribes and translates extended audio inputs (up to 25 MB).
  • Demonstrates high accuracy in speech recognition and translation tasks.
  • Provides OpenAI-compatible endpoints for transcriptions and translations.
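The capabilities above can be exercised with an ordinary HTTP client. The sketch below is a minimal, standard-library-only example of calling the transcriptions endpoint; the base URL is an assumption based on the OpenAI-compatible API shape, so verify it against your SambaCloud account before use.

```python
# Minimal sketch of a transcription request against the OpenAI-compatible
# endpoint. BASE_URL is an assumption -- confirm the value for your account.
import json
import os
import urllib.request

BASE_URL = "https://api.sambanova.ai/v1"  # assumed; verify for your account

def build_multipart(model: str, filename: str, audio_bytes: bytes,
                    boundary: str = "sn-boundary") -> tuple[bytes, str]:
    """Encode the model ID and audio file as a multipart/form-data body.

    Returns the encoded body and the matching Content-Type header value.
    """
    parts = [
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n",
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n",
    ]
    body = "".join(parts).encode() + audio_bytes + f"\r\n--{boundary}--\r\n".encode()
    return body, f"multipart/form-data; boundary={boundary}"

def transcribe(path: str) -> str:
    """POST an audio file (FLAC, MP3, WAV, etc.; up to 25 MB) and return the text."""
    with open(path, "rb") as f:
        body, content_type = build_multipart(
            "Whisper-Large-v3", os.path.basename(path), f.read()
        )
    req = urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        headers={
            # Assumes your API key is exported as SAMBANOVA_API_KEY.
            "Authorization": f"Bearer {os.environ['SAMBANOVA_API_KEY']}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

For translations, the same request shape applies with the endpoint path changed to `/audio/translations`.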

Request parameters

| Parameter | Type | Description | Default | Endpoints |
| --- | --- | --- | --- | --- |
| model | String | The ID of the model to use. | Required | transcriptions, translations |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit: 25 MB. | Required | transcriptions, translations |
| prompt | String | Prompt to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.” | Optional | transcriptions, translations |
| response_format | String | Output format: either json or text. | json | transcriptions, translations |
| language | String | The language of the input audio. Using ISO-639-1 format (e.g., en) improves accuracy and latency. | Optional | transcriptions, translations |
| stream | Boolean | Enables streaming responses. | false | transcriptions, translations |
| stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional | transcriptions, translations |
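As a sketch of how the parameter table maps onto a request, the helper below assembles the form fields, sending optional parameters only when provided and serializing stream_options as a JSON object per the table's example. The function name and validation logic are illustrative, not part of the API.

```python
# Illustrative helper: map the request parameters onto multipart form fields.
# Field names come from the parameter table; the helper itself is hypothetical.
import json
from typing import Optional

def build_fields(model: str,
                 prompt: Optional[str] = None,
                 response_format: str = "json",
                 language: Optional[str] = None,
                 stream: bool = False,
                 stream_options: Optional[dict] = None) -> dict:
    """Return the non-file form fields for a transcription or translation request."""
    if response_format not in ("json", "text"):
        raise ValueError("response_format must be 'json' or 'text'")
    fields = {"model": model, "response_format": response_format}
    # Optional parameters are omitted rather than sent empty.
    if prompt is not None:
        fields["prompt"] = prompt
    if language is not None:
        fields["language"] = language  # ISO-639-1 code, e.g. "en"
    if stream:
        fields["stream"] = "true"
        if stream_options is not None:
            # Sent as a JSON-encoded object, e.g. {"include_usage": true}.
            fields["stream_options"] = json.dumps(stream_options)
    return fields
```

For example, `build_fields("Whisper-Large-v3", language="en", stream=True, stream_options={"include_usage": True})` produces the fields for a streaming request that also reports token usage.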