Converts audio to text in the specified language.

Endpoint

POST https://api.sambanova.ai/v1/audio/transcriptions

Request parameters

The following tables list the parameters of a transcription request for each supported model, along with each parameter's type, description, and default value.

Whisper Large v3

| Parameter | Type | Description | Default |
|---|---|---|---|
| model | String | The ID of the model to use. | Required |
| prompt | String | Prompt provided to influence transcription style or vocabulary. Example: "Please transcribe carefully, including pauses and hesitations." | Optional |
| temperature | Number | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. The file size limit is 25 MB. | Required |
| response_format | String | Output format: json or text. | json |
| language | String | The language of the input audio, in ISO-639-1 format (e.g., en). Supplying the input language improves accuracy and latency. | Required |
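
The Whisper limits above can be checked on the client before uploading. The following is a minimal sketch, not part of any SambaNova SDK: build_whisper_fields and check_audio_file are hypothetical helpers, the model ID is passed in by the caller because the table does not give the exact string, and the size check assumes 25 MB means 25 * 1024 * 1024 bytes. Checking the file extension is a simplification; it does not inspect the actual audio encoding.

```python
import os

# Limits taken from the Whisper Large v3 parameter table.
MAX_FILE_BYTES = 25 * 1024 * 1024  # Assumption: 1 MB = 1024 * 1024 bytes
ALLOWED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}
ALLOWED_FORMATS = {"json", "text"}


def build_whisper_fields(model, language, prompt=None, temperature=0.0,
                         response_format="json"):
    """Return the non-file form fields for a Whisper transcription request (hypothetical helper)."""
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError("response_format must be 'json' or 'text'")
    fields = {
        "model": model,  # Caller supplies the model ID
        "language": language,  # ISO-639-1 code such as "en"
        "temperature": temperature,
        "response_format": response_format,
    }
    if prompt:  # prompt is optional, so only send it when given
        fields["prompt"] = prompt
    return fields


def check_audio_file(path):
    """Raise ValueError if the file's extension or size violates the documented limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    size = os.path.getsize(path)
    if size > MAX_FILE_BYTES:
        raise ValueError(f"file is {size} bytes; the limit is {MAX_FILE_BYTES}")
```

For example, build_whisper_fields("YOUR_WHISPER_MODEL_ID", "en", prompt="Please transcribe carefully, including pauses and hesitations.") returns a dictionary that can be passed as data= to requests.post alongside the file.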

Qwen2-Audio-7B-Instruct

| Parameter | Type | Description | Default |
|---|---|---|---|
| model | String | The ID of the model to use. | Required |
| response_format | String | Output format: json or text. | json |
| temperature | Number | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
| max_tokens | Number | The maximum number of tokens to generate. | 1000 |
| file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. Each file must not exceed 30 seconds in duration. | Required |
| language | String | The target language for transcription or translation. | Optional |
| stream | Boolean | Enables streaming responses. | false |
| stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional |
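
Because Qwen2-Audio-7B-Instruct accepts files of at most 30 seconds, longer recordings have to be split before upload. The sketch below uses only Python's standard wave module, so it handles uncompressed PCM WAV files only; the other formats in the table would need a separate decoder. The helper names are hypothetical, and cutting at fixed 30-second boundaries may split words, which can affect transcription at the seams.

```python
import io
import wave

MAX_DURATION_SECONDS = 30.0  # Per-file limit for Qwen2-Audio-7B-Instruct


def wav_duration_seconds(source):
    """Return the duration of a PCM WAV file (path or file-like object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_qwen_clip(path):
    """Raise ValueError if a WAV clip is longer than the 30-second limit."""
    duration = wav_duration_seconds(path)
    if duration > MAX_DURATION_SECONDS:
        raise ValueError(f"clip is {duration:.1f} s; split it into pieces of at most 30 s")
    return duration


def split_wav(path, max_seconds=MAX_DURATION_SECONDS):
    """Split a WAV file into in-memory WAV chunks of at most max_seconds each."""
    chunks = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(max_seconds * params.framerate)
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:  # readframes returns b"" at end of file
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the audio format; the frame count is derived from the data written.
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(frames)
            chunks.append(buf.getvalue())
    return chunks
```

Each returned chunk is a complete WAV file as bytes, so it can be uploaded as the file field, for example files={"file": ("part1.wav", chunk)} with requests.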

Request format

This section provides examples of how to send a transcription request with cURL and with Python.

cURL

curl --location 'https://api.sambanova.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'model="Qwen2-Audio-7B-Instruct"' \
--form 'language="spanish"' \
--form 'response_format="json"' \
--form 'temperature="0.01"' \
--form 'file=@"/path/to/audio/file.mp3"' \
--form 'stream="true"'
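
The cURL request above sets stream to true. The parameter table does not describe the wire format of a streamed response; the sketch below assumes OpenAI-style server-sent events, where each event is a line beginning with data: and the stream ends with data: [DONE]. Verify this against an actual response before relying on it. iter_sse_events is a hypothetical helper that yields each decoded JSON payload without assuming anything about its fields.

```python
import json


def iter_sse_events(lines):
    """Yield decoded JSON payloads from server-sent event lines.

    Assumption: OpenAI-style framing, i.e. each event is a line starting
    with "data:" and the stream ends with "data: [DONE]".
    """
    for raw in lines:
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # Skip blank keep-alive lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # End-of-stream sentinel (assumed)
        yield json.loads(payload)
```

With requests, send the request with requests.post(..., stream=True) and pass response.iter_lines() to iter_sse_events to process events as they arrive.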

Python

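import requests


def transcribe_audio(audio_file_path, api_key, language="english"):
    """Upload an audio file and return the parsed JSON transcription."""
    headers = {"Authorization": f"Bearer {api_key}"}

    data = {
        "model": "Qwen2-Audio-7B-Instruct",
        "language": language,
        "response_format": "json",
        "temperature": 0.01,
        # Optional: "stream": "true". A streamed response is delivered
        # incrementally and needs line-by-line handling, not response.json().
    }

    # The context manager closes the file handle once the upload finishes.
    with open(audio_file_path, "rb") as audio_file:
        response = requests.post(
            "https://api.sambanova.ai/v1/audio/transcriptions",
            headers=headers,
            files={"file": audio_file},
            data=data,
        )

    response.raise_for_status()  # Raise on 4xx/5xx instead of parsing an error body
    return response.json()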

Response format

The API returns a transcription of the input audio in the format selected by response_format.

JSON

{
    "text": "It's a sound effect of a bell chiming, specifically a church bell."
}

Text

It's a sound effect of a bell chiming, specifically a church bell.
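
The two response formats differ only in how the transcript is wrapped: json returns an object with a text field, while text returns the transcript directly. A minimal sketch of a hypothetical helper that reads the transcript from a raw response body for either format; stripping surrounding whitespace from the text format is a choice made here, not documented behavior.

```python
import json


def extract_transcript(body, response_format="json"):
    """Return the transcript string from a raw response body.

    json: the body is an object with a "text" field.
    text: the body is the transcript itself.
    """
    if response_format == "json":
        return json.loads(body)["text"]
    if response_format == "text":
        return body.strip()  # Drop any trailing newline around the transcript
    raise ValueError("response_format must be 'json' or 'text'")
```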