Transcription

Converts audio to text in the specified language.

Endpoint

POST https://api.sambanova.ai/v1/audio/transcriptions

Request parameters

The following table lists the parameters for a transcription request, along with each parameter's type, description, and default value.

For improved accuracy, we strongly recommend specifying the language parameter when using any audio model.

Whisper Large v3

Parameter	Type	Description	Default
`model`	String	The ID of the model to use.	Required
`file`	File	Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. The file size limit is 25 MB.	Required
`prompt`	String	Prompt provided to influence transcription style or vocabulary. Example: “Please transcribe carefully, including pauses and hesitations.”	Optional
`response_format`	String	Output format: `json` or `text`.	`json`
`language`	String	The language of the input audio. Supplying it in ISO-639-1 format (e.g., `en`) improves accuracy and latency.	Optional
`stream`	Boolean	Enables streaming responses.	`false`
`stream_options`	Object	Additional streaming configuration (e.g., `{"include_usage": true}`).	Optional
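The format and size constraints on `file` can be checked locally before uploading. The following is a minimal sketch, not part of the API: the helper name `check_audio_file` is hypothetical, the check looks only at the file extension (not the actual encoding), and it assumes the 25 MB limit means 25 × 1024 × 1024 bytes.

```python
import os

# Extensions taken from the `file` row of the parameter table.
ALLOWED_EXTENSIONS = {
    ".flac", ".mp3", ".mp4", ".mpeg", ".mpga", ".m4a", ".ogg", ".wav", ".webm",
}
# Assumption: the 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 25 * 1024 * 1024


def check_audio_file(path, size_bytes=None):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(path)[1].lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    # Read the size from disk unless the caller already knows it.
    if size_bytes is None:
        size_bytes = os.path.getsize(path)
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

Calling `check_audio_file("/path/to/audio/file.mp3")` before the request lets a client fail early instead of waiting for the server to reject the upload.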

Request format

The examples below show how to send a transcription request with curl and with Python.

CURL

curl --location 'https://api.sambanova.ai/v1/audio/transcriptions' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--form 'model="Whisper-Large-v3"' \
--form 'language="es"' \
--form 'response_format="json"' \
--form 'file=@"/path/to/audio/file.mp3"' \
--form 'stream="true"'

Python

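import requests

def transcribe_audio(audio_file_path, api_key, language="en"):
    """Send an audio file to the transcription endpoint and return the parsed JSON."""
    headers = {"Authorization": f"Bearer {api_key}"}

    data = {
        "model": "Whisper-Large-v3",
        "language": language,  # ISO-639-1 code, e.g. "en"
        "response_format": "json",
        # "stream" is left at its default (false) so the body can be parsed with response.json().
    }

    # Open the file in a context manager so the handle is closed after the upload.
    with open(audio_file_path, "rb") as audio_file:
        response = requests.post(
            "https://api.sambanova.ai/v1/audio/transcriptions",
            headers=headers,
            files={"file": audio_file},
            data=data,
        )

    # Raise on HTTP errors instead of treating an error body as a transcript.
    response.raise_for_status()
    return response.json()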
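This page does not describe the body returned when `stream` is true. The sketch below assumes the common server-sent-events convention: each chunk arrives on a line starting with `data:`, carries a JSON payload, and the stream ends with a `data: [DONE]` line. Both the framing and the sentinel are assumptions, and the helper name `iter_stream_events` is hypothetical. It deliberately does not assume any field names inside the chunks.

```python
import json


def iter_stream_events(lines):
    """Yield decoded JSON payloads from an iterable of server-sent-event lines."""
    for raw in lines:
        # Lines may arrive as bytes (e.g. from an HTTP client) or as str.
        line = raw.decode("utf-8") if isinstance(raw, bytes) else raw
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(payload)
```

With the `requests` library, the iterable could be `response.iter_lines()` from a `requests.post(..., stream=True)` call that also sends `stream` as `"true"` in the form data. Inspect the first few events to see which fields the service actually returns.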

Response format

The API returns a transcription of the input audio in the selected format.

JSON

{
    "text": "It's a sound effect of a bell chiming, specifically a church bell."
}

Text

It's a sound effect of a bell chiming, specifically a church bell.
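Because the body shape depends on `response_format`, client code may want a single place that turns either form into a plain string. The following is a minimal sketch based only on the two sample responses above. The helper name `extract_transcript` is hypothetical, and it assumes the JSON form always carries the transcript under the `text` key, as in the sample.

```python
import json


def extract_transcript(body, response_format="json"):
    """Return the transcript string from a raw response body."""
    if response_format == "json":
        # The JSON sample wraps the transcript in a "text" field.
        return json.loads(body)["text"]
    if response_format == "text":
        # The text sample is the transcript itself.
        return body.strip()
    raise ValueError(f"unsupported response_format: {response_format!r}")
```

For example, `extract_transcript(response.text, "json")` and `extract_transcript(response.text, "text")` yield the same string for the two sample bodies shown above.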
