Endpoints
Transcription
Converts audio to text in the specified language.
Endpoint
Request parameters
The following tables list the parameters for a transcription request for each supported model, along with each parameter's type, description, and default value.
Whisper Large v3
Parameter | Type | Description | Default |
---|---|---|---|
model | String | The ID of the model to use. | Required |
prompt | String | Prompt provided to influence transcription style or vocabulary. Example: "Please transcribe carefully, including pauses and hesitations." | Optional |
temperature | Number | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. File size limit is 25MB. | Required |
response_format | String | The output format is either json or text. | json |
language | String | The language of the input audio. Supplying the input language in ISO 639-1 format (e.g., en) improves accuracy and latency. | Required |
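
The limits in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any SDK: `validate_whisper_request` is a hypothetical helper, the format check relies on the file extension only, and "25MB" is assumed to mean 25 × 1024 × 1024 bytes, which the table does not specify.

```python
import os

# Limits taken from the Whisper Large v3 parameter table.
ALLOWED_EXTENSIONS = {"flac", "mp3", "mp4", "mpeg", "mpga", "m4a", "ogg", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" read as 25 MiB


def validate_whisper_request(path, size_bytes, language,
                             temperature=0.0, response_format="json"):
    """Return a list of problems; an empty list means the request looks valid.

    Hypothetical helper for illustration only.
    """
    problems = []
    # Assumption: the audio format is inferred from the file extension.
    extension = os.path.splitext(path)[1].lstrip(".").lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file extension: {extension!r}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds the 25MB limit")
    if not language:
        problems.append("language is required")
    if not 0 <= temperature <= 1:
        problems.append("temperature must be between 0 and 1")
    if response_format not in ("json", "text"):
        problems.append("response_format must be json or text")
    return problems
```

Calling `validate_whisper_request("meeting.wav", 1000, "en")` returns an empty list, while each violated constraint adds one message to the result.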
Qwen2-Audio-7B-Instruct
Parameter | Type | Description | Default |
---|---|---|---|
model | String | The ID of the model to use. | Required |
response_format | String | The output format is either json or text. | json |
temperature | Number | Sampling temperature between 0 and 1. Higher values (e.g., 0.8) increase randomness, while lower values (e.g., 0.2) make output more focused. | 0 |
max_tokens | Number | The maximum number of tokens to generate. | 1000 |
file | File | Audio file in FLAC, MP3, MP4, MPEG, MPGA, M4A, Ogg, WAV, or WebM format. Each file must not exceed 30 seconds in duration. | Required |
language | String | The target language for transcription or translation. | Optional |
stream | Boolean | Enables streaming responses. | false |
stream_options | Object | Additional streaming configuration (e.g., {"include_usage": true}). | Optional |
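
As an illustration of how the Qwen2-Audio-7B-Instruct parameters fit together, the sketch below assembles the non-file fields of a request. It rests on several assumptions that this section does not confirm: the fields are sent as multipart form strings, booleans are written as `true`/`false`, and `stream_options` is serialized as a JSON string. `build_qwen_fields` is a hypothetical helper, and the model ID shown in the usage note is a placeholder.

```python
import json


def build_qwen_fields(model, language=None, temperature=0.0, max_tokens=1000,
                      response_format="json", stream=False, include_usage=False):
    """Assemble the non-file form fields for a Qwen2-Audio-7B-Instruct request.

    Hypothetical helper; defaults mirror the parameter table.
    """
    fields = {
        "model": model,
        "response_format": response_format,
        "temperature": str(temperature),
        "max_tokens": str(max_tokens),
        # Assumption: booleans are sent as lowercase strings.
        "stream": "true" if stream else "false",
    }
    # language is optional, so it is only sent when provided.
    if language is not None:
        fields["language"] = language
    # stream_options is only meaningful when streaming is enabled.
    if stream and include_usage:
        fields["stream_options"] = json.dumps({"include_usage": True})
    return fields
```

For example, `build_qwen_fields("Qwen2-Audio-7B-Instruct", stream=True, include_usage=True)` yields a dictionary whose `stream_options` entry is the JSON string for `{"include_usage": true}`.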
Request format
The following examples show how to send a request using cURL and Python.
cURL
Python
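
A minimal sketch of a Python request using only the standard library follows. The endpoint URL, the API key, and the Bearer authorization scheme are placeholders and assumptions, since this section does not state them; `encode_multipart` and `build_request` are hypothetical helpers. The sketch builds the request without sending it, so the body can be inspected first.

```python
import urllib.request
import uuid

# Placeholders: neither value is given in this section.
API_URL = "https://api.example.com/v1/audio/transcriptions"  # hypothetical URL
API_KEY = "YOUR_API_KEY"  # hypothetical credential


def encode_multipart(fields, filename, file_bytes):
    """Encode text fields plus one audio file as multipart/form-data."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The audio goes in the "file" field named in the parameter tables.
    parts.append(
        (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
         f'filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode()
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_request(fields, filename, file_bytes):
    """Build (but do not send) the POST request."""
    body, content_type = encode_multipart(fields, filename, file_bytes)
    headers = {
        "Content-Type": content_type,
        # Assumption: the service uses Bearer token authentication.
        "Authorization": f"Bearer {API_KEY}",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

To send the request, pass the result to `urllib.request.urlopen`, for example `urllib.request.urlopen(build_request({"model": "MODEL_ID", "language": "en"}, "audio.wav", audio_bytes))`, where `MODEL_ID` and `audio_bytes` stand for a real model ID and the file contents.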
Response format
The API returns a transcription of the input audio in the format selected with response_format (json or text).
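
Handling the two formats on the client could look like the sketch below. It assumes that a json response is an object carrying the transcript under a `text` key; the response schema is not shown in this section, so that key name is a guess to adjust against a real response. `extract_transcript` is a hypothetical helper.

```python
import json


def extract_transcript(body, response_format="json"):
    """Return the transcript string from a raw response body.

    Assumption: a json response is an object with a "text" key.
    """
    if response_format == "text":
        # Plain-text responses are returned as-is, minus surrounding whitespace.
        return body.strip()
    payload = json.loads(body)
    return payload["text"]
```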