SambaNova Cloud enforces per-model rate limits on inference requests so that developers can try the fastest inference.
Rate limits for the developer tier are described below.
Payment-and-credits limits apply when a payment method is linked to the account. Credits-only limits apply when no additional payment method is linked to the account. See the Billing page for details.
Preview models in SambaNova Cloud are available as early-access offerings intended primarily for evaluation purposes. During the preview phase, these models have limited capacity but are fully functional in terms of accuracy and performance.
Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
---|---|---|---|---|
DeepSeek | DeepSeek-V3-0324 | 10 | 50 | 600 |
OpenAI | Whisper-Large-v3 | 300 | 1500 | 18000 |
Meta | Llama-4-Scout-17B-16E-Instruct | 40 | 200 | 2400 |
Meta | Llama-4-Maverick-17B-128E-Instruct | 40 | 200 | 2400 |
Qwen | Qwen2-Audio-7B-Instruct | 10 | 50 | 600 |
Qwen | Qwen3-32B | 20 | 100 | 1200 |
Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
---|---|---|---|---|
DeepSeek | DeepSeek-V3-0324 | 10 | 50 | 600 |
OpenAI | Whisper-Large-v3 | 300 | 1500 | 18000 |
Meta | Llama-4-Scout-17B-16E-Instruct | 40 | 200 | 2400 |
Meta | Llama-4-Maverick-17B-128E-Instruct | 40 | 200 | 2400 |
Qwen | Qwen2-Audio-7B-Instruct | 10 | 50 | 600 |
Qwen | Qwen3-32B | 20 | 100 | 1200 |
Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
---|---|---|---|---|
DeepSeek | DeepSeek-V3-0324 | 5 | 10 | 100 |
OpenAI | Whisper-Large-v3 | 150 | 300 | 3000 |
Meta | Llama-4-Scout-17B-16E-Instruct | 20 | 40 | 400 |
Meta | Llama-4-Maverick-17B-128E-Instruct | 20 | 40 | 400 |
Qwen | Qwen2-Audio-7B-Instruct | 5 | 10 | 100 |
Qwen | Qwen3-32B | 10 | 20 | 200 |
Production models meet our high standards for speed and quality and are intended for use in production environments.
Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
---|---|---|---|---|
DeepSeek | DeepSeek-R1 | 20 | 100 | 1200 |
DeepSeek | DeepSeek-R1-Distill-Llama-70B | 80 | 400 | 4800 |
DeepSeek | DeepSeek-V3-0324 | 10 | 50 | 600 |
Meta | Meta-Llama-3.3-70B-Instruct | 80 | 400 | 4800 |
Meta | Meta-Llama-3.2-3B-Instruct | 120 | 600 | 7200 |
Meta | Meta-Llama-3.2-1B-Instruct | 120 | 600 | 7200 |
Meta | Meta-Llama-3.1-405B-Instruct | 30 | 150 | 1800 |
Meta | Meta-Llama-3.1-8B-Instruct | 480 | 2400 | 28800 |
Meta | Meta-Llama-Guard-3-8B | 60 | 300 | 3600 |
Qwen | QwQ-32B | 20 | 100 | 1200 |
Tokyotech-llm | Llama-3.3-Swallow-70B-Instruct-v0.4 | 60 | 300 | 3600 |
Other | E5-Mistral-7B-Instruct | 30 | 150 | 1800 |
Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
---|---|---|---|---|
DeepSeek | DeepSeek-R1 | 20 | 100 | 1200 |
DeepSeek | DeepSeek-R1-Distill-Llama-70B | 80 | 400 | 4800 |
DeepSeek | DeepSeek-V3-0324 | 10 | 50 | 600 |
Meta | Meta-Llama-3.3-70B-Instruct | 80 | 400 | 4800 |
Meta | Meta-Llama-3.2-3B-Instruct | 120 | 600 | 7200 |
Meta | Meta-Llama-3.2-1B-Instruct | 120 | 600 | 7200 |
Meta | Meta-Llama-3.1-405B-Instruct | 30 | 150 | 1800 |
Meta | Meta-Llama-3.1-8B-Instruct | 480 | 2400 | 28800 |
Meta | Meta-Llama-Guard-3-8B | 60 | 300 | 3600 |
Qwen | QwQ-32B | 20 | 100 | 1200 |
Tokyotech-llm | Llama-3.3-Swallow-70B-Instruct-v0.4 | 60 | 300 | 3600 |
Other | E5-Mistral-7B-Instruct | 30 | 150 | 1800 |
Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
---|---|---|---|---|
DeepSeek | DeepSeek-R1 | 10 | 20 | 200 |
DeepSeek | DeepSeek-R1-Distill-Llama-70B | 40 | 80 | 800 |
DeepSeek | DeepSeek-V3-0324 | 5 | 10 | 100 |
Meta | Meta-Llama-3.3-70B-Instruct | 40 | 80 | 800 |
Meta | Meta-Llama-3.2-3B-Instruct | 60 | 120 | 1200 |
Meta | Meta-Llama-3.2-1B-Instruct | 60 | 120 | 1200 |
Meta | Meta-Llama-3.1-405B-Instruct | 15 | 30 | 300 |
Meta | Meta-Llama-3.1-8B-Instruct | 240 | 480 | 4800 |
Meta | Meta-Llama-Guard-3-8B | 30 | 60 | 600 |
Qwen | QwQ-32B | 10 | 20 | 200 |
Tokyotech-llm | Llama-3.3-Swallow-70B-Instruct-v0.4 | 30 | 60 | 600 |
Other | E5-Mistral-7B-Instruct | 15 | 30 | 300 |
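Because each model has three limits at once (RPM, RPH, and RPD), a client that stays under the per-minute limit can still run into the hourly or daily one. The sketch below is one possible client-side check, not part of any SambaNova SDK. The class name `MultiWindowLimiter` and its methods are hypothetical, and it assumes sliding windows, which may differ from how the service actually counts requests.

```python
import time
from collections import deque


class MultiWindowLimiter:
    """Client-side check against per-minute, per-hour, and per-day request caps.

    Hypothetical helper: assumes sliding windows, which may not match the server.
    """

    def __init__(self, rpm, rph, rpd, clock=time.monotonic):
        # (window length in seconds, max requests allowed in that window)
        self.windows = [(60, rpm), (3600, rph), (86400, rpd)]
        self.clock = clock
        self.sent = deque()  # timestamps of requests already recorded

    def wait_time(self):
        """Seconds to wait before the next request fits inside every window."""
        now = self.clock()
        # Forget requests older than the longest window (one day).
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        wait = 0.0
        for length, cap in self.windows:
            recent = [t for t in self.sent if now - t < length]
            if len(recent) >= cap:
                # A slot frees up when the cap-th most recent request ages out.
                wait = max(wait, recent[-cap] + length - now)
        return wait

    def record(self):
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, send the request, then call `record()`. With limits of 10 RPM, 50 RPH, and 600 RPD (the DeepSeek-V3-0324 values in several tables above), sending at the full per-minute rate uses up the hourly allowance after 5 minutes, and sending at the full hourly rate uses up the daily allowance after 12 hours.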
For rate limits in the Managed Subscription and Dedicated tiers, reach out to sales or contact us on our Community page so we can accommodate your project's needs.
The following response headers report the current status of rate limit usage. The default rate limit headers (those without a suffix) are in RPM.
RPM (Requests per minute):
x-ratelimit-limit-requests
x-ratelimit-remaining-requests
x-ratelimit-reset-requests
RPH (Requests per hour):
x-ratelimit-limit-requests-hour
x-ratelimit-remaining-requests-hour
x-ratelimit-reset-requests-hour
RPD (Requests per day):
x-ratelimit-limit-requests-day
x-ratelimit-remaining-requests-day
x-ratelimit-reset-requests-day
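The header names above can be read from any HTTP response into a small per-window summary. The following is a minimal sketch under stated assumptions: the function `parse_rate_limit_headers` is a hypothetical helper, the limit and remaining values are assumed to be integers, and the reset value is kept as a raw string because its format is not described here.

```python
from typing import Mapping, Optional

# Header suffix per window, taken from the header lists above.
WINDOW_SUFFIXES = {"minute": "", "hour": "-hour", "day": "-day"}


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return value as an int, or None if it is missing or not an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def parse_rate_limit_headers(headers: Mapping[str, str]) -> dict:
    """Collect limit / remaining / reset for each window present in headers."""
    # HTTP header names are case-insensitive, so normalize the keys first.
    lowered = {k.lower(): v for k, v in headers.items()}
    usage = {}
    for window, suffix in WINDOW_SUFFIXES.items():
        limit = lowered.get(f"x-ratelimit-limit-requests{suffix}")
        remaining = lowered.get(f"x-ratelimit-remaining-requests{suffix}")
        reset = lowered.get(f"x-ratelimit-reset-requests{suffix}")
        if limit is None and remaining is None and reset is None:
            continue  # this window was not reported in the response
        usage[window] = {
            "limit": _to_int(limit),          # assumed to be an integer
            "remaining": _to_int(remaining),  # assumed to be an integer
            "reset": reset,  # raw string; its format is not assumed
        }
    return usage
```

Any mapping of header names to values works as input, for example the `headers` attribute of a response object from an HTTP client library. A caller might pause or switch models when `usage["minute"]["remaining"]` reaches 0.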