SambaNova Cloud enforces per-model rate limits on inference requests so that all developers are able to try the fastest inference.

Developer tier

Rate limits for the developer tier are described below.

Payment-and-credits limits apply when a payment method is linked to the account. Credits-only limits apply when no payment method is linked to the account. See the Billing page for details.

Preview models

Preview models in SambaNova Cloud are available as early-access offerings intended primarily for evaluation purposes. During the preview phase, these models have limited capacity but are fully functional in terms of accuracy and performance.

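| Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
|-----------|----------|---------------------------|-------------------------|------------------------|
| DeepSeek | DeepSeek-V3-0324 | 10 | 50 | 600 |
| OpenAI | Whisper-Large-v3 | 20 | 100 | 1200 |
| Meta | Llama-4-Scout-17B-16E-Instruct | 40 | 200 | 2400 |
| Meta | Llama-4-Maverick-17B-128E-Instruct | 40 | 200 | 2400 |
| Qwen | Qwen2-Audio-7B-Instruct | 10 | 50 | 600 |
| Qwen | Qwen3-32B | 20 | 100 | 1200 |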

Production models

Production models meet our high standards for speed and quality and are intended for use in production environments.

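| Developer | Model ID | Requests per minute (RPM) | Requests per hour (RPH) | Requests per day (RPD) |
|-----------|----------|---------------------------|-------------------------|------------------------|
| DeepSeek | DeepSeek-R1 | 20 | 100 | 1200 |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | 70 | 350 | 4200 |
| Meta | Meta-Llama-3.3-70B-Instruct | 80 | 400 | 4800 |
| Meta | Meta-Llama-3.2-3B-Instruct | 120 | 600 | 7200 |
| Meta | Meta-Llama-3.2-1B-Instruct | 120 | 600 | 7200 |
| Meta | Meta-Llama-3.1-405B-Instruct | 30 | 150 | 1800 |
| Meta | Meta-Llama-3.1-8B-Instruct | 480 | 2400 | 28800 |
| Meta | Meta-Llama-Guard-3-8B | 60 | 300 | 3600 |
| Qwen | QwQ-32B | 20 | 100 | 1200 |
| Tokyotech-llm | Llama-3.3-Swallow-70B-Instruct-v0.4 | 60 | 300 | 3600 |
| Other | E5-Mistral-7B-Instruct | 30 | 150 | 1800 |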
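
As a rough illustration of how the three windows interact, the sketch below checks a planned request volume against a model's limits and computes an even request spacing from its RPM value. The `LIMITS` dictionary copies a few rows from the production table; `min_interval_seconds` and `fits_within_limits` are hypothetical helper names used only for this example and are not part of any SambaNova SDK.

```python
# Developer-tier limits copied from the production table: (RPM, RPH, RPD).
LIMITS = {
    "DeepSeek-R1": (20, 100, 1200),
    "Meta-Llama-3.3-70B-Instruct": (80, 400, 4800),
    "Meta-Llama-3.1-8B-Instruct": (480, 2400, 28800),
}


def min_interval_seconds(model: str) -> float:
    """Even spacing between requests that keeps a client within the RPM limit."""
    rpm, _, _ = LIMITS[model]
    return 60.0 / rpm


def fits_within_limits(model: str, per_minute: int, per_hour: int, per_day: int) -> bool:
    """Check a planned request volume against all three windows at once."""
    rpm, rph, rpd = LIMITS[model]
    return per_minute <= rpm and per_hour <= rph and per_day <= rpd
```

Note that the hourly and daily limits are lower than the per-minute limit multiplied out. For DeepSeek-R1, for example, sending at the full 20 RPM would use up the 100 RPH quota in five minutes, so sustained workloads are usually bounded by the hourly or daily window rather than by RPM.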

Other tiers

For rate limits in the Managed Subscription and Dedicated tiers, reach out to sales or contact us on our Community page so we can accommodate your project's needs.

Rate limit response headers

The following response headers report the current status of rate limit usage. The default headers, which carry no suffix, report the per-minute (RPM) limit.

RPM (Requests per minute):

  • x-ratelimit-limit-requests
    • The maximum number of requests allowed per minute.
  • x-ratelimit-remaining-requests
    • The number of requests remaining in the current minute before hitting the rate limit.
  • x-ratelimit-reset-requests
    • The time, in epoch time, at which the per-minute request quota resets.

RPH (Requests per hour):

  • x-ratelimit-limit-requests-hour
    • The maximum number of requests allowed per hour.
  • x-ratelimit-remaining-requests-hour
    • The number of requests remaining in the current hour before hitting the rate limit.
  • x-ratelimit-reset-requests-hour
    • The time, in epoch time, at which the per-hour request quota resets.

RPD (Requests per day):

  • x-ratelimit-limit-requests-day
    • The maximum number of requests allowed per day.
  • x-ratelimit-remaining-requests-day
    • The number of requests remaining in the current day before hitting the rate limit.
  • x-ratelimit-reset-requests-day
    • The time, in epoch time, at which the per-day request quota resets.
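
One way a client might use these headers is to pause when any window is exhausted. The sketch below is a minimal example under stated assumptions: header values arrive as numeric strings, header names are lowercase in the mapping, and the reset value is an epoch timestamp in seconds (the page says only "epoch time", so the unit is an assumption). `seconds_to_wait` is a hypothetical helper name, not part of any SambaNova SDK.

```python
import time
from typing import Mapping, Optional

# Header suffix for each window, as listed above ("" is the per-minute default).
WINDOWS = {"minute": "", "hour": "-hour", "day": "-day"}


def seconds_to_wait(headers: Mapping[str, str], now: Optional[float] = None) -> float:
    """Return how long to pause before the next request (0.0 if quota remains)."""
    now = time.time() if now is None else now
    wait = 0.0
    for suffix in WINDOWS.values():
        remaining = headers.get(f"x-ratelimit-remaining-requests{suffix}")
        reset = headers.get(f"x-ratelimit-reset-requests{suffix}")
        if remaining is None or reset is None:
            continue  # this window was not reported in the response
        if int(remaining) <= 0:
            # Assumption: the reset value is an epoch timestamp in seconds.
            wait = max(wait, float(reset) - now)
    return max(wait, 0.0)
```

When more than one window is exhausted, the function returns the longest wait, since a request sent earlier would still be rejected by the slower-resetting window.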