Rate limits control how often users can call our API within set time intervals. By enforcing these limits, we ensure stable performance, equitable access, and maximum reliability, which lets us deliver the fastest, highest-quality inference for all users.

Overview

Rate limits are measured in:
  • RPM: Requests per minute
  • RPD: Requests per day
Basics
  • A request is a single call to our API
  • Both limit types (RPM and RPD) apply at the same time; you are rate limited as soon as you reach either one, whichever comes first
  • Every response reports the current status of your rate limits (see rate limit response headers for more information)
  • If you hit a rate limit, you will be sent an error message in your response
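
To illustrate how the two limits interact, here is a minimal client-side sketch in Python that tracks both budgets and reports which one would block the next request. The class name `DualWindowLimiter` and its methods are hypothetical, and the sliding 60-second and 24-hour windows are an assumption made for illustration; this page does not specify how the server measures its windows.

```python
import time
from collections import deque


class DualWindowLimiter:
    """Client-side guard that tracks requests against an RPM and an RPD budget.

    Assumption: sliding windows of 60 seconds and 24 hours. The server's
    actual windowing may differ.
    """

    def __init__(self, rpm, rpd, clock=time.monotonic):
        self.rpm = rpm
        self.rpd = rpd
        self.clock = clock
        self.sent = deque()  # timestamps of requests sent in the last 24 hours

    def _prune(self, now):
        # Drop timestamps that have aged out of the 24-hour window.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()

    def blocking_limit(self):
        """Return "RPD", "RPM", or None if a request may be sent now."""
        now = self.clock()
        self._prune(now)
        if len(self.sent) >= self.rpd:
            return "RPD"
        # Count requests in the last 60 seconds, newest first.
        in_last_minute = 0
        for sent_at in reversed(self.sent):
            if now - sent_at >= 60:
                break
            in_last_minute += 1
        if in_last_minute >= self.rpm:
            return "RPM"
        return None

    def record(self):
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A caller would check `blocking_limit()` before each API call and invoke `record()` after sending. The response headers remain the authoritative source for your remaining quota; a local counter like this only helps avoid sending requests that are likely to be rejected.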

Rate limit tiers

We offer three rate limit tiers:
  • Free Tier: Applied when there is no payment method linked with your account
  • Developer Tier: Applied when a payment method is linked with your account
  • Enterprise Tier: Contact our sales team for Enterprise Tier rate limit plans, or for access to a dedicated node with maximum rate limits
Please see the Billing page to link a payment method to your account.
Below are our Developer Tier and Free Tier rate limits. Production models are intended for use in production environments and meet our high standards for speed and quality.
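| Developer | Model ID | Requests per minute (RPM) | Requests per day (RPD) |
|---|---|---|---|
| DeepSeek | DeepSeek-R1 | 30 | 3000 |
| DeepSeek | DeepSeek-R1-Distill-Llama-70B | 80 | 12000 |
| DeepSeek | DeepSeek-V3-0324 | 30 | 3000 |
| Meta | Meta-Llama-3.3-70B-Instruct | 120 | 12000 |
| Meta | Meta-Llama-3.1-8B-Instruct | 480 | 72000 |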
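
Because both limits apply together, it can help to work out how the RPM and RPD figures relate. The sketch below uses the figures listed above for DeepSeek-R1 (30 RPM, 3000 RPD) and Meta-Llama-3.1-8B-Instruct (480 RPM, 72000 RPD). The function names are hypothetical, and the calculation assumes requests are sent at a constant rate.

```python
def minutes_to_exhaust_daily_quota(rpm, rpd):
    """Minutes of sending at the full RPM rate before the RPD quota is used up."""
    return rpd / rpm


def sustainable_rpm(rpd):
    """Average requests per minute that spreads the RPD quota over 24 hours."""
    return rpd / (24 * 60)


# DeepSeek-R1: 30 RPM, 3000 RPD
print(minutes_to_exhaust_daily_quota(30, 3000))    # 100.0 minutes
print(round(sustainable_rpm(3000), 2))             # 2.08 requests per minute

# Meta-Llama-3.1-8B-Instruct: 480 RPM, 72000 RPD
print(minutes_to_exhaust_daily_quota(480, 72000))  # 150.0 minutes
print(sustainable_rpm(72000))                      # 50.0 requests per minute
```

In other words, a client that runs at the full per-minute rate reaches the daily limit in a couple of hours, so workloads that run all day need to pace themselves well below the RPM ceiling.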

Preview models

Preview models are intended for evaluation purposes and developer experimentation only, and should not be used in production environments. These models have limited capacity and may be removed at short notice.
| Developer | Model ID | Requests per minute (RPM) | Requests per day (RPD) |
|---|---|---|---|
| Meta | Llama-4-Maverick-17B-128E-Instruct | 40 | 6000 |
| OpenAI | Whisper-Large-v3 | 300 | 45000 |
| Qwen | Qwen3-32B | 20 | 3000 |
| Tokyotech-llm | Llama-3.3-Swallow-70B-Instruct-v0.4 | 40 | 6000 |
| Other | E5-Mistral-7B-Instruct | 30 | 4500 |

Rate limit response headers

These headers are included in every response and report the current status of your rate limit usage.
RPM (Requests per minute):
  • x-ratelimit-limit-requests
    • The maximum number of requests allowed per minute.
  • x-ratelimit-remaining-requests
    • The number of requests remaining in the current minute before hitting the rate limit.
  • x-ratelimit-reset-requests
    • The time at which the per-minute request quota resets, in epoch time.
RPD (Requests per day):
  • x-ratelimit-limit-requests-day
    • The maximum number of requests allowed per day.
  • x-ratelimit-remaining-requests-day
    • The number of requests remaining in the current day before hitting the rate limit.
  • x-ratelimit-reset-requests-day
    • The time at which the per-day request quota resets, in epoch time.
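
As a rough illustration of how a client might use these headers, the Python sketch below reads the six values from a response's header mapping and estimates how long to wait when a quota is exhausted. The names `WindowStatus`, `parse_rate_limit_headers`, and `seconds_to_wait` are hypothetical. It assumes the reset headers carry a Unix timestamp in seconds and that the limit and remaining values are numeric strings; check the values your responses actually return before relying on either assumption.

```python
import time
from dataclasses import dataclass


@dataclass
class WindowStatus:
    limit: int          # maximum requests allowed in the window
    remaining: int      # requests left in the current window
    reset_epoch: float  # assumed: Unix timestamp (seconds) of the next reset


def _read_window(headers, suffix):
    # suffix is "" for the per-minute headers and "-day" for the per-day ones.
    return WindowStatus(
        limit=int(float(headers[f"x-ratelimit-limit-requests{suffix}"])),
        remaining=int(float(headers[f"x-ratelimit-remaining-requests{suffix}"])),
        reset_epoch=float(headers[f"x-ratelimit-reset-requests{suffix}"]),
    )


def parse_rate_limit_headers(headers):
    """Return {"rpm": WindowStatus, "rpd": WindowStatus} from a header mapping."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {"rpm": _read_window(lowered, ""), "rpd": _read_window(lowered, "-day")}


def seconds_to_wait(status, now=None):
    """Seconds until every exhausted window has reset; 0.0 if none is exhausted."""
    if now is None:
        now = time.time()
    waits = [
        max(0.0, window.reset_epoch - now)
        for window in status.values()
        if window.remaining <= 0
    ]
    return max(waits, default=0.0)
```

With most HTTP clients you can pass the response's headers object directly, for example `parse_rate_limit_headers(response.headers)`, then sleep for `seconds_to_wait(...)` before retrying a rate-limited request.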