New Rate Limit Model for the Ory Network

We've replaced the rate limit system on the Ory Network. The new model rolls out in phases starting the week of June 15, 2026.

Why we are changing the rate limits

Rate limits used to be flat. Identity admin calls, session checks, OAuth2 flows, and permission checks all drew from the same pool. A batch identity import ate into your session validation limit. A spike in OAuth2 token requests crowded out your permission checks. Those operations have nothing to do with each other.

The flat model was also a blunt instrument on our end. When malicious traffic spiked on one endpoint, the only lever we had was a broad limit that caught legitimate requests alongside the bad ones.

The old published ceilings didn't reflect what customers actually used. When you break the rate budget into per-bucket limits sized to real traffic, the headline numbers come down. But the limits that matter to you (the endpoints you actually call) are now sized to handle your actual usage with headroom to spare. For paid customers whose traffic in specific buckets exceeds the new base, we've already created per-customer exceptions. More on that below.

How the new rate-limit model works

API operations are now grouped into separate buckets by service, access level, and cost. Each bucket has its own limits. A burst in one bucket has zero effect on the others.

Your GET /sessions/whoami calls have their own limit. Your POST /admin/identities calls have a separate one. Your POST /oauth2/token calls have another. They don't interfere.

Each bucket enforces two thresholds:

Burst: max requests per second, for short spikes
Sustained: max requests per minute, for steady throughput
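
If you want to stay under both thresholds on the client side, a small limiter that checks the one-second and one-minute windows together is one way to do it. This is a minimal sketch, not how Ory enforces limits on the server: the class name DualWindowLimiter, the sliding-window approach, and the numbers passed in are all assumptions for illustration. Use the limits documented for your bucket, tier, and environment.

```javascript
// Client-side sketch: allow a request only if both windows have room.
// The sliding-window bookkeeping here is an assumption, not Ory's server logic.
class DualWindowLimiter {
  constructor(burstPerSecond, sustainedPerMinute) {
    this.burstPerSecond = burstPerSecond;
    this.sustainedPerMinute = sustainedPerMinute;
    this.timestamps = []; // send times (ms) from the last 60 seconds
  }

  // Returns true and records the request if both windows have room at `now`.
  tryAcquire(now = Date.now()) {
    // Forget requests that fell out of the 60-second sustained window.
    this.timestamps = this.timestamps.filter(t => now - t < 60000);

    const inLastSecond = this.timestamps.filter(t => now - t < 1000).length;
    if (inLastSecond >= this.burstPerSecond) return false;
    if (this.timestamps.length >= this.sustainedPerMinute) return false;

    this.timestamps.push(now);
    return true;
  }
}
```

Call tryAcquire() before each request and delay or queue the call when it returns false.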

Limits vary by subscription tier and project environment (Production, Staging, Development). Dev and staging projects on paid workspaces have always had limits separate from production, but they were previously pinned to Free-tier levels regardless of your plan. Now they get tier-appropriate limits, so a Growth workspace's staging project can get Growth-level headroom instead of Free-tier constraints.

How we sized the limits

We monitored five months of production traffic across the entire Ory Network (November 2025 through April 2026). For every tier and bucket combination we doubled the observed P95 usage and added a 25% buffer on top.
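
As a rough worked example of that arithmetic, assume the 25% buffer is applied to the doubled value, so a limit comes out at 2.5 times the observed P95. The function name sizedLimit, the sample input, and the rounding up are assumptions for illustration only; the published threshold tables are the source of truth.

```javascript
// Assumed reading of the sizing rule: limit = (P95 * 2) * 1.25.
// Rounding up to a whole request count is our assumption.
function sizedLimit(observedP95) {
  const doubled = observedP95 * 2;
  return Math.ceil(doubled * 1.25);
}

// A bucket with an observed P95 of 100 requests per minute:
console.log(sizedLimit(100)); // 250
```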

For Growth and Enterprise customers whose traffic in specific buckets consistently exceeded the new base, we've already created per-customer exceptions sized to their actual peak with a buffer on top. No paying customer should hit 429s at rollout.

Buckets and thresholds

Buckets follow a {service}-{access}-{threshold} naming pattern, where the suffix reflects the rate limit level: high means a high allowance (cheap, frequent operations), low means a low allowance (expensive, less frequent operations). Your GET /sessions/whoami calls, your POST /admin/identities calls, and your POST /oauth2/token calls each live in their own bucket with their own limits.
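
If you log or alert per bucket, it can help to split a bucket name back into its parts. The sketch below assumes the last two hyphen-separated segments are the access level and the threshold, and that everything before them is the service. The helper name parseBucketName and the sample name in the usage line are hypothetical; check the documentation for the real bucket names.

```javascript
// Split a {service}-{access}-{threshold} bucket name into its parts.
// Parsing from the right lets the service part contain hyphens (an assumption).
function parseBucketName(name) {
  const parts = name.split('-');
  if (parts.length < 3) {
    throw new Error(`Unexpected bucket name: ${name}`);
  }
  const threshold = parts.pop();
  const access = parts.pop();
  return { service: parts.join('-'), access, threshold };
}

// Hypothetical name, for illustration only:
console.log(parseBucketName('example-admin-low'));
```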

The complete endpoint-to-bucket mapping and the full threshold tables for every tier and environment (including burst RPS and dev/staging limits) are in our rate limits documentation.

Rate limit response headers

Every API response now includes rate limit headers following the IETF RateLimit header fields draft:

x-ratelimit-limit: 10, 10;w=1, 300;w=60
x-ratelimit-remaining: 8
x-ratelimit-reset: 1

w=1 is the 1-second burst window; w=60 is the 60-second sustained window. x-ratelimit-remaining tells you how many requests are left in the current window. x-ratelimit-reset tells you how long until the window resets (the examples below read it as a number of seconds). Use these to throttle proactively.
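
To read the burst and sustained limits out of x-ratelimit-limit, you can split the value on commas and pick out the entries that carry a w= parameter. This is a sketch based only on the sample header above; the function name parseRateLimitPolicies is ours, and we assume the leading bare number (the entry without a window) can be skipped.

```javascript
// Parse "10, 10;w=1, 300;w=60" into [{ limit, windowSeconds }, ...].
// Entries without a w= parameter are skipped (assumption based on the sample).
function parseRateLimitPolicies(headerValue) {
  const policies = [];
  for (const item of headerValue.split(',')) {
    const [limitPart, ...params] = item.split(';').map(s => s.trim());
    const windowParam = params.find(p => p.startsWith('w='));
    if (!windowParam) continue;
    policies.push({
      limit: parseInt(limitPart, 10),
      windowSeconds: parseInt(windowParam.slice(2), 10),
    });
  }
  return policies;
}

console.log(parseRateLimitPolicies('10, 10;w=1, 300;w=60'));
// [ { limit: 10, windowSeconds: 1 }, { limit: 300, windowSeconds: 60 } ]
```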

Handling 429s

When you get a 429 Too Many Requests, back off. But do it properly. The x-ratelimit-reset header tells you exactly how long to wait. Use it when it's there, fall back to exponential backoff when it's not, and always add jitter so your retries don't pile up at the same instant.

async function callWithBackoff(request, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
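    // Clone Request objects so a request with a body can be sent again on retry
    const response = await fetch(request instanceof Request ? request.clone() : request);
    if (response.status !== 429) return response;

    // Use the server's reset header when it holds a valid number of seconds
    const resetAfter = parseInt(response.headers.get('x-ratelimit-reset'), 10);
    const baseDelay = Number.isFinite(resetAfter) && resetAfter >= 0
      ? resetAfter * 1000
      : Math.min(Math.pow(2, attempt) * 1000, 30000); // cap at 30s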

    // Add jitter to avoid thundering herd
    const jitter = Math.random() * 1000;
    await new Promise(r => setTimeout(r, baseDelay + jitter));
  }
  throw new Error('Max retries exceeded');
}

You can also throttle proactively using the x-ratelimit-remaining header to slow down before you ever hit a 429:

async function callWithThrottle(request) {
  const response = await fetch(request);
  const remaining = parseInt(response.headers.get('x-ratelimit-remaining'), 10);
  const resetIn = parseInt(response.headers.get('x-ratelimit-reset'), 10);

  // If you're running low on budget, space out your next calls
  if (remaining < 5 && resetIn > 0) {
    const paceDelay = (resetIn * 1000) / Math.max(remaining, 1);
    await new Promise(r => setTimeout(r, paceDelay));
  }
  return response;
}

Clients that repeatedly hit limits without reducing the number of calls may have their API access temporarily blocked.

Endpoint-based rate limits

Independent of the per-bucket project limits, we also enforce endpoint-level protections against volumetric attacks. These analyze request patterns based on IP address, JA3/JA4 fingerprint, request frequency, and authentication status. They're designed to catch brute-force and credential stuffing attacks without affecting normal API usage.

You don't need to do anything about these. They operate transparently and only kick in when traffic looks malicious.

Inflight rate limits on write endpoints

Separately from the per-bucket limits, we now enforce concurrent request limits on critical write endpoints. For example, if two requests try to PUT /admin/identities/{id} with the same UUID at the same time, the second one gets a 429. There's no legitimate reason to edit the same identity concurrently; the result would be undefined regardless.

Enforced (returns 429 on concurrent requests):

POST, PATCH on /admin/identities
PUT, PATCH, DELETE on /admin/identities/{id}
DELETE on /admin/identities/{id}/credentials/{type}
DELETE on /admin/identities/{id}/sessions

Report-only (logged but not blocked yet):

DELETE on /admin/sessions/{id}
PATCH on /admin/sessions/{id}/extend
POST on /self-service/recovery

The report-only endpoints are being monitored. If we see patterns that warrant enforcement, we'll promote them to enforced with advance notice.
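
If your own code can issue overlapping writes to the same identity (parallel workers, for instance), one way to avoid these 429s is to queue writes per identity ID on the client. The sketch below chains promises per key so tasks for the same key run one after another, while different keys still run in parallel. The names serializeByKey and tails are ours, and the queue only covers a single process; it is an illustration, not an official client feature.

```javascript
// key -> promise that settles when the last queued task for that key finishes
const tails = new Map();

// Run `task` (a function returning a promise) after earlier tasks for the same key.
function serializeByKey(key, task) {
  const previous = tails.get(key) ?? Promise.resolve();
  const result = previous.then(() => task());

  // The stored tail never rejects, so one failed write does not block later ones.
  const tail = result.catch(() => {}).then(() => {
    // Remove the entry once nothing else is queued behind this task.
    if (tails.get(key) === tail) tails.delete(key);
  });
  tails.set(key, tail);

  return result; // resolves or rejects with the task's own outcome
}
```

Wrap each write in a call such as serializeByKey(identityId, () => updateIdentity(identityId, payload)), where updateIdentity stands in for your own function that performs the PUT or PATCH.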

Rollout (when the new rate limiting goes into effect)

When	Who
Week of June 15, 2026	All new workspaces + existing Developer workspaces
Week of June 22, 2026	Existing Production workspaces
Week of June 29, 2026	Existing Growth workspaces
Week of July 6, 2026	Existing Enterprise workspaces

Each tier gets a week of buffer after the previous one. Migration is automatic; no action required.

Load testing

Load testing against the Ory Network requires prior written approval. Unauthorized tests will be detected and may result in temporary blocking. Enterprise customers can request an approved window through our support team (see our Load Testing Policy).

Questions

Enterprise: reach out to your CSM or email [email protected]
Growth: email [email protected]
Developer and Production: post in our community Slack at slack.ory.com

Full threshold tables and technical reference: rate limits documentation.
