New rate limit model for the Ory Network
Say goodbye to flat rate limits. Ory Network’s new bucket-based model is tailored to your actual usage with built-in headroom.
We've replaced the rate limit system on the Ory Network. The new model rolls out in phases starting the week of June 15.
Rate limits used to be flat. Identity admin calls, session checks, OAuth2 flows, and permission checks all drew from the same pool. Run a batch identity import, and it eats into your session validation limit. OAuth2 token requests spike, and your permission checks get crowded out. Those operations have nothing to do with each other.
The flat model was also a blunt instrument on our end. When malicious traffic spiked on one endpoint, the only lever we had was a broad limit that caught legitimate requests alongside the bad ones.
The old published ceilings didn't reflect what customers actually used. Once the rate budget is broken into per-bucket limits sized to real traffic, the headline numbers come down. But the limits that matter to you (the endpoints you actually call) are now sized to handle your actual usage with headroom to spare. For paid customers whose traffic in specific buckets exceeds the new base, we've already created per-customer exceptions. More on that below.
API operations are now grouped into separate buckets by service, access level, and cost. Each bucket has its own limits. A burst in one bucket has zero effect on the others.
Your GET /sessions/whoami calls have their own limit. Your POST /admin/identities calls have a separate one. Your POST /oauth2/token calls have another. They don't interfere.
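To make the isolation concrete, here is a minimal illustrative model with one counter per bucket. It is not how the Ory Network implements its limits; the bucket names and numbers are hypothetical and only demonstrate that exhausting one bucket leaves the others untouched.

```javascript
// Illustrative model only: one fixed-window counter per bucket.
// Bucket names and limits below are hypothetical placeholders.
class BucketLimiter {
  constructor(limits) {
    this.limits = limits;    // { bucketName: maxRequestsPerWindow }
    this.counts = new Map(); // bucketName -> requests seen in the current window
  }

  // Returns true if the request fits the bucket's budget,
  // false if it would be rejected (a 429 in practice).
  allow(bucket) {
    if (!(bucket in this.limits)) {
      throw new Error(`Unknown bucket: ${bucket}`);
    }
    const used = this.counts.get(bucket) ?? 0;
    if (used >= this.limits[bucket]) return false;
    this.counts.set(bucket, used + 1);
    return true;
  }

  // Call at the start of each new window.
  resetWindow() {
    this.counts.clear();
  }
}

const limiter = new BucketLimiter({
  'identity-admin-low': 2,
  'identity-public-high': 3,
});

limiter.allow('identity-admin-low');   // true
limiter.allow('identity-admin-low');   // true
limiter.allow('identity-admin-low');   // false: this bucket is exhausted
limiter.allow('identity-public-high'); // true: the other bucket is unaffected
```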
Each bucket enforces two thresholds: a burst limit over a 1-second window and a sustained limit over a 60-second window.
Limits vary by subscription tier and project environment (Production, Staging, Development). Dev and staging projects on paid workspaces have always had limits separate from production, but they were previously pinned to Free-tier levels regardless of your plan. Now they get tier-appropriate limits, so a Growth workspace's staging project can get Growth-level headroom instead of Free-tier constraints.
We monitored five months of production traffic across the entire Ory Network (November 2025 through April 2026). For every tier and bucket combination we doubled the observed P95 usage and added a 25% buffer on top.
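As a rough worked example of that sizing rule, the sketch below reads "added a 25% buffer on top" as multiplying the doubled P95 by 1.25. That reading, the rounding up, and the sample P95 values are assumptions for illustration, not published figures.

```javascript
// Sketch of the sizing rule described above: limit = P95 * 2 * 1.25.
// Rounding up to a whole request count is an assumption.
function deriveLimit(observedP95) {
  const doubled = observedP95 * 2;   // double the observed P95
  const withBuffer = doubled * 1.25; // add a 25% buffer on top
  return Math.ceil(withBuffer);
}

console.log(deriveLimit(120)); // 300
console.log(deriveLimit(40));  // 100
```

Under this reading, a bucket's limit ends up at 2.5 times the observed P95 for that tier.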
For Growth and Enterprise customers whose traffic in specific buckets consistently exceeded the new base, we've already created per-customer exceptions sized to their actual peak with a buffer on top. No paying customer should hit 429s at rollout.
Buckets follow a {service}-{access}-{threshold} naming pattern, where the suffix reflects the rate limit level: -high means a high allowance (cheap, frequent operations), -low means a low allowance (expensive, less frequent operations). Your GET /sessions/whoami calls, your POST /admin/identities calls, and your POST /oauth2/token calls each live in their own bucket with their own limits.
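If you want to group metrics or logs by bucket, the naming pattern can be split mechanically. The helper below is a hypothetical sketch: it assumes the last two hyphen-separated segments are the access level and threshold, and the example names are made up. Check the rate limits documentation for the real bucket names.

```javascript
// Hypothetical helper: split a {service}-{access}-{threshold} bucket name.
// Assumes the final two segments are access and threshold, so a service
// name that itself contains hyphens is still handled.
function parseBucketName(name) {
  const parts = name.split('-');
  if (parts.length < 3) {
    throw new Error(`Unexpected bucket name: ${name}`);
  }
  const threshold = parts.pop();
  const access = parts.pop();
  return { service: parts.join('-'), access, threshold };
}

// Example names are illustrative, not taken from the documentation.
console.log(parseBucketName('identity-admin-low'));
console.log(parseBucketName('identity-public-high'));
```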
The complete endpoint-to-bucket mapping and the full threshold tables for every tier and environment (including burst RPS and dev/staging limits) are in our rate limits documentation.
Every API response now includes rate limit headers following the IETF RateLimit header fields draft:
x-ratelimit-limit: 10, 10;w=1, 300;w=60
x-ratelimit-remaining: 8
x-ratelimit-reset: 1
w=1 is the 1-second burst window; w=60 is the 60-second sustained window. x-ratelimit-remaining tells you how many requests are left in the current window. x-ratelimit-reset tells you how many seconds remain until the window resets. Use these to throttle proactively.
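The x-ratelimit-limit value can be parsed into its parts. The sketch below infers the format from the single example above: it assumes the first item is the limit currently in effect and that each following item is a quota with a w= window in seconds. The function name and the returned shape are hypothetical.

```javascript
// Sketch: parse a value such as "10, 10;w=1, 300;w=60".
// Assumption: first item = limit currently in effect,
// remaining items = per-window policies (w = window length in seconds).
function parseRateLimitLimit(headerValue) {
  const [first, ...rest] = headerValue.split(',').map(s => s.trim());
  const policies = rest.map(item => {
    const [quota, ...params] = item.split(';').map(s => s.trim());
    const windowParam = params.find(p => p.startsWith('w='));
    return {
      limit: Number.parseInt(quota, 10),
      windowSeconds: windowParam
        ? Number.parseInt(windowParam.slice(2), 10)
        : null,
    };
  });
  return { limit: Number.parseInt(first, 10), policies };
}

const parsed = parseRateLimitLimit('10, 10;w=1, 300;w=60');
console.log(parsed.limit);    // 10
console.log(parsed.policies); // burst: 10 per 1s, sustained: 300 per 60s
```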
When you get a 429 Too Many Requests, back off. But do it properly. The x-ratelimit-reset header tells you exactly how long to wait. Use it when it's there, fall back to exponential backoff when it's not, and always add jitter so your retries don't pile up at the same instant.
    async function callWithBackoff(request, maxRetries = 5) {
      for (let attempt = 0; attempt < maxRetries; attempt++) {
        const response = await fetch(request);
        if (response.status !== 429) return response;
        // Use the server's reset header (in seconds) when it is present and numeric;
        // parseInt returns NaN for a missing header, which triggers the fallback
        const resetAfter = parseInt(response.headers.get('x-ratelimit-reset'), 10);
        const baseDelay = Number.isFinite(resetAfter)
          ? resetAfter * 1000
          : Math.min(Math.pow(2, attempt) * 1000, 30000); // exponential, capped at 30s
        // Add jitter to avoid thundering herd
        const jitter = Math.random() * 1000;
        await new Promise(r => setTimeout(r, baseDelay + jitter));
      }
      throw new Error('Max retries exceeded');
    }
You can also throttle proactively using the x-ratelimit-remaining header to slow down before you ever hit a 429:
    async function callWithThrottle(request) {
      const response = await fetch(request);
      // parseInt returns NaN when a header is missing, so the check below is skipped
      const remaining = parseInt(response.headers.get('x-ratelimit-remaining'), 10);
      const resetIn = parseInt(response.headers.get('x-ratelimit-reset'), 10);
      // If you're running low on budget, space out your next calls
      if (remaining < 5 && resetIn > 0) {
        const paceDelay = (resetIn * 1000) / Math.max(remaining, 1);
        await new Promise(r => setTimeout(r, paceDelay));
      }
      return response;
    }
Clients that repeatedly hit limits without reducing the number of calls may have their API access temporarily blocked.
Independent of the per-bucket project limits, we also enforce endpoint-level protections against volumetric attacks. These analyze request patterns based on IP address, JA3/JA4 fingerprint, request frequency, and authentication status. They're designed to catch brute-force and credential stuffing attacks without affecting normal API usage.
You don't need to do anything about these. They operate transparently and only kick in when traffic looks malicious.
Separately from the per-bucket limits, we now enforce concurrent request limits on critical write endpoints. For example, if two requests try to PUT /admin/identities/{id} with the same UUID at the same time, the second one gets a 429. There's no legitimate reason to edit the same identity concurrently; the result would be undefined regardless.
Enforced (returns 429 on concurrent requests):
- /admin/identities
- /admin/identities/{id}
- /admin/identities/{id}/credentials/{type}
- /admin/identities/{id}/sessions

Report-only (logged but not blocked yet):
- /admin/sessions/{id}
- /admin/sessions/{id}/extend
- /self-service/recovery

The report-only endpoints are being monitored. If we see patterns that warrant enforcement, we'll promote them to enforced with advance notice.
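One way to avoid tripping the concurrent request limit is to serialize writes to the same identity on the client side. The sketch below queues tasks per key inside a single process. createKeyedSerializer, updateIdentity, baseUrl, apiKey, and the Bearer authorization header are illustrative assumptions, not part of any Ory SDK.

```javascript
// Sketch: run async tasks that share a key one after another.
// Tasks with different keys still run concurrently.
function createKeyedSerializer() {
  const tails = new Map(); // key -> promise that settles when the last queued task is done

  return function runExclusive(key, task) {
    const previous = tails.get(key) ?? Promise.resolve();
    const current = previous.then(() => task());
    // The stored tail never rejects, so one failed task does not block the queue.
    const tail = current.catch(() => {});
    tails.set(key, tail);
    // Drop the entry once the queue for this key has drained.
    tail.then(() => {
      if (tails.get(key) === tail) tails.delete(key);
    });
    return current; // the caller still sees the task's own result or error
  };
}

const runExclusive = createKeyedSerializer();

// Hypothetical usage: baseUrl, apiKey, and payload are placeholders.
function updateIdentity(baseUrl, apiKey, id, payload) {
  return runExclusive(id, () =>
    fetch(`${baseUrl}/admin/identities/${id}`, {
      method: 'PUT',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify(payload),
    })
  );
}
```

This only coordinates requests inside one process. If several workers can edit the same identity, they need a shared lock or queue, and a 429 from these endpoints should still be handled with backoff.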
| When | Who |
|---|---|
| Week of June 15, 2026 | All new workspaces + existing Developer workspaces |
| Week of June 22, 2026 | Existing Production workspaces |
| Week of June 29, 2026 | Existing Growth workspaces |
| Week of July 6, 2026 | Existing Enterprise workspaces |
Each tier gets a week of buffer after the previous one. Migration is automatic; no action required.
Load testing against the Ory Network requires prior written approval. Unauthorized tests will be detected and may result in temporary blocking. Enterprise customers can request an approved window through our support team (see our Load Testing Policy).
Full threshold tables and technical reference: rate limits documentation.