API Rate Limiting — Headers, Exponential Backoff, and Surviving the 429
You hit a 429. The API is telling you to slow down. Here’s how to decode X-RateLimit-* headers, understand Retry-After, and implement exponential backoff with jitter so your integrations handle rate limits gracefully instead of hammering the server.
You hit a 429. Maybe it crashed your webhook handler. Maybe it silently dropped a batch job. The API gave you “Too Many Requests” and a wall of response headers you probably scrolled past.
Those headers are the whole story. Here’s how to read them, and how to write retry logic that doesn’t make the problem worse.
The Headers That Actually Matter
Most rate-limited APIs return some variant of these on every response — not just on 429s:
- X-RateLimit-Limit — your total allowed requests in the current window. GitHub’s REST API gives authenticated users 5,000/hour; unauthenticated requests get 60.
- X-RateLimit-Remaining — requests left in the current window. When this hits 0, the next request returns a 429.
- X-RateLimit-Reset — when the window resets, as a Unix epoch timestamp. This is the one most developers ignore, and the most useful one.
- X-RateLimit-Used (GitHub-specific): requests consumed so far. Mirrors Limit - Remaining, but useful for sanity checks.
- Retry-After: only appears on 429 responses. Either a number of seconds to wait, or an HTTP-date string. If the API sends it, use it. It’s more precise than anything you’d calculate yourself.
Real GitHub response headers look like this:
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4823
X-RateLimit-Reset: 1716998400
X-RateLimit-Used: 177
X-RateLimit-Resource: core
The X-RateLimit-Resource header is GitHub-specific: they maintain separate quota pools for core REST, search, and GraphQL. Burning your search quota (capped at 30 requests/minute) doesn’t touch your core quota — and vice versa.
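To make the header list concrete, here is a minimal sketch that reads these values out of a response's headers. The names RateLimitStatus and parse_rate_limit are illustrative and not part of any SDK, and the sketch assumes the GitHub-style X-RateLimit-* names shown above; other APIs use different prefixes.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitStatus:
    limit: Optional[int]
    remaining: Optional[int]
    reset_epoch: Optional[int]
    resource: Optional[str]


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return int(value), or None if the header is missing or malformed."""
    try:
        return int(value) if value is not None else None
    except ValueError:
        return None


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitStatus:
    """Extract rate-limit fields from a header mapping (hypothetical helper)."""
    # Header names are case-insensitive, so normalise keys to lowercase
    # to make this work with a plain dict as well.
    lowered = {key.lower(): value for key, value in headers.items()}
    return RateLimitStatus(
        limit=_to_int(lowered.get("x-ratelimit-limit")),
        remaining=_to_int(lowered.get("x-ratelimit-remaining")),
        reset_epoch=_to_int(lowered.get("x-ratelimit-reset")),
        resource=lowered.get("x-ratelimit-resource"),
    )


# The sample GitHub headers from above.
status = parse_rate_limit({
    "X-RateLimit-Limit": "5000",
    "X-RateLimit-Remaining": "4823",
    "X-RateLimit-Reset": "1716998400",
    "X-RateLimit-Used": "177",
    "X-RateLimit-Resource": "core",
})
```

Missing or malformed headers come back as None rather than raising, so callers can decide how to treat an API that omits them.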
Stripe Is Different
Stripe doesn’t use X-RateLimit-* naming. Their headers are prefixed differently:
Stripe-Ratelimit-Limit: 100
Stripe-Ratelimit-Remaining: 97
Stripe-Ratelimit-Reset: 1716998460
And on a 429:
Retry-After: 30
Stripe’s default limit is 100 live-mode requests per second, not per hour. This matters more than it sounds: a loop importing 500 customers trips the limit whenever it runs faster than 100 requests per second, which is to say whenever it would finish in under 5 seconds without throttling on your end.
Stripe also distinguishes between request-rate limits and resource-specific limits (e.g., creating too many customers in a short burst). The 429 response body specifies which limit you hit — always log the full body, not just the status code.
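One way to throttle on your end is to space calls at a fixed minimum interval. The sketch below is a hypothetical Throttle class, not anything from Stripe's SDK. It assumes a single-threaded caller, and the clock and sleep parameters exist only so the behaviour can be tested without real waiting.

```python
import time


class Throttle:
    """Space calls at least 1/max_per_second seconds apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = 0.0  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed, then reserve the slot after it."""
        now = self._clock()
        if now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval


# Usage sketch (create_customer is a placeholder for your own API call):
#
#     throttle = Throttle(max_per_second=80)  # leave headroom under 100/s
#     for customer in customers:
#         throttle.wait()
#         create_customer(customer)
```

Staying somewhat below the published limit leaves room for other processes that share the same API key.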
Decoding the Reset Timestamp
The X-RateLimit-Reset value is a Unix epoch timestamp. 1716998400 tells you nothing at a glance, but it’s trivial to decode: use the Unix Timestamp Converter to convert it to a readable UTC datetime and see exactly how far away the reset is.
In Python: reset_time - time.time() gives the seconds until the window resets. But check X-RateLimit-Remaining first: if you still have quota, there’s nothing to wait for.
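The same decoding can be done with the standard library. The helper name describe_reset and the now parameter are assumptions made for illustration; now is injectable only so the example is reproducible.

```python
import time
from datetime import datetime, timezone


def describe_reset(reset_epoch, now=None):
    """Return (UTC ISO-8601 string, seconds until reset) for an epoch timestamp."""
    now = time.time() if now is None else now
    reset_at = datetime.fromtimestamp(reset_epoch, tz=timezone.utc)
    # Clamp at zero: the reset may already have passed by the time we check.
    seconds_left = max(0, reset_epoch - now)
    return reset_at.isoformat(), seconds_left


# Pretend "now" is five minutes before the reset from the sample headers.
iso, seconds_left = describe_reset(1716998400, now=1716998100)
# iso == "2024-05-29T16:00:00+00:00", seconds_left == 300
```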
What the 429 Body Tells You
The 429 status code alone isn’t enough. The response body usually specifies which limit was hit:
GitHub:
{
  "message": "API rate limit exceeded for user ID 12345.",
  "documentation_url": "https://docs.github.com/rest/overview/rate-limits"
}
Stripe:
{
  "error": {
    "code": "rate_limit",
    "message": "Too many requests hit the API too quickly.",
    "type": "invalid_request_error"
  }
}
OpenAI goes further: the error message specifies whether you hit a tokens-per-minute limit or a requests-per-minute limit, which changes your retry strategy entirely. Always log the full 429 body.
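As a rough sketch of what logging the body can look like, the function below pulls a code and message out of the two body shapes shown above. The name summarize_429 is hypothetical, and the sketch only knows the GitHub and Stripe layouts from this article; for anything else it falls back to the raw text so nothing is lost.

```python
import json


def summarize_429(body_text):
    """Return {'code': ..., 'message': ...} from a 429 response body."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # Not JSON at all: keep the raw text so it still reaches the logs.
        return {"code": None, "message": body_text}

    if not isinstance(body, dict):
        return {"code": None, "message": body_text}

    error = body.get("error")
    if isinstance(error, dict):
        # Stripe-style envelope: details are nested under "error".
        return {"code": error.get("code"), "message": error.get("message")}

    # GitHub-style body: the message sits at the top level.
    return {"code": None, "message": body.get("message")}


github_body = '{"message": "API rate limit exceeded for user ID 12345."}'
stripe_body = (
    '{"error": {"code": "rate_limit", '
    '"message": "Too many requests hit the API too quickly.", '
    '"type": "invalid_request_error"}}'
)
github_summary = summarize_429(github_body)
stripe_summary = summarize_429(stripe_body)
```

Logging the summary alongside the raw body keeps alerts readable while the full payload stays available for debugging.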
Exponential Backoff With Jitter
The naive fix: catch a 429, sleep 1 second, retry. This fails for two reasons:
- If you have multiple workers hitting the same endpoint, they’ll all sleep for 1 second and retry simultaneously — a synchronized retry storm that recreates the problem.
- 1 second is useless if you’ve exhausted a per-hour or per-day quota. You’ll just collect 3,600 more 429s.
The correct approach is exponential backoff with jitter: each retry waits longer than the last, with a random component to desynchronize concurrent workers.
import time
import random
import requests
from email.utils import parsedate_to_datetime


def fetch_with_backoff(url, headers, max_retries=5):
    base_delay = 1  # seconds
    cap = 60  # max wait for the jitter fallback: 60s

    for attempt in range(max_retries):
        response = requests.get(url, headers=headers)
        if response.status_code != 429:
            return response

        retry_after = response.headers.get("Retry-After")
        reset = response.headers.get("X-RateLimit-Reset")
        if retry_after and retry_after.strip().isdigit():
            # Prefer Retry-After if the API provides it (seconds form)
            wait = int(retry_after)
        elif retry_after:
            # Retry-After can also be an HTTP-date: wait until that moment
            target = parsedate_to_datetime(retry_after).timestamp()
            wait = max(0, target - time.time())
        elif reset:
            # Fall back to X-RateLimit-Reset
            wait = max(0, int(reset) - int(time.time()))
        else:
            # Pure exponential backoff with full jitter
            wait = random.uniform(0, min(cap, base_delay * (2 ** attempt)))

        print(f"Rate limited. Attempt {attempt + 1}/{max_retries}. Waiting {wait:.1f}s")
        time.sleep(wait)

    raise RuntimeError(f"Max retries exceeded after {max_retries} attempts")
The priority order in this implementation is deliberate:
- Retry-After first — if the API tells you exactly how long to wait, use it. Don’t second-guess it with your own calculation.
- X-RateLimit-Reset as fallback — calculate actual seconds until reset rather than guessing a fixed delay.
- Full jitter as last resort: random.uniform(0, cap) spreads retries across the entire backoff window. AWS’s architecture blog documents this as “full jitter” and shows it measurably reduces server-side collision compared to equal jitter or no jitter at all.
- max(0, ...) on reset: the reset timestamp can be in the past by the time you do the math. Guard against a negative sleep value crashing your handler.
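To see what the jitter fallback actually produces, the sketch below separates the ceiling of each retry window from the random draw inside it. The function names are illustrative. The defaults mirror the base_delay of 1 second and cap of 60 seconds used in the implementation above.

```python
import random


def backoff_ceiling(attempt, base_delay=1.0, cap=60.0):
    """Upper bound of the wait window for a zero-based retry attempt."""
    return min(cap, base_delay * (2 ** attempt))


def full_jitter_delay(attempt, base_delay=1.0, cap=60.0, rng=random):
    """Full jitter: pick uniformly between 0 and the ceiling for this attempt."""
    return rng.uniform(0, backoff_ceiling(attempt, base_delay, cap))


# The ceiling doubles each attempt until it reaches the cap.
ceilings = [backoff_ceiling(attempt) for attempt in range(8)]
# ceilings == [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Because each worker draws its own value from the whole window, two workers that were rate limited at the same instant are unlikely to retry at the same instant.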
Common Mistakes
Treating non-429 errors as rate limit errors. A 503 is a server error. A 401 means your credentials are wrong. Check status_code == 429 explicitly before applying rate-limit retry logic.
Swallowing the 429 and returning empty data. Silent failures are harder to debug than raised exceptions. Surface the error.
Using a fixed delay. If you’ve exhausted a per-hour window with 47 minutes left on the clock, sleeping 5 seconds buys you nothing. Calculate from the reset timestamp.
Retrying indefinitely. Set a max_retries cap and raise after it’s exhausted. Some 429s indicate quota exhaustion that won’t recover until the next billing period — an unbounded retry loop is a bug.
Not watching X-RateLimit-Remaining proactively. If Remaining drops below 10% of Limit, start spacing out requests before you hit zero. Most SDKs don’t do this automatically. The cost is a few milliseconds of extra latency; the benefit is never seeing a 429 in the first place.
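A minimal sketch of that proactive pacing, assuming you already have Limit, Remaining, and Reset as numbers: once the remaining quota is at or below the threshold, spread what is left evenly over the time until the reset. The function name pacing_delay and the even-spread policy are illustrative choices, not something any particular SDK does.

```python
def pacing_delay(limit, remaining, reset_epoch, now, threshold=0.10):
    """Seconds to pause before the next request, given the current quota state."""
    if remaining > limit * threshold:
        return 0.0  # plenty of quota left: no pacing needed

    seconds_left = max(0.0, reset_epoch - now)
    if remaining <= 0:
        return seconds_left  # quota exhausted: wait for the window to reset

    # Spread the remaining requests evenly across the rest of the window.
    return seconds_left / remaining


# 500 of 5,000 requests left with 50 seconds until reset: pause 0.1s per call.
example_delay = pacing_delay(limit=5000, remaining=500, reset_epoch=1050, now=1000)
```

Call it after every response and sleep for the returned value before sending the next request.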
Wrapping Up
The 429 isn’t a one-time problem you fix and forget. It’s a recurring constraint, and ignoring the headers that come with it means you’ll keep hitting the same wall. Use Retry-After when the API provides it. Calculate from X-RateLimit-Reset when it doesn’t. Add jitter so retries don’t synchronize. Set a cap so unbounded retry loops don’t become production incidents.
And when you’re staring at X-RateLimit-Reset: 1716998400 and wondering when that actually is — the Unix Timestamp Converter will tell you in one click.
