What’s the difference between rate limiting and throttling?

Rate limiting cuts you off at a hard count — 5,000 requests per hour, then nothing until the window resets. Throttling slows you down gradually: the API still responds but adds latency or queues requests. Some APIs do both; Stripe throttles before hard-limiting on burst traffic.

Should I slow down proactively before hitting the rate limit?

Yes, and it’s usually better than waiting for a 429. Watch X-RateLimit-Remaining and start spacing requests once it drops to 10–20% of X-RateLimit-Limit. GitHub’s docs recommend this explicitly. The cost of a proactive sleep is a few milliseconds; the cost of a 429 mid-batch-job is potentially waiting out a full window before the job can continue.

What if the API doesn’t send any rate limit headers?

You’re flying blind. Default to conservative exponential backoff on any 429 or 5xx response, and check the documentation — many APIs document their limits only in a dedicated limits or quotas section, not in response headers. Some internal or enterprise APIs have limits that aren’t documented at all; test empirically with low concurrency first.

Can I parallelize requests and still avoid rate limits?

Yes, with a semaphore or token bucket. Instead of firing all requests simultaneously, cap the number of in-flight requests so your throughput stays within the per-second limit, and use jitter-based backoff on any 429s that slip through. The standard library’s asyncio.Semaphore combined with aiohttp (Python), or the p-limit package (Node.js), makes this straightforward.
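As a rough illustration of the semaphore approach, here is a minimal sketch using only the standard library. The names run_limited and fake_request are hypothetical, and fake_request stands in for a real HTTP call (for example one made with aiohttp). Note that a concurrency cap bounds in-flight requests, not requests per second, so it only approximates a rate limit.

```python
import asyncio
import random


async def run_limited(tasks, max_concurrency=5):
    """Run zero-argument coroutine functions with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(task):
        # Each task waits here until one of the max_concurrency slots is free
        async with semaphore:
            return await task()

    # gather() returns results in the same order as the input tasks
    return await asyncio.gather(*(guarded(t) for t in tasks))


async def fake_request(i):
    # Hypothetical stand-in for a real HTTP call
    await asyncio.sleep(random.uniform(0.001, 0.005))
    return i * 2


if __name__ == "__main__":
    jobs = [lambda i=i: fake_request(i) for i in range(20)]
    print(asyncio.run(run_limited(jobs, max_concurrency=5)))
```

If the API enforces a strict per-second limit, pairing this with a token bucket or a small delay inside each task is likely the safer choice.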

API Rate Limiting — Headers, Exponential Backoff, and Surviving the 429

Updated on Jun 1, 2026

You hit a 429. The API is telling you to slow down. Here’s how to decode X-RateLimit-* headers, understand Retry-After, and implement exponential backoff with jitter so your integrations handle rate limits gracefully instead of hammering the server.

You hit a 429. Maybe it crashed your webhook handler. Maybe it silently dropped a batch job. The API gave you “Too Many Requests” and a wall of response headers you probably scrolled past.

Those headers are the whole story. Here’s how to read them, and how to write retry logic that doesn’t make the problem worse.

The Headers That Actually Matter

Most rate-limited APIs return some variant of these on every response — not just on 429s:

X-RateLimit-Limit — your total allowed requests in the current window. GitHub’s REST API gives authenticated users 5,000/hour; unauthenticated requests get 60.
X-RateLimit-Remaining — requests left in the current window. When this hits 0, the next request is rejected, usually with a 429 (GitHub’s docs note that it may also return a 403).
X-RateLimit-Reset — when the window resets, as a Unix epoch timestamp. This is the one most developers ignore, and the most useful one.
X-RateLimit-Used (GitHub-specific) — requests consumed so far. Mirrors Limit - Remaining but useful for sanity checks.
Retry-After — usually sent on 429 responses (and sometimes on 503s) rather than on every response. Either a number of seconds to wait, or an HTTP-date string. If the API sends it, use it: it’s more precise than anything you’d calculate yourself.

Real GitHub response headers look like this:

X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4823
X-RateLimit-Reset: 1716998400
X-RateLimit-Used: 177
X-RateLimit-Resource: core

The X-RateLimit-Resource header is GitHub-specific: they maintain separate quota pools for core REST, search, and GraphQL. Burning your search quota (capped at 30 requests/minute) doesn’t touch your core quota — and vice versa.
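To make the headers above usable in code, one option is to parse them into a small structure. This is a sketch, not GitHub’s own client logic: RateLimitInfo and parse_rate_limit are hypothetical names, and the function assumes the headers arrive as a plain mapping of strings.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class RateLimitInfo:
    limit: int
    remaining: int
    reset_epoch: int
    resource: Optional[str] = None


def parse_rate_limit(headers: Mapping[str, str]) -> Optional[RateLimitInfo]:
    """Return parsed X-RateLimit-* values, or None if a required header is missing."""
    # Plain dicts are case-sensitive, so normalize the header names first
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        return RateLimitInfo(
            limit=int(lowered["x-ratelimit-limit"]),
            remaining=int(lowered["x-ratelimit-remaining"]),
            reset_epoch=int(lowered["x-ratelimit-reset"]),
            resource=lowered.get("x-ratelimit-resource"),
        )
    except (KeyError, ValueError):
        return None


sample = {
    "X-RateLimit-Limit": "5000",
    "X-RateLimit-Remaining": "4823",
    "X-RateLimit-Reset": "1716998400",
    "X-RateLimit-Resource": "core",
}
info = parse_rate_limit(sample)
print(info.remaining, info.resource)  # 4823 core
```

Returning None instead of raising keeps the caller’s logic simple for APIs that omit these headers entirely.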

Stripe Is Different

Stripe doesn’t use X-RateLimit-* naming. Their headers are prefixed differently:

Stripe-Ratelimit-Limit: 100
Stripe-Ratelimit-Remaining: 97
Stripe-Ratelimit-Reset: 1716998460

And on a 429:

Retry-After: 30

Stripe’s default limit is 100 live-mode requests per second, not per hour. This matters more than it sounds: a loop importing 500 customers will exceed that limit within the first second if it fires requests concurrently and you’re not throttling on your end.
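One way to throttle on your end is a client-side token bucket. The sketch below is a generic illustration and not part of any Stripe SDK; the TokenBucket class, its parameters, and the 80-per-second figure in the usage note are all assumptions chosen for the example.

```python
import time


class TokenBucket:
    """Allow roughly `rate` operations per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock  # injectable so the bucket can be tested without sleeping
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

    def acquire(self):
        """Block (using real time) until a token is available."""
        while not self.try_acquire():
            time.sleep((1 - self.tokens) / self.rate)
```

A bulk import could create something like TokenBucket(rate=80, capacity=80) and call acquire() before each request, which leaves some headroom under a 100-per-second limit.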

Stripe also distinguishes between request-rate limits and resource-specific limits (e.g., creating too many customers in a short burst). The 429 response body specifies which limit you hit — always log the full body, not just the status code.

Decoding the Reset Timestamp

The X-RateLimit-Reset value is a Unix epoch timestamp. 1716998400 tells you nothing at a glance, but it’s trivial to decode: use the Unix Timestamp Converter to convert it to a readable UTC datetime and see exactly how far away the reset is.

In code, reset_time - time.time() gives the seconds until the window resets. Check X-RateLimit-Remaining first, though: if you still have quota, there’s nothing to wait for.
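The same decoding can be sketched in a few lines of Python. The helper names seconds_until_reset and reset_as_utc are hypothetical, and the now argument exists only so the example produces a fixed result.

```python
import time
from datetime import datetime, timezone


def seconds_until_reset(reset_epoch, now=None):
    """Seconds until the window resets; never negative."""
    now = time.time() if now is None else now
    return max(0.0, float(reset_epoch) - now)


def reset_as_utc(reset_epoch):
    """Human-readable UTC time for a Unix epoch timestamp."""
    return datetime.fromtimestamp(reset_epoch, tz=timezone.utc).isoformat()


print(reset_as_utc(1716998400))  # 2024-05-29T16:00:00+00:00
print(seconds_until_reset(1716998400, now=1716998100))  # 300.0
```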

What the 429 Body Tells You

The 429 status code alone isn’t enough. The response body usually specifies which limit was hit:

GitHub:

{
  "message": "API rate limit exceeded for user ID 12345.",
  "documentation_url": "https://docs.github.com/rest/overview/rate-limits"
}

Stripe:

{
  "error": {
    "code": "rate_limit",
    "message": "Too many requests hit the API too quickly.",
    "type": "invalid_request_error"
  }
}

OpenAI goes further: the error message specifies whether you hit a tokens-per-minute limit or a requests-per-minute limit, which changes your retry strategy entirely. Always log the full 429 body.
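A small helper can make the body easier to log consistently. This is a sketch that handles only the two JSON shapes shown above and otherwise falls back to the raw text; describe_429 is a hypothetical name, and other APIs may structure their errors differently.

```python
import json


def describe_429(body_text):
    """Pull a short description out of a 429 body, falling back to the raw text."""
    try:
        body = json.loads(body_text)
    except (TypeError, ValueError):
        return body_text or "<empty body>"
    if not isinstance(body, dict):
        return str(body_text)

    error = body.get("error")
    if isinstance(error, dict):
        # Stripe-style shape: {"error": {"code": ..., "message": ...}}
        return f"{error.get('code', 'unknown')}: {error.get('message', '')}"

    # GitHub-style shape: {"message": ...}
    return body.get("message", body_text)


github_body = '{"message": "API rate limit exceeded for user ID 12345."}'
stripe_body = '{"error": {"code": "rate_limit", "message": "Too many requests hit the API too quickly."}}'
print(describe_429(github_body))
print(describe_429(stripe_body))
```

Logging the raw body alongside this summary is still worthwhile, since the summary discards fields it does not recognize.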

Exponential Backoff With Jitter

The naive fix: catch a 429, sleep 1 second, retry. This fails for two reasons:

If you have multiple workers hitting the same endpoint, they’ll all sleep for 1 second and retry simultaneously — a synchronized retry storm that recreates the problem.
1 second is useless if you’ve exhausted a per-hour or per-day quota. You’ll just collect 3,600 more 429s.

The correct approach is exponential backoff with jitter: each retry waits longer than the last, with a random component to desynchronize concurrent workers.

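import random
import time
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

import requests


def parse_retry_after(value):
    """Return Retry-After in seconds; the header is either seconds or an HTTP-date."""
    try:
        return max(0, int(value))
    except ValueError:
        pass
    try:
        retry_at = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None  # unparseable; let the caller fall back
    if retry_at.tzinfo is None:
        retry_at = retry_at.replace(tzinfo=timezone.utc)
    return max(0, (retry_at - datetime.now(timezone.utc)).total_seconds())


def fetch_with_backoff(url, headers, max_retries=5, base_delay=1, cap=60):
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, timeout=30)

        if response.status_code != 429:
            return response
        if attempt == max_retries - 1:
            break  # out of retries; don't sleep before giving up

        wait = None

        # Prefer Retry-After if the API provides it
        retry_after = response.headers.get("Retry-After")
        if retry_after:
            wait = parse_retry_after(retry_after)

        # Fall back to X-RateLimit-Reset (Unix epoch seconds)
        reset = response.headers.get("X-RateLimit-Reset")
        if wait is None and reset:
            wait = max(0, int(reset) - time.time())

        # Last resort: exponential backoff with full jitter, capped at `cap` seconds
        if wait is None:
            wait = random.uniform(0, min(cap, base_delay * (2 ** attempt)))

        print(f"Rate limited. Attempt {attempt + 1}/{max_retries}. Waiting {wait:.1f}s")
        time.sleep(wait)

    raise RuntimeError(f"Still rate limited after {max_retries} attempts")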

The priority order in this implementation is deliberate:

Retry-After first — if the API tells you exactly how long to wait, use it. Don’t second-guess it with your own calculation.
X-RateLimit-Reset as fallback — calculate actual seconds until reset rather than guessing a fixed delay.
Full jitter as last resort — random.uniform(0, min(cap, base_delay * 2 ** attempt)) picks a random wait anywhere between zero and the current backoff ceiling, which spreads retries across the entire backoff window. AWS’s architecture blog documents this as “full jitter” and shows it measurably reduces server-side collision compared to equal jitter or no jitter at all.
max(0, ...) on reset — the reset timestamp can be in the past by the time you do the math. Guard against a negative sleep value crashing your handler.
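To see how the backoff ceiling grows, the jitter calculation can be pulled out on its own. This sketch assumes the same base delay of 1 second and cap of 60 seconds used above; backoff_ceiling and full_jitter_delay are hypothetical helper names.

```python
import random


def backoff_ceiling(attempt, base_delay=1, cap=60):
    """Upper bound of the wait for a zero-based attempt number."""
    return min(cap, base_delay * (2 ** attempt))


def full_jitter_delay(attempt, base_delay=1, cap=60):
    """Pick a wait uniformly between 0 and the ceiling for this attempt."""
    return random.uniform(0, backoff_ceiling(attempt, base_delay, cap))


print([backoff_ceiling(a) for a in range(8)])
# [1, 2, 4, 8, 16, 32, 60, 60]
```

The ceiling doubles on each attempt until it reaches the cap, while the actual wait is drawn at random below it, so two workers that fail at the same moment are unlikely to retry at the same moment.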

Common Mistakes

Treating non-429 errors as rate limit errors. A 503 is a server error. A 401 means your credentials are wrong. Check status_code == 429 explicitly before applying rate-limit retry logic.

Swallowing the 429 and returning empty data. Silent failures are harder to debug than raised exceptions. Surface the error.

Using a fixed delay. If you’ve exhausted a per-hour window with 47 minutes left on the clock, sleeping 5 seconds buys you nothing. Calculate from the reset timestamp.

Retrying indefinitely. Set a max_retries cap and raise after it’s exhausted. Some 429s indicate quota exhaustion that won’t recover until the next billing period — an unbounded retry loop is a bug.

Not watching X-RateLimit-Remaining proactively. If Remaining drops below 10% of Limit, start spacing out requests before you hit zero. Most SDKs don’t do this automatically. The cost is a few milliseconds of extra latency; the benefit is never seeing a 429 in the first place.
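Proactive spacing can be sketched as a small function that turns the header values into a delay. The 10% threshold matches the figure above; the pacing_delay name and the even-spreading strategy are assumptions for illustration, not something a particular SDK provides.

```python
def pacing_delay(remaining, limit, seconds_to_reset, threshold=0.10):
    """Seconds to pause before the next request, based on remaining quota."""
    seconds_to_reset = max(0, seconds_to_reset)
    if limit <= 0 or remaining > limit * threshold:
        return 0.0  # plenty of quota left; no need to slow down
    if remaining <= 0:
        return float(seconds_to_reset)  # out of quota; wait for the reset
    # Spread the remaining requests evenly over the time left in the window
    return seconds_to_reset / remaining


print(pacing_delay(4823, 5000, 1800))  # 0.0
print(pacing_delay(100, 5000, 600))    # 6.0
```

Calling this after each response and sleeping for the returned value keeps a batch job under the limit without any special handling for the common case where quota is plentiful.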

Wrapping Up

The 429 isn’t a one-time problem you fix and forget. It’s a recurring constraint, and ignoring the headers that come with it means you’ll keep hitting the same wall. Use Retry-After when the API provides it. Calculate from X-RateLimit-Reset when it doesn’t. Add jitter so retries don’t synchronize. Set a cap so unbounded retry loops don’t become production incidents.

And when you’re staring at X-RateLimit-Reset: 1716998400 and wondering when that actually is — the Unix Timestamp Converter will tell you in one click.