Rate Limiting: How to Stop Getting 429’d by Every API You Touch
HTTP 429 Too Many Requests: how to read the Retry-After header, implement exponential backoff with jitter, understand the token bucket versus leaky bucket algorithms, and implement rate limiting on your own API.
You hit 429. Your script has been hammering the API for the last hour, your logs are full of red, and the deployment is in 20 minutes. That’s the moment this stops being abstract.
HTTP 429 Too Many Requests means you sent more requests than the server allows in a given time window. Reading the response correctly — and retrying correctly — is a skill most developers only pick up after getting burned. Here’s the full picture.
What the 429 actually tells you
The status code is only half the message. The real information is in the headers:
- Retry-After: 30 tells you to wait 30 seconds before retrying. It can also be an HTTP-date: Retry-After: Mon, 08 Jun 2026 15:00:00 GMT
- X-RateLimit-Limit: 100 is the total number of requests allowed per window
- X-RateLimit-Remaining: 0 is the number of requests left in the current window (you’re at zero)
- X-RateLimit-Reset: 1749391200 is the Unix timestamp when the window resets
Not all APIs send all of these. GitHub sends the full set. Stripe sends Retry-After. Some REST APIs send nothing and expect you to guess. If you have Retry-After, use it exactly — it’s the server telling you the minimum safe wait time. If you don’t, exponential backoff is your fallback.
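Since Retry-After can arrive either as a number of seconds or as an HTTP-date, it helps to normalize it once. The following is a minimal sketch, not a definitive implementation; the helper name parseRetryAfter and its injectable nowMs parameter are assumptions made for illustration.

```javascript
// Hypothetical helper: converts a Retry-After header value into milliseconds to wait.
// Returns null when the header is missing or unparseable, so the caller can
// fall back to exponential backoff.
function parseRetryAfter(value, nowMs = Date.now()) {
  if (value === null || value === undefined) return null;
  const trimmed = String(value).trim();
  if (trimmed === '') return null;

  // Form 1: delay in whole seconds, e.g. "30"
  if (/^\d+$/.test(trimmed)) {
    return Number(trimmed) * 1000;
  }

  // Form 2: HTTP-date, e.g. "Mon, 08 Jun 2026 15:00:00 GMT"
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;

  // A date that is already in the past means no further wait is needed
  return Math.max(0, dateMs - nowMs);
}
```

With fetch, the input would typically come from res.headers.get('Retry-After'), which returns null when the header is absent.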
The wrong way to retry
The naive implementation looks like this:
async function fetchWithoutBackoff(url) {
  while (true) {
    const res = await fetch(url);
    if (res.ok) return res;
    if (res.status === 429) continue; // immediately retry
  }
}
This is actively harmful. If 10 instances of your service all hit 429 at the same moment and all immediately retry, every retry lands at the same time — the thundering herd problem. You get rate limited again, immediately, in a tight loop that can run indefinitely and make your client look like it’s intentionally abusing the API.
Exponential backoff with jitter
The correct pattern: each retry waits longer than the last (exponential), and a random offset prevents synchronized retries across multiple clients (jitter).
async function fetchWithBackoff(url, options = {}, maxRetries = 5) {
  let attempt = 0;
  while (attempt <= maxRetries) {
    const res = await fetch(url, options);
    if (res.ok) return res;
    if (res.status !== 429) {
      throw new Error(`Request failed: ${res.status}`);
    }
    if (attempt === maxRetries) {
      throw new Error(`Rate limited after ${maxRetries} retries`);
    }
    // Use Retry-After if provided; otherwise exponential backoff + jitter
    const retryAfter = res.headers.get('Retry-After');
    let waitMs = NaN;
    if (retryAfter) {
      const seconds = isNaN(retryAfter)
        ? (new Date(retryAfter) - Date.now()) / 1000 // HTTP date
        : Number(retryAfter); // seconds
      waitMs = seconds * 1000;
    }
    // Fall back when the header is missing, unparseable, or a date already in the past
    if (!Number.isFinite(waitMs) || waitMs < 0) {
      const baseDelay = 1000 * Math.pow(2, attempt); // 1s, 2s, 4s, 8s, 16s
      const jitter = Math.random() * 1000; // 0–1000ms random offset
      waitMs = baseDelay + jitter;
    }
    console.log(`Rate limited. Waiting ${Math.round(waitMs / 1000)}s (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise(resolve => setTimeout(resolve, waitMs));
    attempt++;
  }
}
The jitter line is the part most implementations miss. Without it, retries from multiple parallel processes still arrive in clusters. With it, they spread out across the wait window.
For APIs that return Retry-After, use that value as the floor — if you’re still getting 429s after the specified wait, apply exponential backoff on top.
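One way to treat Retry-After as a floor is to compute the backoff delay on every attempt and take whichever is larger. This is a sketch under stated assumptions: the name nextWaitMs, the 1-second base, and the 60-second cap are illustrative choices, not values from any particular API.

```javascript
// Hypothetical helper: picks the wait before retry number `attempt` (0-based).
// retryAfterMs is the server's hint in milliseconds, or null if none was sent.
// random is injectable so the function can be tested deterministically.
function nextWaitMs(attempt, retryAfterMs = null, random = Math.random) {
  const baseMs = 1000;  // assumed base delay
  const capMs = 60000;  // assumed upper bound on the backoff component
  const backoffMs = Math.min(capMs, baseMs * 2 ** attempt);
  const jitterMs = random() * 1000; // 0 to 1000 ms random offset
  const floorMs = retryAfterMs ?? 0;

  // Never wait less than the server asked for; grow beyond it on repeated 429s
  return Math.max(floorMs, backoffMs) + jitterMs;
}
```

On the first attempt a 30-second Retry-After dominates; after several consecutive 429s the backoff term overtakes it, so a client that keeps getting rejected slows down further instead of retrying at a constant interval.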
Token bucket vs leaky bucket
Two algorithms dominate rate limiter implementations. Understanding which one you’re dealing with tells you a lot about how the API will behave under pressure — and which algorithm to reach for when you’re building your own.
Token bucket
The bucket holds up to N tokens. Each request costs 1 token. Tokens refill at a fixed rate (e.g. 10 per second). If the bucket is empty, the request is rejected or queued.
Burst-friendly. If you haven’t made requests in a while, you’ve accumulated tokens and can fire a burst without hitting the limit. GitHub’s REST API is burst-tolerant in a similar way: authenticated requests get 5,000 per hour, and the primary limit does not require you to spread them evenly, although separate secondary limits can still throttle very rapid bursts. Good for interactive use cases where traffic is spiky.
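The token bucket described above can be sketched in a few lines. This is an illustrative in-memory version, assuming a single process; the class name TokenBucket and the injectable now clock are assumptions made so the behavior can be checked without real time passing.

```javascript
class TokenBucket {
  // capacity: maximum tokens the bucket can hold
  // refillPerSec: tokens added per second
  // now: injectable clock returning milliseconds (assumption, for testing)
  constructor(capacity, refillPerSec, now = Date.now) {
    this.capacity = capacity;
    this.refillPerSec = refillPerSec;
    this.now = now;
    this.tokens = capacity; // start full, so an idle client can burst
    this.lastRefillMs = now();
  }

  // Add the tokens earned since the last call, never exceeding capacity
  refill() {
    const nowMs = this.now();
    const elapsedSec = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefillMs = nowMs;
  }

  // Spends one token and returns true if the request is allowed, false otherwise
  tryRemove() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

A bucket created with new TokenBucket(10, 10) would allow a burst of 10 requests immediately and then roughly 10 per second after that.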
Leaky bucket
Requests go into a queue and drain out at a fixed rate, regardless of how fast they arrive. If the queue fills up, incoming requests are dropped.
Smooth output, no bursting. Even if you have quota left, requests trickle out at the configured rate. Nginx’s limit_req module uses this. Better for protecting downstream systems from spikes — useful for webhook delivery, outbound API calls, and anything where predictable throughput matters more than burst tolerance.
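For comparison, here is a minimal sketch of the queue form of a leaky bucket. The class name LeakyBucket, the drain method, and the injectable now clock are assumptions for illustration; a real worker would call drain on a timer and send whatever it returns.

```javascript
class LeakyBucket {
  // queueSize: maximum number of waiting jobs
  // leakPerSec: jobs released per second
  // now: injectable clock returning milliseconds (assumption, for testing)
  constructor(queueSize, leakPerSec, now = Date.now) {
    this.queueSize = queueSize;
    this.leakIntervalMs = 1000 / leakPerSec;
    this.now = now;
    this.queue = [];
    this.lastLeakMs = now();
  }

  // Returns false when the queue is full and the job is dropped
  enqueue(job) {
    if (this.queue.length >= this.queueSize) return false;
    // Restart the leak clock when the queue was empty, so idle time
    // never turns into a burst of releases later
    if (this.queue.length === 0) this.lastLeakMs = this.now();
    this.queue.push(job);
    return true;
  }

  // Returns the jobs that are due to leave the bucket at the current time
  drain() {
    const due = Math.floor((this.now() - this.lastLeakMs) / this.leakIntervalMs);
    if (due <= 0) return [];
    this.lastLeakMs += due * this.leakIntervalMs;
    return this.queue.splice(0, due);
  }
}
```

Unlike the token bucket, unused capacity is not banked: a job that arrives after a long quiet period still waits one full interval before it is released.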
Which to pick when you’re implementing your own:
- User-facing endpoints that need burst tolerance → token bucket
- Outbound webhook delivery or third-party API calls → leaky bucket
- Background jobs where smooth throughput matters → leaky bucket
Calculating safe request rates
Before writing any retry logic, figure out what you’re actually allowed to do. If an API says “1,000 requests per hour,” that’s 16.67 req/min or 0.278 req/second. Add a 20% safety buffer and you’re at ~13 req/min — enough headroom to avoid edge-case timing issues where two windows overlap.
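The arithmetic above is easy to capture in a small helper. This is a sketch; the name safeRate and the bufferPercent parameter are assumptions, and the buffer is passed as a whole percentage to keep the math exact.

```javascript
// Hypothetical helper: turns an hourly quota into safe per-minute and per-second
// rates plus the sleep interval between sequential requests.
function safeRate(requestsPerHour, bufferPercent = 20) {
  // Keep only the share of the quota left after the safety buffer
  const usablePerHour = (requestsPerHour * (100 - bufferPercent)) / 100;
  return {
    perMinute: usablePerHour / 60,
    perSecond: usablePerHour / 3600,
    // Round up so the client never goes faster than the safe rate
    sleepMs: Math.ceil(3600000 / usablePerHour),
  };
}
```

For the 1,000 requests per hour example, safeRate(1000) gives about 13.3 requests per minute, which corresponds to sleeping 4,500 ms between sequential requests.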
Gunakan Rate Limit Calculator to convert quota numbers into per-second and per-minute rates, find the right sleep interval between requests, and see how your concurrency level affects burst risk.
Implementing rate limiting on your own API
If you’re on the other side and want to add proper 429 behavior to your own API:
- Pick the right granularity. Per-IP is easy but breaks for services behind NAT or shared egress. Per-API-key is better but requires auth. Per-user-ID is ideal when you have it. Don’t mix granularities without knowing which one wins.
- Always return Retry-After. A 429 without Retry-After forces every client to implement their own backoff heuristic. You’ll get more thundering herd, not less.
- Use Redis for distributed rate limiting. In-memory counters don’t work across multiple server instances. Redis INCR + EXPIRE is the standard pattern. Libraries like rate-limiter-flexible (Node) and slowapi (Python/FastAPI) abstract this correctly.
- Log every 429 you issue. A spike in 429s from a single key is either a client bug or intentional abuse. Both are worth knowing about in real time.
- Don’t rate-limit on auth failure. Return 401 for bad credentials, not 429. Rate-limiting on bad auth is how you accidentally lock out your own users during a credentials rotation.
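To make the INCR + EXPIRE pattern concrete, here is a fixed-window sketch keyed per API key that also returns Retry-After. It is written against a small in-memory stand-in (FakeRedis, a hypothetical class) exposing only incr, expire, and ttl, which mirror the Redis commands of the same names; checkLimit and the key prefix are also assumptions. In production the same three commands would go to a real Redis client.

```javascript
// Hypothetical in-memory stand-in for the three Redis commands the sketch needs
class FakeRedis {
  constructor(now = Date.now) {
    this.now = now;
    this.data = new Map(); // key -> { value, expiresAtMs }
  }
  // Returns the entry if it exists and has not expired
  live(key) {
    const entry = this.data.get(key);
    if (entry && entry.expiresAtMs !== null && entry.expiresAtMs <= this.now()) {
      this.data.delete(key);
      return undefined;
    }
    return entry;
  }
  async incr(key) {
    const entry = this.live(key) ?? { value: 0, expiresAtMs: null };
    entry.value += 1;
    this.data.set(key, entry);
    return entry.value;
  }
  async expire(key, seconds) {
    const entry = this.live(key);
    if (entry) entry.expiresAtMs = this.now() + seconds * 1000;
  }
  // Like Redis TTL: -2 if the key is missing, -1 if it has no expiry
  async ttl(key) {
    const entry = this.live(key);
    if (!entry) return -2;
    if (entry.expiresAtMs === null) return -1;
    return Math.ceil((entry.expiresAtMs - this.now()) / 1000);
  }
}

// Allows `limit` requests per `windowSec` seconds for each API key
async function checkLimit(store, apiKey, limit, windowSec) {
  const key = `ratelimit:${apiKey}`; // hypothetical key naming
  const count = await store.incr(key);
  if (count === 1) await store.expire(key, windowSec); // first hit starts the window
  if (count <= limit) return { allowed: true, status: 200 };

  // Tell the client how long until the window resets
  const retryAfter = Math.max(1, await store.ttl(key));
  return { allowed: false, status: 429, headers: { 'Retry-After': String(retryAfter) } };
}
```

One caveat: INCR followed by EXPIRE is two separate commands, so a crash between them can leave a counter with no expiry. Wrapping both in a transaction or a Lua script is a common way to close that gap.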
What to actually do right now
If you’re hitting 429s:
- Check Retry-After first: use it if it’s there, don’t invent your own delay
- Implement exponential backoff with jitter: the code above is copy-paste ready
- Log the X-RateLimit-Remaining header on every response: you might be burning quota faster than you think
- Cache responses where the data doesn’t change frequently
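Watching the remaining quota also lets a client pause before it gets a 429 at all. The sketch below assumes the header names shown earlier and that X-RateLimit-Reset is a Unix timestamp in seconds; both vary between APIs, and the function names are hypothetical. It accepts anything with a get method, such as the Headers object on a fetch response.

```javascript
// Reads a numeric header, returning null when it is absent or not a number
function readNumber(headers, name) {
  const raw = headers.get(name);
  if (raw === null || raw === undefined || raw === '') return null;
  const n = Number(raw);
  return Number.isFinite(n) ? n : null;
}

// Hypothetical helper: milliseconds to pause before the next request,
// or 0 when there is quota left or the API does not send these headers
function msUntilQuotaResets(headers, nowMs = Date.now()) {
  const remaining = readNumber(headers, 'X-RateLimit-Remaining');
  const resetSec = readNumber(headers, 'X-RateLimit-Reset');
  if (remaining === null || resetSec === null) return 0; // nothing to act on
  if (remaining > 0) return 0;
  return Math.max(0, resetSec * 1000 - nowMs);
}
```

Calling it as msUntilQuotaResets(res.headers) after each response and sleeping for the returned duration keeps a sequential client just under the limit.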
If you’re implementing rate limiting: pick a Redis-backed library, return Retry-After on every 429, monitor the 429 rate per key, and don’t rate-limit on auth failure.
The 429 is not the enemy — it’s the API telling you exactly what went wrong and (usually) how long to wait. Most rate limit problems come down to ignoring that message and retrying immediately. Don’t do that.