Rate Limiting: How to Stop Getting 429’d by Every API You Touch

Published on Jun 8, 2026

HTTP 429 Too Many Requests: how to read the Retry-After header, implement exponential backoff with jitter, understand the token bucket and leaky bucket algorithms, and add rate limiting to your own API.

You hit 429. Your script has been hammering the API for the last hour, your logs are full of red, and the deployment is in 20 minutes. That’s the moment this stops being abstract.

HTTP 429 Too Many Requests means you sent more requests than the server allows in a given time window. Reading the response correctly — and retrying correctly — is a skill most developers only pick up after getting burned. Here’s the full picture.

What the 429 actually tells you

The status code is only half the message. The real information is in the headers:

Retry-After: 30 — wait 30 seconds before retrying. Can also be an HTTP-date: Retry-After: Mon, 08 Jun 2026 15:00:00 GMT
X-RateLimit-Limit: 100 — total requests allowed per window
X-RateLimit-Remaining: 0 — requests left in the current window (you’re at zero)
X-RateLimit-Reset: 1749391200 — Unix timestamp when the window resets

Not all APIs send all of these. GitHub sends the full set. Stripe sends Retry-After. Some REST APIs send nothing and expect you to guess. If you have Retry-After, honor it: it is the server telling you the minimum safe wait time. If you don’t, exponential backoff is your fallback.
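Here is a minimal sketch of reading these headers in one place. It assumes the X-RateLimit-* names listed above, which are a convention and not a standard, so a given API may use different names. The function name parseRateLimitHeaders and the returned field names are illustrative, not part of any library.

```javascript
// Reads rate-limit headers from any object exposing get(name), such as the
// Headers object on a fetch Response. Missing or malformed values become null.
function parseRateLimitHeaders(headers, nowMs = Date.now()) {
  const toNumber = (value) => {
    if (value === null || value === undefined) return null;
    const text = String(value).trim();
    if (text === '') return null;
    const n = Number(text);
    return Number.isFinite(n) ? n : null;
  };

  const limit = toNumber(headers.get('X-RateLimit-Limit'));
  const remaining = toNumber(headers.get('X-RateLimit-Remaining'));
  const resetEpochSeconds = toNumber(headers.get('X-RateLimit-Reset'));

  // Retry-After is either a whole number of seconds or an HTTP-date.
  let retryAfterMs = null;
  const rawRetryAfter = headers.get('Retry-After');
  if (rawRetryAfter !== null && rawRetryAfter !== undefined) {
    const raw = String(rawRetryAfter).trim();
    if (/^\d+$/.test(raw)) {
      retryAfterMs = Number(raw) * 1000;
    } else {
      const dateMs = Date.parse(raw);
      // A date in the past means "retry now", so clamp at zero.
      if (!Number.isNaN(dateMs)) retryAfterMs = Math.max(0, dateMs - nowMs);
    }
  }

  // Time left until the window resets, if the API reports a reset timestamp.
  const msUntilReset =
    resetEpochSeconds === null ? null : Math.max(0, resetEpochSeconds * 1000 - nowMs);

  return { limit, remaining, resetEpochSeconds, msUntilReset, retryAfterMs };
}
```

Call it as parseRateLimitHeaders(res.headers) after any fetch. Checking remaining on successful responses lets you slow down before the first 429 arrives.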

The wrong way to retry

The naive implementation looks like this:

async function fetchWithoutBackoff(url) {
  while (true) {
    const res = await fetch(url);
    if (res.ok) return res;
    if (res.status === 429) continue; // immediately retry
  }
}

This is actively harmful. If 10 instances of your service all hit 429 at the same moment and all immediately retry, every retry lands at the same time — the thundering herd problem. You get rate limited again, immediately, in a tight loop that can run indefinitely and make your client look like it’s intentionally abusing the API.

Exponential backoff with jitter

The correct pattern: each retry waits longer than the last (exponential), and a random offset prevents synchronized retries across multiple clients (jitter).

async function fetchWithBackoff(url, options = {}, maxRetries = 5) {
  let attempt = 0;

  while (attempt <= maxRetries) {
    const res = await fetch(url, options);

    if (res.ok) return res;

    if (res.status !== 429) {
      throw new Error(`Request failed: ${res.status}`);
    }

    if (attempt === maxRetries) {
      throw new Error(`Rate limited after ${maxRetries} retries`);
    }

    // Use Retry-After if it parses to a valid delay; otherwise exponential backoff + jitter
    const retryAfter = res.headers.get('Retry-After');
    let waitMs = NaN;

    if (retryAfter) {
      waitMs = /^\d+$/.test(retryAfter.trim())
        ? Number(retryAfter) * 1000              // delay in seconds
        : Date.parse(retryAfter) - Date.now();   // HTTP-date
      waitMs = Math.max(0, waitMs);              // past date => retry now; NaN stays NaN
    }

    if (Number.isNaN(waitMs)) {
      const baseDelay = 1000 * Math.pow(2, attempt); // 1s, 2s, 4s, 8s, 16s
      const jitter = Math.random() * 1000;            // 0–1000ms random offset
      waitMs = baseDelay + jitter;
    }

    console.log(`Rate limited. Waiting ${Math.round(waitMs / 1000)}s (attempt ${attempt + 1}/${maxRetries})`);
    await new Promise(resolve => setTimeout(resolve, waitMs));
    attempt++;
  }
}

The jitter line is the part most implementations miss. Without it, retries from multiple parallel processes still arrive in clusters. With it, they spread out across the wait window.

For APIs that return Retry-After, use that value as the floor — if you’re still getting 429s after the specified wait, apply exponential backoff on top.
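One way to express "Retry-After as the floor, backoff on top" is a small delay calculator like the sketch below. The name computeWaitMs, the 60-second cap (maxMs), and the injectable random function are assumptions made for illustration and testability; none of them come from a specific API.

```javascript
// Computes how long to wait before the next retry, in milliseconds.
//   retryAfterMs: server-provided delay in ms, or null if the header was absent
//   attempt:      zero-based count of retries already made
function computeWaitMs(retryAfterMs, attempt, options = {}) {
  const {
    baseMs = 1000,        // first backoff step
    maxMs = 60000,        // cap so the exponential term cannot grow without bound
    jitterMs = 1000,      // width of the random offset
    random = Math.random, // injectable for deterministic tests
  } = options;

  // 1s, 2s, 4s, 8s, ... up to the cap.
  const exponential = Math.min(maxMs, baseMs * 2 ** attempt);

  // Never wait less than the server asked for.
  const floor = retryAfterMs ?? 0;

  // Jitter is added last, so the result stays at or above the floor while
  // clients that received the same Retry-After still spread out.
  return Math.max(floor, exponential) + random() * jitterMs;
}
```

On the first 429 with Retry-After: 30 this waits a little over 30 seconds. If 429s keep coming, the exponential term eventually exceeds the server's value and takes over.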

Token bucket vs leaky bucket

Two algorithms dominate rate limiter implementations. Understanding which one you’re dealing with tells you a lot about how the API will behave under pressure — and which algorithm to reach for when you’re building your own.

Token bucket

The bucket holds up to N tokens. Each request costs 1 token. Tokens refill at a fixed rate (e.g. 10 per second). If the bucket is empty, the request is rejected or queued.

Burst-friendly. If you haven’t made requests in a while, you’ve accumulated tokens and can fire a burst without hitting the limit. GitHub’s REST API is similarly burst-tolerant: authenticated users get 5,000 requests per hour, and the primary limit does not force you to space them out (its separate secondary limits still penalize heavy concurrency). Good for interactive use cases where traffic is spiky.
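A minimal in-memory sketch of the token bucket described above. The class and method names are illustrative. Time is passed in explicitly so the behavior is easy to test, and a production limiter would also need shared state across instances.

```javascript
class TokenBucket {
  // capacity: maximum tokens the bucket can hold (the burst size)
  // refillPerSecond: tokens added per second
  constructor(capacity, refillPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start full, so an idle client can burst immediately
    this.lastRefillMs = nowMs;
  }

  // Adds the tokens earned since the last refill, never exceeding capacity.
  refill(nowMs) {
    if (nowMs <= this.lastRefillMs) return;
    const elapsedSeconds = (nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond
    );
    this.lastRefillMs = nowMs;
  }

  // Spends one token and returns true if one is available; otherwise false.
  tryRemove(nowMs = Date.now()) {
    this.refill(nowMs);
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

With new TokenBucket(5, 2), a client that has been idle can make five requests back to back, then gets one more every half second.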

Leaky bucket

Requests go into a queue and drain out at a fixed rate, regardless of how fast they arrive. If the queue fills up, incoming requests are dropped.

Smooth output, no bursting. Even if you have quota left, requests trickle out at the configured rate. Nginx’s limit_req module uses this. Better for protecting downstream systems from spikes — useful for webhook delivery, outbound API calls, and anything where predictable throughput matters more than burst tolerance.
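A minimal sketch of the queue-based leaky bucket described above, again with illustrative names and explicit timestamps. It is not how Nginx implements limit_req; it only shows the drain-at-a-fixed-rate behavior.

```javascript
class LeakyBucket {
  // capacity: maximum queued items; leakPerSecond: fixed drain rate
  constructor(capacity, leakPerSecond, nowMs = Date.now()) {
    this.capacity = capacity;
    this.leakIntervalMs = 1000 / leakPerSecond;
    this.queue = [];
    this.nextLeakMs = nowMs; // earliest time the next item may leave
  }

  // Queues an item. Returns false if the queue is full and the item is dropped.
  offer(item, nowMs = Date.now()) {
    if (this.queue.length >= this.capacity) return false;
    // Idle time earns no credit: an empty bucket restarts its schedule at
    // "now", so a quiet period cannot be followed by a burst.
    if (this.queue.length === 0) {
      this.nextLeakMs = Math.max(this.nextLeakMs, nowMs);
    }
    this.queue.push(item);
    return true;
  }

  // Releases, in FIFO order, every item whose drain slot has arrived by nowMs.
  // Slots are spaced leakIntervalMs apart.
  drain(nowMs = Date.now()) {
    const released = [];
    while (this.queue.length > 0 && this.nextLeakMs <= nowMs) {
      released.push(this.queue.shift());
      this.nextLeakMs += this.leakIntervalMs;
    }
    return released;
  }
}
```

In practice drain would be called from a timer, and each released item would trigger the actual outbound request or webhook delivery.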

Which to pick when you’re implementing your own: User-facing endpoints that need burst tolerance → token bucket. Outbound webhook delivery or third-party API calls → leaky bucket. Background jobs where smooth throughput matters → leaky bucket.

Calculating safe request rates

Before writing any retry logic, figure out what you’re actually allowed to do. If an API says “1,000 requests per hour,” that’s 16.67 req/min or 0.278 req/second. Add a 20% safety buffer and you’re at about 13 req/min, or one request every 4.5 seconds, which leaves headroom for clock skew and for requests that straddle a window boundary.
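The same arithmetic as a small helper. The function name safeRequestRate and the default 20% buffer are illustrative choices.

```javascript
// Converts a quota ("N requests per window") into a throttled request rate.
//   quota:         requests allowed per window
//   windowSeconds: window length in seconds
//   safetyBuffer:  fraction of the quota to leave unused (0.2 = 20%)
function safeRequestRate(quota, windowSeconds, safetyBuffer = 0.2) {
  const usable = quota * (1 - safetyBuffer);
  const perSecond = usable / windowSeconds;
  return {
    perSecond,
    perMinute: perSecond * 60,
    // Sleep this long between sequential requests to stay under the buffered rate.
    intervalMs: (windowSeconds * 1000) / usable,
  };
}
```

For 1,000 requests per hour, safeRequestRate(1000, 3600) gives about 13.3 requests per minute, or one request every 4.5 seconds. If several workers share the same key, divide that rate between them.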

Use the Rate Limit Calculator to convert quota numbers into per-second and per-minute rates, find the right sleep interval between requests, and see how your concurrency level affects burst risk.

Implementing rate limiting on your own API

If you’re on the other side and want to add proper 429 behavior to your own API:

Pick the right granularity. Per-IP is easy but breaks for services behind NAT or shared egress. Per-API-key is better but requires auth. Per-user-ID is ideal when you have it. Don’t mix granularities without knowing which one wins.
Always return Retry-After. A 429 without Retry-After forces every client to implement their own backoff heuristic. You’ll get more thundering herd, not less.
Use Redis for distributed rate limiting. In-memory counters don’t work across multiple server instances. Redis INCR + EXPIRE is the standard pattern. Libraries like rate-limiter-flexible (Node) and slowapi (Python/FastAPI) abstract this correctly.
Log every 429 you issue. A spike in 429s from a single key is either a client bug or intentional abuse. Both are worth knowing about in real time.
Don’t rate-limit on auth failure. Return 401 for bad credentials, not 429. Rate-limiting on bad auth is how you accidentally lock out your own users during a credentials rotation.
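To make the INCR + EXPIRE pattern concrete, here is a rough fixed-window sketch. It assumes a Redis client that exposes promise-based incr, expire and ttl methods, as node-redis v4 and ioredis do. The function name checkRateLimit, the ratelimit: key prefix and the default limits are illustrative. The two commands are not atomic here, which is one reason to prefer a maintained library in production.

```javascript
// Fixed-window counter: one Redis key per client, expiring with the window.
//   redis:    client exposing async incr(key), expire(key, seconds), ttl(key)
//   clientId: the granularity you chose (API key, user ID, or IP)
async function checkRateLimit(redis, clientId, limit = 100, windowSeconds = 60) {
  const key = `ratelimit:${clientId}`;

  // INCR creates the key at 1 if it does not exist yet.
  const count = await redis.incr(key);

  // First request of a new window: start the expiry clock.
  if (count === 1) {
    await redis.expire(key, windowSeconds);
  }

  if (count <= limit) {
    return { allowed: true, remaining: limit - count, retryAfterSeconds: 0 };
  }

  // Over the limit: tell the client how long until the window resets.
  // TTL returns a negative number when the key has no expiry or is gone,
  // so fall back to the full window in that case.
  const ttl = await redis.ttl(key);
  return {
    allowed: false,
    remaining: 0,
    retryAfterSeconds: ttl > 0 ? ttl : windowSeconds,
  };
}
```

In a request handler, a result with allowed set to false maps to a 429 response with the Retry-After header set to String(retryAfterSeconds), which is also the natural place to log the key that was limited.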

What to actually do right now

If you’re hitting 429s:

Check Retry-After first: use it if it’s there, and don’t invent your own delay
Implement exponential backoff with jitter — the code above is copy-paste ready
Log the X-RateLimit-Remaining header on every response — you might be burning quota faster than you think
Cache responses where the data doesn’t change frequently

If you’re implementing rate limiting: pick a Redis-backed library, return Retry-After on every 429, monitor the 429 rate per key, and don’t rate-limit on auth failure.

The 429 is not the enemy — it’s the API telling you exactly what went wrong and (usually) how long to wait. Most rate limit problems come down to ignoring that message and retrying immediately. Don’t do that.