Anúncios incomodam? Ir Sem anúncios Hoje

Database Connection Pooling Why Your App Crashes at 100 Users When Queries Take 10ms

Atualizado em

Your queries run in 10ms but the app falls over at 100 concurrent users. Here's why max_connections=100 hits faster than you think, what connection pooling actually does, how to size your pool with math instead of magic numbers, and why serverless functions are the worst-case scenario for Postgres.

Database Connection Pooling: Why Your App Crashes at 100 Users When Queries Take 10ms 1
ANUNCIADO Remover?

Your queries run in 10ms. You’ve checked. You’ve profiled. Everything looks fine. Then 100 users show up simultaneously and the app goes dead. The error logs say something about “too many connections” or “connection refused” and suddenly you’re reading Postgres documentation at 2am.

Here’s what happened: you hit max_connections=100. Not because your queries are slow. Because each user borrowed a connection and didn’t give it back fast enough.

What a Postgres Connection Actually Costs

Every connection to Postgres is a forked OS process. Not a thread. Not a lightweight coroutine. A full process. On Linux that means roughly 5–10 MB of memory per connection — before your query touches any data.

A server with 1 GB of RAM and max_connections=100 can burn through 500 MB–1 GB just in connection overhead. Add working memory (work_mem), shared buffers, and the query plan itself, and you’re often out of headroom before the first row is fetched.

The default max_connections=100 in Postgres isn’t a suggestion — it’s a hard cap. Connection 101 gets rejected. Not queued. Rejected. Your ORM throws an exception, your users get a 500, and your on-call phone lights up.

The Pool States Nobody Draws Out

A connection pool sits between your app and the database. At any moment, every connection in the pool is in one of three states:

StateComo corrigirBehavior at saturation
IdleConnection is open and waiting in the pool — no active queryHealthy. Gets assigned to the next request immediately.
AtivoConnection is running a query right nowUnavailable until the query finishes and the connection is returned to the pool.
WaitingA request needs a connection but none are freeQueued. Sits here until a timeout fires or an Active connection finishes. When all pool_size connections are Active, every new request lands here.

When every connection is Active and a new request arrives, it waits. If it waits too long — past your pool’s checkout_timeout — the request fails with a pool timeout error. That’s the crash. Your queries aren’t slow. The pool is exhausted.

Pool Sizing: There’s Math for This

Most tutorials say “set pool size to 10” or “try 20” without explaining why. There’s actual math: Little’s Law.

pool_size = requests_per_second × avg_query_duration_seconds

If your app handles 500 requests per second and the average query holds a connection for 10ms (0.010s):

pool_size = 500 × 0.010 = 5 connections

Five. Not 100. You’re holding each connection for 10ms, so at steady state you only need 5 simultaneously. Add a buffer for spikes — roughly 2x — and you’re looking at 10 connections for a 500 req/s service with 10ms queries.

The counterintuitive part: a larger pool doesn’t always mean better throughput. Postgres has to schedule all those concurrent queries. Too many connections competing for the same indexes and shared buffers can make everything slower. The Postgres team’s own documentation recommends keeping application connection counts well below max_connections — often 20–50 for typical workloads.

Where the math breaks: if your queries have high variance (some take 10ms, some take 2 seconds on cache misses or table scans), your pool needs headroom for those slow outliers — or they’ll hold connections for seconds and starve out the fast queries behind them.

Idle Timeout: Why You Can’t Just Set a Giant Pool

Every idle connection in a pool is a Postgres process that’s running, using memory, waiting. If you set pool_size=100 and only use 5 at a time, you’re burning 95 Postgres processes doing nothing. That’s roughly 500 MB of memory on standby.

The fix is idle timeout — connections unused for N seconds get closed and removed from the pool. The pool shrinks during quiet periods and grows when traffic spikes, up to pool_size. Most pool libraries support this:

  • pg (Node.js): idleTimeoutMillis — default 10,000ms (10 seconds, often too short for bursty traffic)
  • SQLAlchemy (Python): pool_recycle — max age in seconds before a connection is replaced
  • database/sql (Go): db.SetConnMaxIdleTime()
  • HikariCP (Java): idleTimeout — defaults to 10 minutes

Set it too short and you’re creating and destroying connections constantly, paying the TCP handshake and auth overhead on every request. Set it too long and idle connections accumulate. Somewhere between 1–10 minutes is usually right for most applications.

PgBouncer: When Your App Can’t Pool Itself

Some tech stacks can’t do connection pooling properly. PHP running in FastCGI mode, for example — each PHP process is stateless and dies after the request. No persistent pool. Each request opens a new Postgres connection and closes it when done. At 200 concurrent PHP workers, that’s 200 connections, even if only 5 are running queries at any given moment.

PgBouncer sits in front of Postgres and acts as a shared pool for all your app processes. Your 200 PHP workers each think they have a direct Postgres connection, but they’re all sharing 10 real connections through PgBouncer. The pool lives at the infrastructure level rather than in the application.

PgBouncer has three modes — and choosing wrong will break your app silently:

  • Session mode: One real Postgres connection per client session. No multiplexing. Nearly identical to a direct connection — you get PgBouncer’s connection limiting but not its pooling benefits.
  • Transaction mode: Connection returned to the pool after each transaction completes. This is where the multiplexing happens — 200 clients can share 10 real connections. Most apps work fine here.
  • Statement mode: Connection returned after each statement. Breaks anything using explicit transactions or prepared statements. Rarely useful.

The catch with transaction mode: server-side prepared statements break because PgBouncer doesn’t guarantee the same Postgres backend on each request. If your ORM uses prepared statements (SQLAlchemy, Active Record, Prisma in certain modes), you’ll need to disable them. Most ORMs have a “disable prepared statements” option specifically for PgBouncer compatibility — check your driver docs.

Serverless Functions Are a Connection Pool Nightmare

AWS Lambda, Vercel Functions, Cloudflare Workers — stateless by design. Each invocation is isolated. There’s no shared pool across invocations. Each cold start opens a new connection; each warm instance holds one open indefinitely.

At scale: 100 Lambda functions fire simultaneously. Each one opens a Postgres connection. You’re at max_connections=100 instantly. And since Lambda keeps warm instances alive between requests, those connections stay open even when nothing is running. You’re using 100 connections to handle what might be 5 concurrent queries.

The standard solutions, in order of complexity:

  • PgBouncer on a separate server: Lambda functions connect to PgBouncer, not Postgres directly. Full control, works with any Postgres. Adds one more thing to operate.
  • RDS Proxy (AWS): Managed PgBouncer-style pooling for RDS and Aurora. Handles the serverless connection surge without infrastructure management, but adds ~1–2ms latency and costs per database endpoint per hour.
  • Managed databases with built-in pooling: Neon, Supabase, and PlanetScale all include connection pooling. Neon’s serverless driver uses HTTP instead of TCP for queries — no persistent connection at all, which sidesteps the problem entirely for simple workloads.

If you’re deploying to Lambda and connecting directly to Postgres without a pool in front: you will hit this. Not might. Will.

Diagnosing Saturation Before It Takes Down Your App

Postgres exposes real-time connection state via pg_stat_activity. If you suspect pool issues, run this:

SELECT
  state,
  count(*) AS connections,
  max(now() - state_change) AS longest_in_state
FROM pg_stat_activity
WHERE datname = 'your_database'
GROUP BY state;

If you see a high count of idle in transaction rows, you have connections that opened a transaction and never committed or rolled back. Those connections are stuck — they’re holding pool slots and blocking everything behind them. The fix is to find the code path that starts a transaction without a proper try/finally or context manager, and to set idle_in_transaction_session_timeout in Postgres to automatically kill them:

-- Kill connections idle in an open transaction for more than 30 seconds
ALTER DATABASE your_database SET idle_in_transaction_session_timeout = '30s';

To see what’s actively waiting or blocked right now:

SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event IS NOT NULL
  AND datname = 'your_database';

When you’re debugging the queries themselves — especially the gnarly multi-join ones that tend to cause slow-query incidents — an SQL formatter helps make the minified ORM output readable. Staring at a single-line 400-character query from pg_stat_activity is not a fun way to diagnose an incident.

What Pool Exhaustion Does to Your SLA

If your pool saturates for 30 seconds during a traffic spike and users see errors, that’s 30 seconds of downtime. If it happens twice a month, that’s 1 minute of downtime per month — which sounds fine until you work out what it means in percentage terms and realize it’s happening during peak traffic, not at 3am.

The shape of downtime matters as much as the total. An uptime SLA calculator lets you convert minutes of downtime into percentage terms and see how connection incidents are eating into your availability budget — useful when you’re making the case for infrastructure investment to fix the pooling setup.

The Settings That Actually Matter

Quick reference for the Postgres side — these are the knobs that interact with your pool:

ContextoPadrãoRecomendação
max_connections100Keep conservative. More connections = more memory. Let your pool do the work, not this number.
idle_in_transaction_session_timeout0 (off)Set to 30s–60s. Kills leaked transactions before they hoard pool slots indefinitely.
statement_timeout0 (off)Set a reasonable limit (e.g., 30s). Runaway queries shouldn’t hold connections forever.
work_mem4 MBMultiplied by active connections per query operation. 100 connections × 4 MB = 400 MB just for sort/hash buffers.

Most “app slow under load” incidents trace back to one of: pool too small for the actual query latency, pool too large and competing with Postgres’s own scheduling, missing idle_in_transaction_session_timeout letting leaked transactions accumulate, or serverless functions bypassing pooling entirely.

The fix is never “increase max_connections.” Postgres doesn’t scale that way — you’ll just hit the wall later. The fix is a correctly-sized pool between your application and the database, with timeouts set so leaks can’t accumulate. That’s it.

Quer eliminar anúncios? Fique sem anúncios hoje mesmo

Instale nossas extensões

Adicione ferramentas de IO ao seu navegador favorito para acesso instantâneo e pesquisa mais rápida

Ao Extensão do Chrome Ao Extensão de Borda Ao Extensão Firefox Ao Extensão Opera

O placar chegou!

Placar é uma forma divertida de acompanhar seus jogos, todos os dados são armazenados em seu navegador. Mais recursos serão lançados em breve!

ANUNCIADO Remover?
ANUNCIADO Remover?
ANUNCIADO Remover?

Notícias com destaques técnicos

Envolver-se

Ajude-nos a continuar fornecendo ferramentas gratuitas valiosas

Compre-me um café
ANUNCIADO Remover?