Anúncios incomodam? Ir Sem anúncios Hoje

SQL Indexes — Why Your Query Is Slow and What to Actually Do About It

Atualizado em

Your query runs fine on localhost. On production with millions of rows, it locks up for seconds. Here's how to read EXPLAIN output, add the right indexes, understand composite index column order, and know when an index won't help.

SQL Indexes — Why Your Query Is Slow and What to Actually Do About It 1
ANUNCIADO Remover?

Your query takes 2ms on localhost with 500 rows. On production with 3 million rows, it takes 9 seconds and your users are staring at a spinner. The query itself is probably fine — the schema isn’t. You need an index, and you need to understand porquê so you stop guessing the next time this happens.

What the database is doing without an index

A B-tree index is a sorted copy of a column (or columns), stored in a balanced tree the database can binary-search in O(log n) time. Without one, the database does a full table scan — it reads every single row to find matches. With 3 million rows, that’s 3 million row reads. With an index on the right column, it’s maybe 20–30 tree node traversals to find the matching rows.

Think of it like a phone book. Finding “Smith, John” in alphabetical order takes seconds. Finding all people named John by reading every entry from page 1 takes forever. An index is the alphabetical sorting — the database uses it to skip directly to the relevant rows instead of scanning everything.

One important thing: indexes exist on disk alongside your table. The database maintains them automatically on writes. That’s free for reads but costs something on every INSERT, UPDATE, and DELETE — which is why you can’t just index everything.

Spotting the problem with EXPLAIN

Before touching anything, run EXPLAIN on the slow query. Here’s what you’re looking for in MySQL:

EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;

+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table  | type | possible_keys | key  | key_len | ref  | rows    | Extra       |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
|  1 | SIMPLE      | orders | ALL  | NULL          | NULL | NULL    | NULL | 2847391 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+

Two fields that matter:

  • type: ALL — full table scan. Almost always bad once your table has more than ~100k rows.
  • rows: 2847391 — the optimizer estimates it’ll examine 2.8 million rows to return your results.

Now add the index and run it again:

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;

+----+-------------+--------+------+------------------------+------------------------+---------+-------+------+-------+
| id | select_type | table  | type | possible_keys          | key                    | key_len | ref   | rows | Extra |
+----+-------------+--------+------+------------------------+------------------------+---------+-------+------+-------+
|  1 | SIMPLE      | orders | ref  | idx_orders_customer_id | idx_orders_customer_id | 4       | const |   12 |       |
+----+-------------+--------+------+------------------------+------------------------+---------+-------+------+-------+

type: ref e rows: 12. That’s the fix. 2.8 million rows examined → 12.

On PostgreSQL, use EXPLAIN ANALYZE (which actually executes the query) to get real timing numbers:

-- Before index
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;

Seq Scan on orders  (cost=0.00..89432.00 rows=14 width=156)
                    (actual time=142.341..8934.221 rows=12 loops=1)
  Filter: (customer_id = 12345)
  Rows Removed by Filter: 2847379
Planning Time: 0.4 ms
Execution Time: 8934.8 ms

-- After: CREATE INDEX idx_orders_customer_id ON orders (customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;

Index Scan using idx_orders_customer_id on orders
                    (cost=0.43..92.15 rows=12 width=156)
                    (actual time=0.042..0.198 rows=12 loops=1)
  Index Cond: (customer_id = 12345)
Planning Time: 0.5 ms
Execution Time: 0.3 ms

8934ms → 0.3ms. That’s not a cherry-picked example — this is the realistic impact of a single missing index on a table with a few million rows.

Composite indexes and the leftmost prefix rule

Once you have multiple columns in your WHERE clause, you need composite indexes — and the column order matters.

SELECT * FROM orders
WHERE customer_id = 12345
  AND status = 'pending'
  AND created_at > '2024-01-01';

O leftmost prefix rule: the database uses a composite index starting from the leftmost column and works right, but stops being useful once it hits a gap. Given an index on (a, b, c):

  • WHERE a = 1 — uses the index
  • WHERE a = 1 AND b = 2 — uses both columns of the index
  • WHERE a = 1 AND b = 2 AND c = 3 — uses the full index
  • WHERE b = 2 — does NOT use the index (skipped column a)
  • WHERE a = 1 AND c = 3 — uses only the a part, ignores c

The practical ordering rule: equality conditions first, range conditions last. Range predicates (>, <, BETWEEN, LIKE 'prefix%') break the usable prefix for subsequent columns. So:

-- Good: equality columns first, range column last
CREATE INDEX idx_orders_compound ON orders (customer_id, status, created_at);

-- This query uses all three columns of the index:
SELECT * FROM orders
WHERE customer_id = 12345   -- equality, goes first
  AND status = 'pending'    -- equality, goes second
  AND created_at > '2024-01-01'; -- range, goes last

-- Bad ordering for this query: range column in the middle
-- CREATE INDEX idx_bad ON orders (customer_id, created_at, status);
-- The status filter won't use the index because created_at is a range predicate

Also think about selectivity: put the most selective column (fewest duplicate values) earlier in the index. If customer_id has 50,000 distinct values and status has 3, customer_id should come first — it narrows the result set faster.

When indexes make things worse

Indexes are not free. Every index costs write performance and disk space. The database updates every index on every INSERT, UPDATE, and DELETE on that table. On write-heavy tables, excess indexes genuinely hurts throughput. Here’s where indexes fail or backfire:

  • Low-cardinality columns. An index on a boolean or a status column with 3 distinct values is often ignored by the optimizer. If a query would return more than ~10–15% of rows via the index, the database usually prefers a table scan — the index lookup overhead isn’t worth it. Check with EXPLAIN whether the index is actually being used.
  • Write-heavy tables. Log tables, event tables, audit tables — be conservative. A table with 10 indexes and millions of writes per hour is paying 10x write overhead on those indexes.
  • Unused indexes. In PostgreSQL, pg_stat_user_indexes shows idx_scan — how many times each index has been used since the last stats reset. A zero-scan index is pure overhead. MySQL tracks this via Performance Schema (sys.schema_unused_indexes in MySQL 5.7+).
  • The optimizer guessing wrong. Query planners aren’t always right. Sometimes they choose an index that’s slower than a full scan. EXPLAIN ANALYZE will catch this — compare estimated vs actual rows. In MySQL you can hint with USE INDEX (idx_name); in PostgreSQL you can test with SET enable_seqscan = off (don’t leave this in production).

Index or rewrite? The real tradeoff

Sometimes the index is the fix. Sometimes the query is the problem and the index just masks it. A few patterns where you should rewrite the query instead of (or in addition to) adding an index:

Functions on indexed columns

-- Will NOT use an index on created_at — the function result isn't indexed:
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- Rewrite to use the index:
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
  AND created_at < '2025-01-01';

Wrapping a column in any function prevents standard index usage. The index stores values of created_at, not values of YEAR(created_at) — those are different things. PostgreSQL supports expression indexes (CREATE INDEX ON orders (YEAR(created_at))) as a workaround, but the query rewrite is cleaner and works everywhere.

Covering indexes for hot read paths

A covering index includes all columns a query needs, so the database never has to read the actual table rows at all — it answers entirely from the index. This can be significantly faster than a regular index for heavily-read tables:

-- Query that only needs id, status, total:
SELECT id, status, total FROM orders WHERE customer_id = 12345;

-- Regular index: database finds matching rows via index, then reads each row from table
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Covering index: database reads everything it needs from the index alone
CREATE INDEX idx_orders_cover ON orders (customer_id, id, status, total);
-- Note: customer_id first (the WHERE column), then the SELECT columns

LIKE patterns

-- Uses the index (prefix match — database knows where to start):
SELECT * FROM users WHERE email LIKE 'john%';

-- Does NOT use the index (leading wildcard — database doesn't know where to start):
SELECT * FROM users WHERE email LIKE '%@gmail.com';

Leading wildcards are not fixable with B-tree indexes. If you need suffix or substring search, you need a full-text index (FULLTEXT in MySQL, GIN com pg_trgm in PostgreSQL) or an external search system like Elasticsearch.

N+1 queries

No index fixes an N+1 query. If your ORM is doing one query to fetch 200 orders, then a separate query per order to get the customer name, you have 201 queries. Each one might be fast. The problem is the volume. Fix it with a JOIN or a batched query — no amount of indexing changes the query count.

One more thing: clean up the query before you analyze it

Before running EXPLAIN on a slow query, it helps to have the SQL formatted clearly. A 200-character one-liner with no indentation is harder to reason about than a structured query. The SQL Formatter at IO Tools handles MySQL, PostgreSQL, SQL Server, and other dialects — useful when you're pasting a query from logs or an ORM debug output and need to see its structure at a glance.

The practical checklist

  • Run EXPLAIN before adding any index. Confirm the problem first.
  • Look for type: ALL (MySQL) or Seq Scan (PostgreSQL) — usually where the problem is.
  • Check the rows estimate in EXPLAIN. Millions of rows scanned to return dozens = missing index.
  • In composite indexes: equality conditions first, range conditions last.
  • Higher selectivity columns go earlier in a composite index.
  • Don't index low-cardinality columns without verifying the optimizer actually uses the index.
  • Audit unused indexes regularly: pg_stat_user_indexes (PostgreSQL), sys.schema_unused_indexes (MySQL).
  • If a function wraps your column in WHERE, rewrite the query instead of indexing the function.
  • For read-heavy hot paths with a small column set, consider covering indexes.
  • N+1 is a query architecture problem, not an index problem.
Quer eliminar anúncios? Fique sem anúncios hoje mesmo

Instale nossas extensões

Adicione ferramentas de IO ao seu navegador favorito para acesso instantâneo e pesquisa mais rápida

Ao Extensão do Chrome Ao Extensão de Borda Ao Extensão Firefox Ao Extensão Opera

O placar chegou!

Placar é uma forma divertida de acompanhar seus jogos, todos os dados são armazenados em seu navegador. Mais recursos serão lançados em breve!

ANUNCIADO Remover?
ANUNCIADO Remover?
ANUNCIADO Remover?

Notícias com destaques técnicos

Envolver-se

Ajude-nos a continuar fornecendo ferramentas gratuitas valiosas

Compre-me um café
ANUNCIADO Remover?