SQL Indexes — Why Your Query Is Slow and What to Actually Do About It
Your query runs fine on localhost. On production with millions of rows, it locks up for seconds. Here's how to read EXPLAIN output, add the right indexes, understand composite index column order, and know when an index won't help.
Your query takes 2ms on localhost with 500 rows. On production with 3 million rows, it takes 9 seconds and your users are staring at a spinner. The query itself is probably fine — the schema isn’t. You need an index, and you need to understand 为什么 so you stop guessing the next time this happens.
What the database is doing without an index
A B-tree index is a sorted copy of a column (or columns), stored in a balanced tree the database can binary-search in O(log n) time. Without one, the database does a full table scan — it reads every single row to find matches. With 3 million rows, that’s 3 million row reads. With an index on the right column, it’s maybe 20–30 tree node traversals to find the matching rows.
Think of it like a phone book. Finding “Smith, John” in alphabetical order takes seconds. Finding all people named John by reading every entry from page 1 takes forever. An index is the alphabetical sorting — the database uses it to skip directly to the relevant rows instead of scanning everything.
One important thing: indexes exist on disk alongside your table. The database maintains them automatically on writes. That’s free for reads but costs something on every INSERT, UPDATE, and DELETE — which is why you can’t just index everything.
Spotting the problem with EXPLAIN
Before touching anything, run EXPLAIN on the slow query. Here’s what you’re looking for in MySQL:
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
| 1 | SIMPLE | orders | ALL | NULL | NULL | NULL | NULL | 2847391 | Using where |
+----+-------------+--------+------+---------------+------+---------+------+---------+-------------+
Two fields that matter:
- type: ALL — full table scan. Almost always bad once your table has more than ~100k rows.
- rows: 2847391 — the optimizer estimates it’ll examine 2.8 million rows to return your results.
Now add the index and run it again:
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
EXPLAIN SELECT * FROM orders WHERE customer_id = 12345;
+----+-------------+--------+------+------------------------+------------------------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+------+------------------------+------------------------+---------+-------+------+-------+
| 1 | SIMPLE | orders | ref | idx_orders_customer_id | idx_orders_customer_id | 4 | const | 12 | |
+----+-------------+--------+------+------------------------+------------------------+---------+-------+------+-------+
type: ref 且 rows: 12. That’s the fix. 2.8 million rows examined → 12.
On PostgreSQL, use EXPLAIN ANALYZE (which actually executes the query) to get real timing numbers:
-- Before index
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;
Seq Scan on orders (cost=0.00..89432.00 rows=14 width=156)
(actual time=142.341..8934.221 rows=12 loops=1)
Filter: (customer_id = 12345)
Rows Removed by Filter: 2847379
Planning Time: 0.4 ms
Execution Time: 8934.8 ms
-- After: CREATE INDEX idx_orders_customer_id ON orders (customer_id);
EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 12345;
Index Scan using idx_orders_customer_id on orders
(cost=0.43..92.15 rows=12 width=156)
(actual time=0.042..0.198 rows=12 loops=1)
Index Cond: (customer_id = 12345)
Planning Time: 0.5 ms
Execution Time: 0.3 ms
8934ms → 0.3ms. That’s not a cherry-picked example — this is the realistic impact of a single missing index on a table with a few million rows.
Composite indexes and the leftmost prefix rule
Once you have multiple columns in your WHERE clause, you need composite indexes — and the column order matters.
SELECT * FROM orders
WHERE customer_id = 12345
AND status = 'pending'
AND created_at > '2024-01-01';
这 leftmost prefix rule: the database uses a composite index starting from the leftmost column and works right, but stops being useful once it hits a gap. Given an index on (a, b, c):
WHERE a = 1— uses the indexWHERE a = 1 AND b = 2— uses both columns of the indexWHERE a = 1 AND b = 2 AND c = 3— uses the full indexWHERE b = 2— does NOT use the index (skipped column a)WHERE a = 1 AND c = 3— uses only the a part, ignores c
The practical ordering rule: equality conditions first, range conditions last. Range predicates (>, <, BETWEEN, LIKE 'prefix%') break the usable prefix for subsequent columns. So:
-- Good: equality columns first, range column last
CREATE INDEX idx_orders_compound ON orders (customer_id, status, created_at);
-- This query uses all three columns of the index:
SELECT * FROM orders
WHERE customer_id = 12345 -- equality, goes first
AND status = 'pending' -- equality, goes second
AND created_at > '2024-01-01'; -- range, goes last
-- Bad ordering for this query: range column in the middle
-- CREATE INDEX idx_bad ON orders (customer_id, created_at, status);
-- The status filter won't use the index because created_at is a range predicate
Also think about selectivity: put the most selective column (fewest duplicate values) earlier in the index. If customer_id has 50,000 distinct values and status has 3, customer_id should come first — it narrows the result set faster.
When indexes make things worse
Indexes are not free. Every index costs write performance and disk space. The database updates every index on every INSERT, UPDATE, and DELETE on that table. On write-heavy tables, excess indexes genuinely hurts throughput. Here’s where indexes fail or backfire:
- Low-cardinality columns. An index on a boolean or a
statuscolumn with 3 distinct values is often ignored by the optimizer. If a query would return more than ~10–15% of rows via the index, the database usually prefers a table scan — the index lookup overhead isn’t worth it. Check with EXPLAIN whether the index is actually being used. - Write-heavy tables. Log tables, event tables, audit tables — be conservative. A table with 10 indexes and millions of writes per hour is paying 10x write overhead on those indexes.
- Unused indexes. In PostgreSQL,
pg_stat_user_indexesshowsidx_scan— how many times each index has been used since the last stats reset. A zero-scan index is pure overhead. MySQL tracks this via Performance Schema (sys.schema_unused_indexesin MySQL 5.7+). - The optimizer guessing wrong. Query planners aren’t always right. Sometimes they choose an index that’s slower than a full scan.
EXPLAIN ANALYZEwill catch this — compare estimated vs actual rows. In MySQL you can hint withUSE INDEX (idx_name); in PostgreSQL you can test withSET enable_seqscan = off(don’t leave this in production).
Index or rewrite? The real tradeoff
Sometimes the index is the fix. Sometimes the query is the problem and the index just masks it. A few patterns where you should rewrite the query instead of (or in addition to) adding an index:
Functions on indexed columns
-- Will NOT use an index on created_at — the function result isn't indexed:
SELECT * FROM orders WHERE YEAR(created_at) = 2024;
-- Rewrite to use the index:
SELECT * FROM orders
WHERE created_at >= '2024-01-01'
AND created_at < '2025-01-01';
Wrapping a column in any function prevents standard index usage. The index stores values of created_at, not values of YEAR(created_at) — those are different things. PostgreSQL supports expression indexes (CREATE INDEX ON orders (YEAR(created_at))) as a workaround, but the query rewrite is cleaner and works everywhere.
Covering indexes for hot read paths
A covering index includes all columns a query needs, so the database never has to read the actual table rows at all — it answers entirely from the index. This can be significantly faster than a regular index for heavily-read tables:
-- Query that only needs id, status, total:
SELECT id, status, total FROM orders WHERE customer_id = 12345;
-- Regular index: database finds matching rows via index, then reads each row from table
CREATE INDEX idx_orders_customer ON orders (customer_id);
-- Covering index: database reads everything it needs from the index alone
CREATE INDEX idx_orders_cover ON orders (customer_id, id, status, total);
-- Note: customer_id first (the WHERE column), then the SELECT columns
LIKE patterns
-- Uses the index (prefix match — database knows where to start):
SELECT * FROM users WHERE email LIKE 'john%';
-- Does NOT use the index (leading wildcard — database doesn't know where to start):
SELECT * FROM users WHERE email LIKE '%@gmail.com';
Leading wildcards are not fixable with B-tree indexes. If you need suffix or substring search, you need a full-text index (FULLTEXT in MySQL, GIN 是快速路径,但它对属性处理不一致,且在边缘情况中可能丢失数据。在处理SOAP响应的生产环境中,建议使用 pg_trgm in PostgreSQL) or an external search system like Elasticsearch.
N+1 queries
No index fixes an N+1 query. If your ORM is doing one query to fetch 200 orders, then a separate query per order to get the customer name, you have 201 queries. Each one might be fast. The problem is the volume. Fix it with a JOIN or a batched query — no amount of indexing changes the query count.
One more thing: clean up the query before you analyze it
Before running EXPLAIN on a slow query, it helps to have the SQL formatted clearly. A 200-character one-liner with no indentation is harder to reason about than a structured query. The SQL Formatter at IO Tools handles MySQL, PostgreSQL, SQL Server, and other dialects — useful when you're pasting a query from logs or an ORM debug output and need to see its structure at a glance.
The practical checklist
- Run
EXPLAINbefore adding any index. Confirm the problem first. - Look for
type: ALL(MySQL) orSeq Scan(PostgreSQL) — usually where the problem is. - Check the
rowsestimate in EXPLAIN. Millions of rows scanned to return dozens = missing index. - In composite indexes: equality conditions first, range conditions last.
- Higher selectivity columns go earlier in a composite index.
- Don't index low-cardinality columns without verifying the optimizer actually uses the index.
- Audit unused indexes regularly:
pg_stat_user_indexes(PostgreSQL),sys.schema_unused_indexes(MySQL). - If a function wraps your column in WHERE, rewrite the query instead of indexing the function.
- For read-heavy hot paths with a small column set, consider covering indexes.
- N+1 is a query architecture problem, not an index problem.
