SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.
Add this skill
npx mdskills install applied-artificial-intelligence/sql-optimizationComprehensive reference guide with excellent EXPLAIN analysis, index strategies, and concrete optimization patterns across databases
1---2name: sql-optimization3description: SQL query optimization patterns including EXPLAIN plan analysis, index strategies, query rewriting, and N+1 query prevention. Use when optimizing slow database queries, analyzing query performance, designing indexes, or debugging database bottlenecks. Works with PostgreSQL, MySQL, SQLite, and other SQL databases. Typical improvements: 10x-1000x query speedup.4---56# SQL Query Optimization Patterns78Comprehensive guide to optimizing SQL queries for performance, including EXPLAIN plan analysis, index design strategies, query rewriting patterns, and N+1 query detection. Works across PostgreSQL, MySQL, and SQLite.910---1112## Quick Reference1314**When to use this skill:**15- Slow database queries (>100ms for simple queries, >1s for complex)16- High database CPU usage17- Analyzing EXPLAIN plans18- Designing database indexes19- Debugging N+1 query problems20- Optimizing JOIN operations21- Reducing table scans2223**Common triggers:**24- "This query is too slow"25- "How do I optimize this SQL"26- "What indexes should I add"27- "Explain this EXPLAIN plan"28- "Fix N+1 queries"29- "Database CPU at 100%"3031**Typical improvements:**32- 3 seconds → 50ms (60x faster)33- Full table scan → Index scan34- 1000 queries → 2 queries (N+1 elimination)3536---3738## Part 1: Understanding EXPLAIN Plans3940### Reading PostgreSQL EXPLAIN4142```sql43EXPLAIN ANALYZE44SELECT u.name, p.title45FROM users u46JOIN posts p ON u.id = p.user_id47WHERE u.created_at > '2024-01-01'48ORDER BY p.created_at DESC49LIMIT 10;50```5152**Key Metrics to Watch**:53- **Seq Scan** (bad): Full table scan, reads every row54- **Index Scan** (good): Uses index, reads only needed rows55- **Cost**: Estimated computational cost (lower is better)56- **Actual time**: Real execution time in milliseconds57- **Rows**: Number of rows processed at each step5859### Bad EXPLAIN Example6061```62Seq Scan on users u (cost=0.00..1234.00 rows=1000 width=50)63 (actual time=0.123..45.678 rows=950 loops=1)64 Filter: (created_at > '2024-01-01'::date)65 Rows Removed by Filter: 5000066Planning Time: 0.234 ms67Execution Time: 3456.789 ms68```6970**Problems**:71- **Seq Scan**: Reading entire table (50,950 rows)72- **Rows Removed by Filter**: Filtering after reading (wasteful)73- **Execution Time**: 3.5 seconds (way too slow)7475### Good EXPLAIN Example (After Index)7677```78Index Scan using users_created_at_idx on users u79 (cost=0.29..123.45 rows=950 width=50)80 (actual time=0.012..3.456 rows=950 loops=1)81 Index Cond: (created_at > '2024-01-01'::date)82Planning Time: 0.123 ms83Execution Time: 4.567 ms84```8586**Improvements**:87- **Index Scan**: Using index (only reads needed rows)88- **Index Cond**: Filtering during index scan (efficient)89- **Execution Time**: 4.5ms (750x faster!)9091---9293## Part 2: Index Design Strategies9495### When to Add an Index9697**✅ Add index when:**98- Column frequently in WHERE clauses99- Column frequently in JOIN conditions100- Column frequently in ORDER BY101- Query does Seq Scan on large table (>10K rows)102- Query execution time >100ms103104**❌ Don't add index when:**105- Table has <1000 rows (Seq Scan is fast enough)106- Column rarely queried107- Column has very low cardinality (e.g., boolean with 50/50 distribution)108- Table has heavy writes (indexes slow down INSERTs/UPDATEs)109110### Single-Column Indexes111112```sql113-- Create index on frequently queried column114CREATE INDEX idx_users_email ON users(email);115CREATE INDEX idx_posts_user_id ON posts(user_id);116CREATE INDEX idx_orders_created_at ON orders(created_at);117```118119**Query that benefits**:120```sql121-- Before index: Seq Scan (3000ms)122-- After index: Index Scan (5ms)123SELECT * FROM users WHERE email = 'user@example.com';124```125126### Composite Indexes (Multi-Column)127128**Order matters!** Put most selective column first.129130```sql131-- Good: Specific to general132CREATE INDEX idx_orders_user_status ON orders(user_id, status, created_at);133134-- Query that uses all columns135SELECT * FROM orders136WHERE user_id = 123137 AND status = 'pending'138ORDER BY created_at DESC;139```140141**Index usage rules**:142```sql143-- ✅ Uses index (matches from left)144WHERE user_id = 123145WHERE user_id = 123 AND status = 'pending'146WHERE user_id = 123 AND status = 'pending' AND created_at > '2024-01-01'147148-- ❌ Doesn't use index (skips first column)149WHERE status = 'pending'150WHERE created_at > '2024-01-01'151```152153### Partial Indexes (Filtered)154155**Index only subset of rows** (smaller, faster):156157```sql158-- Only index active users159CREATE INDEX idx_users_active_email160ON users(email)161WHERE status = 'active';162163-- Query that uses partial index164SELECT * FROM users165WHERE email = 'user@example.com'166 AND status = 'active'; -- Must include filter condition!167```168169**Benefits**:170- Smaller index (faster scans, less storage)171- Only indexes rows you actually query172- Great for status fields with skewed distribution173174### Covering Indexes (INCLUDE)175176**Include columns in index** to avoid table access:177178```sql179-- Postgres: INCLUDE clause180CREATE INDEX idx_users_email_covering181ON users(email)182INCLUDE (name, created_at);183184-- Query doesn't need to access table (all data in index)185SELECT name, created_at FROM users WHERE email = 'user@example.com';186```187188**Benefit**: Index-Only Scan (even faster than Index Scan)189190---191192## Part 3: Query Rewriting Patterns193194### Problem 1: SELECT * (Fetching Unnecessary Data)195196**❌ Bad** (fetches all columns):197```sql198SELECT * FROM users WHERE id = 123;199```200201**✅ Good** (fetches only needed columns):202```sql203SELECT id, name, email FROM users WHERE id = 123;204```205206**Impact**:207- Smaller result set208- Less memory usage209- Faster network transfer210- Can use covering indexes211212### Problem 2: N+1 Queries213214**❌ Bad** (1 query for posts + N queries for users):215```python216# 1 query217posts = db.execute("SELECT * FROM posts LIMIT 10")218219# N queries (10 separate queries!)220for post in posts:221 user = db.execute("SELECT * FROM users WHERE id = ?", post.user_id)222 print(f"{user.name}: {post.title}")223```224225**✅ Good** (2 queries total):226```python227# 1 query for posts228posts = db.execute("SELECT * FROM posts LIMIT 10")229230# 1 query for all users231user_ids = [p.user_id for p in posts]232users = db.execute("SELECT * FROM users WHERE id IN (?)", user_ids)233users_by_id = {u.id: u for u in users}234235# No additional queries236for post in posts:237 user = users_by_id[post.user_id]238 print(f"{user.name}: {post.title}")239```240241**Even Better** (1 query with JOIN):242```sql243SELECT u.name, p.title244FROM posts p245JOIN users u ON p.user_id = u.id246LIMIT 10;247```248249### Problem 3: Implicit Type Conversion250251**❌ Bad** (index not used due to implicit conversion):252```sql253-- user_id is INTEGER, but '123' is string254SELECT * FROM posts WHERE user_id = '123'; -- Seq Scan!255```256257**✅ Good** (explicit type, index used):258```sql259SELECT * FROM posts WHERE user_id = 123; -- Index Scan260```261262### Problem 4: Function Calls in WHERE Clause263264**❌ Bad** (can't use index):265```sql266-- Function prevents index usage267SELECT * FROM users WHERE LOWER(email) = 'user@example.com';268```269270**✅ Good** (use functional index):271```sql272-- Create index on function result273CREATE INDEX idx_users_lower_email ON users(LOWER(email));274275-- Now query can use index276SELECT * FROM users WHERE LOWER(email) = 'user@example.com';277```278279**✅ Even Better** (store lowercase, regular index):280```sql281-- Store email in lowercase282CREATE INDEX idx_users_email ON users(email);283284-- Query with lowercase value285SELECT * FROM users WHERE email = LOWER('user@example.com');286```287288### Problem 5: OR Conditions289290**❌ Bad** (often does Seq Scan):291```sql292SELECT * FROM users293WHERE email = 'user@example.com'294 OR username = 'johndoe';295```296297**✅ Good** (use UNION, allows index usage):298```sql299SELECT * FROM users WHERE email = 'user@example.com'300UNION301SELECT * FROM users WHERE username = 'johndoe';302```303304### Problem 6: NOT IN with Subquery305306**❌ Bad** (very slow, especially with large subquery):307```sql308SELECT * FROM users309WHERE id NOT IN (SELECT user_id FROM blocked_users);310```311312**✅ Good** (use LEFT JOIN with NULL check):313```sql314SELECT u.*315FROM users u316LEFT JOIN blocked_users b ON u.id = b.user_id317WHERE b.user_id IS NULL;318```319320---321322## Part 4: JOIN Optimization323324### JOIN Order Matters325326Database optimizes join order, but you can help:327328**Small table first, large table second**:329```sql330-- Good: Join small table (categories: 10 rows) to large table (products: 10M rows)331SELECT p.*332FROM categories c333JOIN products p ON c.id = p.category_id334WHERE c.name = 'Electronics';335```336337### Avoid Cartesian Products338339**❌ Bad** (produces rows1 × rows2 × rows3):340```sql341SELECT *342FROM users, posts, comments; -- 100 × 10000 × 50000 = 50 billion rows!343```344345**✅ Good** (proper JOIN conditions):346```sql347SELECT *348FROM users u349JOIN posts p ON u.id = p.user_id350JOIN comments c ON p.id = c.post_id;351```352353### EXISTS vs IN for Subqueries354355**Scenario**: Find users who have at least one post356357**❌ Slower** (IN loads all post IDs):358```sql359SELECT * FROM users360WHERE id IN (SELECT user_id FROM posts);361```362363**✅ Faster** (EXISTS stops at first match):364```sql365SELECT * FROM users u366WHERE EXISTS (SELECT 1 FROM posts p WHERE p.user_id = u.id);367```368369### JOIN vs Subquery370371**Use JOIN when** you need data from both tables:372```sql373SELECT u.name, p.title374FROM users u375JOIN posts p ON u.id = p.user_id;376```377378**Use Subquery when** you only need filtering:379```sql380SELECT * FROM users381WHERE id IN (SELECT DISTINCT user_id FROM posts);382```383384---385386## Part 5: Aggregation Optimization387388### COUNT(*) Optimization389390**Problem**: COUNT(*) on large tables is slow391392**❌ Slow** (counts all rows):393```sql394SELECT COUNT(*) FROM orders; -- Full table scan on 10M rows395```396397**✅ Faster** (approximate count from statistics):398```sql399-- PostgreSQL: Use statistics (fast but approximate)400SELECT reltuples::bigint AS estimate401FROM pg_class402WHERE relname = 'orders';403```404405**✅ Also Good** (exact count with index):406```sql407-- If you have an index, COUNT can use index-only scan408CREATE INDEX idx_orders_id ON orders(id);409SELECT COUNT(*) FROM orders; -- Index-only scan410```411412### GROUP BY Optimization413414**❌ Bad** (groups after fetching all data):415```sql416SELECT user_id, COUNT(*)417FROM posts418GROUP BY user_id;419```420421**✅ Good** (index helps grouping):422```sql423-- Create index on grouped column424CREATE INDEX idx_posts_user_id ON posts(user_id);425426SELECT user_id, COUNT(*)427FROM posts428GROUP BY user_id; -- Can use index for grouping429```430431### HAVING vs WHERE432433**❌ Bad** (filters after grouping):434```sql435SELECT user_id, COUNT(*) as post_count436FROM posts437GROUP BY user_id438HAVING user_id > 1000; -- Filters after grouping all users439```440441**✅ Good** (filters before grouping):442```sql443SELECT user_id, COUNT(*) as post_count444FROM posts445WHERE user_id > 1000 -- Filters before grouping (can use index)446GROUP BY user_id;447```448449---450451## Part 6: Pagination Optimization452453### OFFSET is Slow for Large Offsets454455**❌ Bad** (OFFSET scans all skipped rows):456```sql457-- Page 1000 (skips 50,000 rows, then returns 50)458SELECT * FROM posts459ORDER BY created_at DESC460LIMIT 50 OFFSET 50000; -- Scans 50,050 rows!461```462463**✅ Good** (keyset pagination with WHERE):464```sql465-- Save last seen ID from previous page466SELECT * FROM posts467WHERE created_at < '2024-01-15 10:30:00' -- Last seen timestamp468ORDER BY created_at DESC469LIMIT 50; -- Only fetches 50 rows470```471472**Implementation**:473```python474# Page 1475results = db.execute("""476 SELECT id, created_at, title477 FROM posts478 ORDER BY created_at DESC, id DESC479 LIMIT 50480""")481last_created_at = results[-1].created_at482last_id = results[-1].id483484# Page 2 (using last seen values)485results = db.execute("""486 SELECT id, created_at, title487 FROM posts488 WHERE (created_at, id) < (%s, %s)489 ORDER BY created_at DESC, id DESC490 LIMIT 50491""", last_created_at, last_id)492```493494**Benefits**:495- Consistent performance for any page number496- No OFFSET overhead497- Works with millions of rows498499---500501## Part 7: Database-Specific Patterns502503### PostgreSQL-Specific504505**1. EXPLAIN ANALYZE** (shows actual execution):506```sql507EXPLAIN (ANALYZE, BUFFERS)508SELECT * FROM users WHERE email = 'user@example.com';509```510511**2. pg_stat_statements** (find slow queries):512```sql513-- Enable extension514CREATE EXTENSION pg_stat_statements;515516-- Find slowest queries517SELECT518 calls,519 mean_exec_time,520 total_exec_time,521 query522FROM pg_stat_statements523ORDER BY mean_exec_time DESC524LIMIT 10;525```526527**3. VACUUM and ANALYZE**:528```sql529-- Update statistics after bulk changes530VACUUM ANALYZE table_name;531```532533### MySQL-Specific534535**1. EXPLAIN FORMAT=JSON** (detailed output):536```sql537EXPLAIN FORMAT=JSON538SELECT * FROM users WHERE email = 'user@example.com';539```540541**2. SHOW PROFILE** (detailed timing):542```sql543SET profiling = 1;544SELECT * FROM users WHERE email = 'user@example.com';545SHOW PROFILE FOR QUERY 1;546```547548**3. Query Cache** (deprecated in MySQL 8.0):549```sql550-- Old MySQL versions only551SHOW VARIABLES LIKE 'query_cache%';552```553554### SQLite-Specific555556**1. EXPLAIN QUERY PLAN**:557```sql558EXPLAIN QUERY PLAN559SELECT * FROM users WHERE email = 'user@example.com';560```561562**2. ANALYZE** (update statistics):563```sql564ANALYZE table_name;565```566567**3. Indexes are critical** (SQLite has no query optimizer):568```sql569-- SQLite relies heavily on indexes570CREATE INDEX idx_users_email ON users(email);571```572573---574575## Part 8: Monitoring and Diagnosis576577### Find Missing Indexes (PostgreSQL)578579```sql580-- Queries doing sequential scans on large tables581SELECT582 schemaname,583 tablename,584 seq_scan,585 seq_tup_read,586 idx_scan,587 seq_tup_read / seq_scan AS avg_seq_read588FROM pg_stat_user_tables589WHERE seq_scan > 0590ORDER BY seq_tup_read DESC591LIMIT 25;592```593594### Find Unused Indexes (PostgreSQL)595596```sql597-- Indexes that are never used598SELECT599 schemaname,600 tablename,601 indexname,602 idx_scan,603 pg_size_pretty(pg_relation_size(indexrelid)) AS size604FROM pg_stat_user_indexes605WHERE idx_scan = 0606 AND indexrelname NOT LIKE '%_pkey' -- Exclude primary keys607ORDER BY pg_relation_size(indexrelid) DESC;608```609610### Check Index Bloat611612```sql613-- Postgres: Check for bloated indexes614SELECT615 schemaname,616 tablename,617 indexname,618 pg_size_pretty(pg_relation_size(indexrelid)) AS size619FROM pg_stat_user_indexes620ORDER BY pg_relation_size(indexrelid) DESC;621622-- Rebuild bloated indexes623REINDEX INDEX index_name;624```625626---627628## Part 9: Common Patterns by Use Case629630### Pattern 1: Recent Records with Pagination631632```sql633-- Create composite index634CREATE INDEX idx_posts_created_user ON posts(created_at DESC, user_id);635636-- Query uses index efficiently637SELECT * FROM posts638WHERE user_id = 123639ORDER BY created_at DESC640LIMIT 20;641```642643### Pattern 2: Search Across Multiple Columns644645```sql646-- Create GIN index for full-text search (Postgres)647CREATE INDEX idx_posts_search ON posts648USING gin(to_tsvector('english', title || ' ' || content));649650-- Fast full-text search651SELECT * FROM posts652WHERE to_tsvector('english', title || ' ' || content) @@653 to_tsquery('english', 'database & optimization');654```655656### Pattern 3: Range Queries657658```sql659-- Create index on range column660CREATE INDEX idx_orders_created_at ON orders(created_at);661662-- Query uses index663SELECT * FROM orders664WHERE created_at BETWEEN '2024-01-01' AND '2024-01-31'665ORDER BY created_at;666```667668### Pattern 4: Distinct Values with Filter669670```sql671-- Create composite index672CREATE INDEX idx_posts_category_user ON posts(category_id, user_id);673674-- Query uses index675SELECT DISTINCT category_id676FROM posts677WHERE user_id = 123;678```679680---681682## Part 10: Quick Optimization Checklist683684**Diagnosis**:685- [ ] Run EXPLAIN ANALYZE on slow queries686- [ ] Check for Seq Scan on tables >10K rows687- [ ] Look for missing indexes688- [ ] Identify N+1 query patterns689- [ ] Check query execution time (aim for <100ms)690691**Index Strategy**:692- [ ] Index columns in WHERE clauses693- [ ] Index columns in JOIN conditions694- [ ] Index columns in ORDER BY695- [ ] Use composite indexes for multi-column queries696- [ ] Consider partial indexes for filtered queries697- [ ] Remove unused indexes698699**Query Rewriting**:700- [ ] Select only needed columns (not SELECT *)701- [ ] Use JOIN instead of N+1 queries702- [ ] Use proper types (avoid implicit conversion)703- [ ] Avoid functions in WHERE (use functional indexes)704- [ ] Use UNION instead of OR for multiple conditions705- [ ] Use EXISTS instead of IN for large subqueries706707**Pagination**:708- [ ] Use keyset pagination (not OFFSET) for large datasets709- [ ] Include stable sort column (id) in pagination710- [ ] Index pagination columns711712**Monitoring**:713- [ ] Enable query logging for slow queries714- [ ] Monitor index usage715- [ ] Check for index bloat716- [ ] Regularly VACUUM ANALYZE (Postgres)717718---719720## Resources721722**PostgreSQL**:723- EXPLAIN docs: https://www.postgresql.org/docs/current/sql-explain.html724- Index types: https://www.postgresql.org/docs/current/indexes-types.html725- Performance tips: https://wiki.postgresql.org/wiki/Performance_Optimization726727**MySQL**:728- EXPLAIN output: https://dev.mysql.com/doc/refman/8.0/en/explain-output.html729- Optimization: https://dev.mysql.com/doc/refman/8.0/en/optimization.html730731**Tools**:732- pgAdmin (Postgres GUI)733- MySQL Workbench734- DBeaver (multi-database)735- explain.depesz.com (Postgres EXPLAIN visualizer)736
Full transparency — inspect the skill content before installing.