Your database has 50 indexes. And it's SLOWER than having 5. Here's what most engineers get wrong about indexing: We treat indexes like free performance boosts. But every index you add is a hidden contract: - Every INSERT now updates N+1 data structures - Every UPDATE potentially rewrites multiple B-trees - The query planner gets confused with too many choices - Your working set no longer fits in memory I learned this the hard way at scale. We had a table with 34 indexes. Reads were fast. Writes were dying. P99 latency on inserts hit 1.2 seconds. The fix? We dropped 28 indexes. But here's the part nobody talks about: We replaced them with 3 composite indexes that covered 94% of our query patterns. The trick was analyzing pg_stat_user_indexes. Most of our indexes had ZERO scans in 30 days. They were dead weight burning I/O on every write. Here's the framework I now use: 1. Audit index usage monthly (pg_stat_user_indexes) 2. Every index must justify its write amplification cost 3. Composite indexes > single-column indexes (almost always) 4. Covering indexes eliminate heap lookups entirely 5. Partial indexes for queries that filter on a constant The result after cleanup: • Write latency dropped 73% • Storage shrank by 40% • Read performance stayed identical The best performance optimization isn't adding something new. It's removing what shouldn't be there. 💬 What's the worst index bloat you've seen? #SystemDesign #DatabaseEngineering #SoftwareEngineering #PostgreSQL #Performance
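The monthly audit in step 1 can be sketched roughly as follows. The query reads real columns of pg_stat_user_indexes (idx_scan, indexrelid) and the built-in pg_relation_size function; the Python helper, its name, and the sample rows are hypothetical and only illustrate the filtering step. Note that idx_scan counts scans since the last statistics reset, not over a fixed 30-day window, and that indexes backing primary-key or unique constraints should stay even if they show zero scans.

```python
# Query to run against PostgreSQL; it lists every user index with its scan count and size.
UNUSED_INDEX_QUERY = """
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_relation_size(indexrelid) AS index_bytes
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, index_bytes DESC;
"""


def unused_indexes(rows, max_scans=0):
    """Return rows whose idx_scan is at most max_scans, largest index first.

    `rows` is a list of dicts shaped like the query result above
    (hypothetical helper; fetch the rows with whatever driver you use).
    """
    candidates = [r for r in rows if r["idx_scan"] <= max_scans]
    return sorted(candidates, key=lambda r: r["index_bytes"], reverse=True)


# Made-up sample rows, for illustration only.
sample = [
    {"indexrelname": "orders_pkey", "idx_scan": 120000, "index_bytes": 4_000_000},
    {"indexrelname": "orders_legacy_flag_idx", "idx_scan": 0, "index_bytes": 9_000_000},
    {"indexrelname": "orders_tmp_idx", "idx_scan": 0, "index_bytes": 1_000_000},
]
print([r["indexrelname"] for r in unused_indexes(sample)])
```

Treat the output as a list of candidates to review, not a list to drop blindly: an index used only by a quarterly report would also look idle right after a statistics reset.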
How to Optimize PostgreSQL Database Performance
Summary
Improving PostgreSQL database performance means making the database run faster and more smoothly by adjusting how data is stored, accessed, and processed. This involves tuning both the way you write SQL queries and the technical settings in the database, as well as organizing indexes, memory, and table layouts.
- Review index usage: Regularly check which indexes are actually helping queries and remove those that go unused or slow down data updates.
- Adjust configuration settings: Tailor memory and cache settings to fit your server hardware and workload, such as increasing shared buffers and tuning cache sizes for faster responses.
- Refine query structure: Write SQL queries that select only needed fields and replace complex subqueries with joins to reduce unnecessary work and speed up data retrieval.
-
What's the worst PostgreSQL configuration mistake you've seen in production? I'll start: shared_buffers = 128 MB on a 64 GB server. Default PostgreSQL configuration. Running in production. For two years. On a database serving 50,000 queries per second. The database was using 128 MB of its own cache and relying entirely on the operating system's page cache for everything else. It worked — PostgreSQL is remarkably resilient — but it was leaving enormous performance on the table. Changed shared_buffers to 16 GB (25% of RAM), adjusted effective_cache_size to 48 GB, and query response times dropped 40% overnight. Other configuration horrors I've seen: • 𝗺𝗮𝘅_𝗰𝗼𝗻𝗻𝗲𝗰𝘁𝗶𝗼𝗻𝘀 = 𝟭𝟬𝟬𝟬 with no connection pooler. Each connection using 10 MB of RAM. Server running out of memory under load. • 𝗿𝗮𝗻𝗱𝗼𝗺_𝗽𝗮𝗴𝗲_𝗰𝗼𝘀𝘁 = 𝟰.𝟬 on an NVMe SSD. The planner thought random reads were 4x more expensive than sequential reads, avoiding index scans that would have been faster. Should have been 1.1. • 𝘄𝗼𝗿𝗸_𝗺𝗲𝗺 = 𝟰 𝗠𝗕 on an analytics workload. Every complex query spilling to disk for sorts and hash joins. Changed to 256 MB and query times dropped from minutes to seconds. • 𝗮𝘂𝘁𝗼𝘃𝗮𝗰𝘂𝘂𝗺 = 𝗼𝗳𝗳. Yes, someone turned it off. The table bloat was spectacular. The PostgreSQL defaults are intentionally conservative. They're designed to run on a Raspberry Pi, not to perform well on your production server. Five settings — shared_buffers, effective_cache_size, work_mem, maintenance_work_mem, random_page_cost — capture 80% of the performance gains. What's yours? I know there are worse stories out there. #PostgreSQL #Database #Configuration #DevOps #DBA
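The arithmetic behind those numbers can be sketched as below. The 25% and 75% fractions simply reproduce the post's 16 GB and 48 GB figures on a 64 GB server; they are starting points under that assumption, not universal rules, and the function names are hypothetical.

```python
def suggest_memory_settings(ram_gb):
    """Starting values derived from total RAM, mirroring the post's example.

    shared_buffers: 25% of RAM; effective_cache_size: 75% of RAM.
    """
    return {
        "shared_buffers": f"{ram_gb // 4}GB",
        "effective_cache_size": f"{ram_gb * 3 // 4}GB",
    }


def connection_memory_gb(max_connections, mb_per_connection):
    """Worst-case memory if every allowed connection is open at once."""
    return max_connections * mb_per_connection / 1024


print(suggest_memory_settings(64))
print(round(connection_memory_gb(1000, 10), 1), "GB for 1000 connections at 10 MB each")
```

The second function shows why max_connections = 1000 without a pooler hurts: at the post's 10 MB per connection, connections alone can claim close to 10 GB before any query does real work.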
-
With a background in data engineering and business analysis, I’ve consistently seen the immense impact of optimized SQL code on the performance and efficiency of database operations. It also contributes indirectly to cost savings by reducing resource consumption. Here are some techniques that have proven invaluable in my experience:
1. Index Large Tables: Indexing tables with large datasets (>1,000,000 rows) greatly speeds up searches and enhances query performance. However, be cautious of over-indexing, as excessive indexes can degrade write operations.
2. Select Specific Fields: Choosing specific fields instead of using SELECT * reduces the amount of data transferred and processed, which improves speed and efficiency.
3. Replace Subqueries with Joins: Using joins instead of subqueries in the WHERE clause can improve performance.
4. Use UNION ALL Instead of UNION: UNION ALL is preferable to UNION when duplicates are impossible or acceptable, because it skips the overhead of sorting and removing duplicates.
5. Optimize with WHERE Instead of HAVING: Filtering rows with WHERE before aggregation reduces the workload and speeds up query processing. Reserve HAVING for conditions on aggregated values.
6. Utilize INNER JOIN Instead of WHERE for Joins: INNER JOINs help the query optimizer make better execution decisions than complex WHERE conditions.
7. Minimize Use of OR in Joins: Avoiding the OR operator in join conditions simplifies them and can let the database reduce the dataset earlier in execution.
8. Use Materialized Views: A materialized view stores the results of a query, so they can be read faster than recomputing the query each time it is needed. A plain view stores only the query definition and is re-evaluated on every use.
9. Minimize the Number of Subqueries: Reducing the number of subqueries in your SQL statements can significantly enhance performance by simplifying the query execution plan and reducing overhead.
10. Implement Partitioning: Partitioning large tables can improve query performance and manageability by logically dividing them into discrete segments. This allows SQL queries to process only the relevant portions of data.
#SQL #DataOptimization #DatabaseManagement #PerformanceTuning #DataEngineering
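The UNION ALL point is easy to check on a toy dataset. The sketch below uses Python's built-in sqlite3 module as a stand-in so it runs anywhere; the two SELECT statements are standard SQL and behave the same way in PostgreSQL. Table names and rows are made up for illustration.

```python
import sqlite3


def union_demo():
    """Return (rows from UNION ALL, rows from UNION) for two overlapping tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE newsletter (email TEXT);
        CREATE TABLE customers (email TEXT);
        INSERT INTO newsletter VALUES ('x@example.com'), ('y@example.com');
        INSERT INTO customers VALUES ('y@example.com'), ('z@example.com');
    """)
    # UNION ALL concatenates both inputs and keeps duplicates.
    union_all = conn.execute(
        "SELECT email FROM newsletter UNION ALL SELECT email FROM customers"
    ).fetchall()
    # UNION does extra work to remove duplicate rows.
    union = conn.execute(
        "SELECT email FROM newsletter UNION SELECT email FROM customers"
    ).fetchall()
    conn.close()
    return union_all, union


all_rows, distinct_rows = union_demo()
print(len(all_rows), len(distinct_rows))
```

The two forms are interchangeable only when the inputs cannot overlap or when duplicates are harmless; here they differ by one row.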
-
🤔 Ever wonder how I made my Rails app 69% faster with one line? Yes, one line — and boom, performance gains without touching your business logic. 🖼 Imagine this: You're building a dashboard that pulls: → Open tickets → SLA violations → Public agent comments Everything looks great on the surface. But the load time? Not so great. Something’s off. So you run EXPLAIN ANALYZE. And there it is: Query cost: 8781.42 — to fetch ~150 tickets. Turns out the bottleneck was in the comments table. We were filtering on two things: → is_public = true → from_agent = true But there was no efficient index for that combo. So Postgres defaulted to a slow scan — until we added a composite index. 🤔 What’s a composite index? → It’s an index made on more than one column. 💁♀️ With Composite Partial Index: "add_index :comments, [:ticket_id, :created_at], where: "is_public IS TRUE AND from_agent IS TRUE", algorithm: :concurrently" Result? New query cost: 2742.29 That’s a 69% performance boost with zero changes to business logic! ✅ When Should You Use This? → You frequently filter on specific boolean columns (e.g., is_public, from_agent) → Your table is large, and scans are slow → You want your data to load super fast without slowing down when you save or update it. 🎉 Tada! You just learned a Rails performance superpower and it only took a minute. 🚀 Pro Tip: → Adding conditions with where: creates a partial index, meaning the database only stores relevant rows, saving space and time. 🤝 Over to you: → Have you ever made a small change that gave massive results? → Do you use EXPLAIN ANALYZE in your Rails debugging workflow? 💬 Feel free to share your thoughts or questions in the comment section. If you want to learn with me, feel free to follow Chaitali Khangar
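As a quick check of the 69% figure, here is the arithmetic on the two numbers quoted above. Keep in mind that these are the planner's estimated costs from the query plan, in arbitrary cost units, so the percentage describes the estimate rather than measured wall-clock time. The function name is hypothetical.

```python
def cost_reduction_percent(before, after):
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


reduction = cost_reduction_percent(8781.42, 2742.29)
print(f"{reduction:.1f}% lower estimated cost")
```

To confirm the gain in practice, compare the actual execution times that EXPLAIN ANALYZE reports before and after adding the index.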
-
𝗛𝗼𝘄 𝘁𝗼 𝗶𝗺𝗽𝗿𝗼𝘃𝗲 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗽𝗲𝗿𝗳𝗼𝗿𝗺𝗮𝗻𝗰𝗲? Here are the most important ways to improve your database performance: 𝟭. 𝗜𝗻𝗱𝗲𝘅𝗶𝗻𝗴 Add indexes to columns you frequently search, filter, or join. Think of indexes as the book's table of contents - they help the database find information without scanning every record. But remember: too many indexes slow down write operations. 💡 𝗕𝗼𝗻𝘂𝘀 𝘁𝗶𝗽: Regularly drop unused indexes. They waste space and slow down writing without providing any benefit. 𝟮. 𝗠𝗮𝘁𝗲𝗿𝗶𝗮𝗹𝗶𝘇𝗲𝗱 𝗩𝗶𝗲𝘄𝘀 Pre-compute and store complex query results. This saves processing time when users need the data again. Schedule regular refreshes to keep the data current. 𝟯. 𝗩𝗲𝗿𝘁𝗶𝗰𝗮𝗹 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 Add more CPU, RAM, or faster storage to your database server. This is the most straightforward approach, but has physical and cost limitations. 𝟰. 𝗗𝗲𝗻𝗼𝗿𝗺𝗮𝗹𝗶𝘇𝗮𝘁𝗶𝗼𝗻 Duplicate some data to reduce joins. This technique trades storage space for speed and works well when reads outnumber writes significantly. 𝟱. 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝗖𝗮𝗰𝗵𝗶𝗻𝗴 Store frequently accessed data in memory. This reduces disk I/O and dramatically speeds up read operations. Popular options include Redis and Memcached. 𝟲. 𝗥𝗲𝗽𝗹𝗶𝗰𝗮𝘁𝗶𝗼𝗻 Create copies of your database to distribute read operations. This works well for read-heavy workloads but requires managing data consistency. 𝟳. 𝗦𝗵𝗮𝗿𝗱𝗶𝗻𝗴 Split your database horizontally across multiple servers. Each shard contains a subset of your data based on a key like user_id or geography. This distributes both read and write loads. 𝟴. 𝗣𝗮𝗿𝘁𝗶𝘁𝗶𝗼𝗻𝗶𝗻𝗴 Divide large tables into smaller, more manageable pieces within the same database. This improves query and maintenance operations on huge tables. 🎁 𝗕𝗼𝗻𝘂𝘀: 🔹 𝗔𝗻𝗮𝗹𝘆𝘇𝗲 𝗲𝘅𝗲𝗰𝘂𝘁𝗶𝗼𝗻 𝗽𝗹𝗮𝗻𝘀. Use EXPLAIN ANALYZE to see precisely how your database executes queries. This reveals hidden bottlenecks and helps you target optimization efforts where they matter most. 🔹 𝗔𝘃𝗼𝗶𝗱 𝗰𝗼𝗿𝗿𝗲𝗹𝗮𝘁𝗲𝗱 𝘀𝘂𝗯𝗾𝘂𝗲𝗿𝗶𝗲𝘀. These run once for every row the outer query returns, creating a performance nightmare. 
Rewrite them as JOINs for dramatic speed improvements. 🔹 𝗖𝗵𝗼𝗼𝘀𝗲 𝗮𝗽𝗽𝗿𝗼𝗽𝗿𝗶𝗮𝘁𝗲 𝗱𝗮𝘁𝗮 𝘁𝘆𝗽𝗲𝘀. Using VARCHAR(4000) when VARCHAR(40) would work wastes space and slows performance. Right-size your data types to match what you're storing. #technology #systemdesign #databases #sql #programming
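The correlated-subquery rewrite can be sketched on a toy schema. This uses Python's built-in sqlite3 module as a stand-in so the example is runnable; the SQL itself is standard and the table and column names are hypothetical. It only demonstrates that both forms return the same rows; whether the JOIN is faster depends on the engine and the data, so check with EXPLAIN ANALYZE.

```python
import sqlite3


def order_counts():
    """Return per-customer order counts computed two ways."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'ada'), (2, 'bob'), (3, 'cy');
        INSERT INTO orders (customer_id) VALUES (1), (1), (2);
    """)
    # Correlated subquery: the inner SELECT is evaluated per customer row.
    correlated = conn.execute("""
        SELECT c.name,
               (SELECT COUNT(*) FROM orders o WHERE o.customer_id = c.id)
        FROM customers c
        ORDER BY c.name
    """).fetchall()
    # Equivalent JOIN: one pass over the joined rows, then grouping.
    joined = conn.execute("""
        SELECT c.name, COUNT(o.id)
        FROM customers c
        LEFT JOIN orders o ON o.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
    """).fetchall()
    conn.close()
    return correlated, joined


correlated, joined = order_counts()
print(correlated == joined, joined)
```

The LEFT JOIN with COUNT(o.id) matters: an inner join would silently drop customers with no orders, which the subquery version keeps with a count of zero.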
-
Don’t index just filters. Index what you need. If you index only your WHERE columns, you leave performance on the table. One of the most effective yet overlooked techniques is Covering Indexes. Unlike standard indexes that only help filter rows, covering indexes include all columns required for a query. It will reduce query execution time by eliminating the need to access the main table. 𝗪𝗵𝘆 𝗖𝗼𝘃𝗲𝗿𝗶𝗻𝗴 𝗜𝗻𝗱𝗲𝘅𝗲𝘀? • By including all required columns, the query can be resolved entirely from the index, avoiding table lookups. • Can speed up join queries by reducing access to the base table. 𝗖𝗼𝗹𝘂𝗺𝗻𝘀 𝘁𝗼 𝗜𝗻𝗰𝗹𝘂𝗱𝗲: • WHERE: Filters rows. • SELECT: Data to retrieve. • ORDER BY: Sorting columns. 𝗦𝘁𝗲𝗽𝘀 𝘁𝗼 𝗖𝗿𝗲𝗮𝘁𝗲 𝗖𝗼𝘃𝗲𝗿𝗶𝗻𝗴 𝗜𝗻𝗱𝗲𝘅𝗲𝘀 1- Use execution plans to identify queries that perform frequent table lookups. 2- Focus on columns in WHERE, SELECT, and ORDER BY. 3- Don’t create multiple indexes with overlapping columns unnecessarily. 𝗖𝗼𝘃𝗲𝗿𝗶𝗻𝗴 𝗜𝗻𝗱𝗲𝘅𝗲𝘀 𝗮𝗿𝗲 𝗻𝗼𝘁 𝗳𝗼𝗿 𝗳𝗿𝗲𝗲. • Each insert, update, or delete operation must update the index, which can slow down write-heavy workloads. • Covering indexes consumes more disk space. Covering indexes are a powerful tool for database performance, especially for read-heavy applications. While they can increase write costs, the trade-off is often worth it for the dramatic speedups in query performance. Every table lookup wastes precious time. Fix it!
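A small way to see a covering index at work is sketched below. It uses Python's built-in sqlite3 module as a runnable stand-in, not PostgreSQL: SQLite reports "COVERING INDEX" in its query plan when a query is answered from the index alone. In PostgreSQL the corresponding plan node is an Index Only Scan, and since version 11 extra columns can be attached with CREATE INDEX ... INCLUDE (...). The table, column, and index names are hypothetical.

```python
import sqlite3


def plan_for_covered_query():
    """Return SQLite's query plan text for a query the index fully covers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            status TEXT,
            created_at TEXT,
            notes TEXT
        );
        -- WHERE column first, then the ORDER BY column, then the SELECT column.
        CREATE INDEX idx_orders_customer_cover
            ON orders (customer_id, created_at, status);
    """)
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT status, created_at FROM orders "
        "WHERE customer_id = ? ORDER BY created_at",
        (42,),
    ).fetchall()
    conn.close()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(str(row[-1]) for row in rows)


print(plan_for_covered_query())
```

Adding `notes` to the SELECT list would break coverage, because that column is not in the index and the table would have to be read again.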
-
Building 𝟴𝟬𝟬 𝗺𝗶𝗹𝗹𝗶𝗼𝗻 𝘂𝘀𝗲𝗿𝘀 𝗼𝗻 𝗮 𝘀𝗶𝗻𝗴𝗹𝗲 𝗽𝗿𝗶𝗺𝗮𝗿𝘆 𝗣𝗼𝘀𝘁𝗴𝗿𝗲𝘀 𝗗𝗕 is the ultimate masterclass in "𝗕𝗼𝗿𝗶𝗻𝗴 𝗧𝗲𝗰𝗵𝗻𝗼𝗹𝗼𝗴𝘆" scaled to the extreme. OpenAI just dropped a blog on how they handle millions of QPS without sharding their primary database. While most would jump to complex distributed systems, they leaned into disciplined engineering. 𝗧𝗵𝗲 𝗦𝗰𝗮𝗹𝗶𝗻𝗴 𝗣𝗹𝗮𝘆𝗯𝗼𝗼𝗸: Connection Pooling: Used PgBouncer to slash connection latency from 50ms to 5ms. 𝗧𝗵𝘂𝗻𝗱𝗲𝗿𝗶𝗻𝗴 𝗛𝗲𝗿𝗱 𝗣𝗿𝗼𝘁𝗲𝗰𝘁𝗶𝗼𝗻: Implemented Cache Leasing. If the cache misses, only one request hits the DB to fetch data; others wait for the update. 𝗢𝗥𝗠 𝗗𝗶𝘀𝗰𝗶𝗽𝗹𝗶𝗻𝗲: Identified and killed "evil" 12-way joins generated by ORMs, moving complex logic to the application layer. 𝗪𝗼𝗿𝗸𝗹𝗼𝗮𝗱 𝗜𝘀𝗼𝗹𝗮𝘁𝗶𝗼𝗻: Split traffic into high/low priority tiers to ensure a new feature launch doesn't crash the entire API. The Result: 1 Primary Instance 50 Read Replicas Low double digit ms p99 latency 99.999% Availability The Takeaway: We often blame the database when the real issue is how we use it. Before you jump to "exotic" solutions, master the fundamentals of the tools you already have. 𝗦𝗶𝗺𝗽𝗹𝗲 𝗶𝘀 𝗰𝗼𝗺𝗽𝗹𝗶𝗰𝗮𝘁𝗲𝗱 𝗲𝗻𝗼𝘂𝗴𝗵. 𝗠𝗮𝘀𝘁𝗲𝗿 𝘁𝗵𝗲 "𝗯𝗼𝗿𝗶𝗻𝗴" 𝘀𝘁𝘂𝗳𝗳. #PostgreSQL #SystemDesign #ScalableSystems #BackendEngineering #DistributedSystems #SoftwareArchitecture #EngineeringExcellence #TechLeadership #DatabaseEngineering #OpenAI #BigTech #DevCommunity #CloudComputing
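The cache-leasing idea (one request loads on a miss, the rest wait) can be sketched in a few lines. This is an in-process illustration under my own assumptions, not OpenAI's implementation: the class name, the loader callback, and the absence of expiry are all simplifications chosen for the example.

```python
import threading


class LeasedCache:
    """On a miss, one caller takes a lease and loads; other callers wait for it."""

    def __init__(self, loader):
        self._loader = loader          # function key -> value (e.g. a DB read)
        self._values = {}              # cached values, never evicted in this sketch
        self._leases = {}              # key -> Event set when the load finishes
        self._lock = threading.Lock()

    def get(self, key):
        while True:
            with self._lock:
                if key in self._values:
                    return self._values[key]
                lease = self._leases.get(key)
                owner = lease is None
                if owner:
                    lease = threading.Event()
                    self._leases[key] = lease
            if owner:
                try:
                    value = self._loader(key)   # only the lease owner hits the DB
                    with self._lock:
                        self._values[key] = value
                    return value
                finally:
                    with self._lock:
                        del self._leases[key]
                    lease.set()                 # wake the waiters
            else:
                lease.wait()
                # Loop again: read the cached value, or take over if the load failed.
```

A usage sketch: wrap the database read in a function, pass it as `loader`, and call `cache.get(key)` from every request handler. Eight concurrent requests for the same missing key then produce one database read instead of eight.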
-
Do you know about Postgres' HOT chains optimization? One more reason to be mindful about index creation. When you UPDATE a row, Postgres creates a new version of the row and marks the old version as dead. This is fundamental to Postgres's MVCC. The dead rows will eventually get cleaned up by VACUUM. Naturally, Postgres also has to update all the indexes pointing to the dead row. Adding overhead to the update operation and also resulting in more dead tuples (in the index) and more work for VACUUM. Already, this is a good reason to be mindful about index creation. HOT chains, or Heap-Only Tuple chains, are an optimization introduced to reduce index bloat during updates. When you update a row AND the new version can fit on the same page as the old version, AND none of the indexed columns changed, Postgres can create a heap-only tuple. This new tuple is linked to the old one via a chain pointer. Crucially, the indexes don't need to be updated. The index still points to the old tuple location, and Postgres follows the chain to find the current version. This dramatically reduces write amplification since updating one row doesn't require updating every index on that table. HOT chain pruning is the process of cleaning up these chains. When Postgres needs space on a page or when VACUUM runs, it can prune HOT chains by removing intermediate dead tuples that are no longer visible to any transaction. This is much cheaper than full VACUUM because it doesn't need to touch indexes at all - it just compacts the heap page and updates the chain pointers. However, there's a catch. For HOT to work, the new tuple must fit on the same page AND no indexed columns can change. Every index you add to a table is another column that, if updated, will not allow the use of HOT chains optimization. If you have an index on column A and you update column B, that's fine for HOT. But if you have indexes on both A and B, updating either breaks HOT optimization. 
Many developers reflexively create indexes on every column that appears in a WHERE clause, but each index has a hidden cost beyond storage and insert performance. Every additional index reduces the likelihood that updates can use HOT optimization. In workloads with frequent updates, this can lead to severe index bloat as each update creates new index entries. Best practices: - Consider your query and update patterns when choosing what to index. If a table is frequently updated but rarely queried on certain columns, those columns are poor index candidates. - Consider using covering indexes strategically rather than multiple separate indexes. - Monitor HOT efficiency by querying pg_stat_user_tables for the ratio of n_tup_hot_upd to n_tup_upd; a low ratio indicates HOT isn't working well. - Use pg_stat_user_indexes to find unused indexes, pg_qualstats to find popular WHERE columns, and HypoPG to check that new indexes will be useful.
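The HOT monitoring step can be sketched as follows. The query uses the real pg_stat_user_tables columns n_tup_upd and n_tup_hot_upd; the Python helper and the sample numbers are hypothetical and only show how to turn the two counters into a ratio.

```python
# Query to run against PostgreSQL: update counters per table, busiest tables first.
HOT_RATIO_QUERY = """
SELECT relname, n_tup_upd, n_tup_hot_upd
FROM pg_stat_user_tables
ORDER BY n_tup_upd DESC;
"""


def hot_update_ratio(n_tup_hot_upd, n_tup_upd):
    """Fraction of updates that were HOT, or None if the table had no updates."""
    if n_tup_upd == 0:
        return None
    return n_tup_hot_upd / n_tup_upd


# Made-up counters for illustration.
print(hot_update_ratio(9_000, 10_000))   # mostly HOT updates
print(hot_update_ratio(500, 10_000))     # HOT rarely applies; review this table's indexes
```

A low ratio on a heavily updated table is the signal to look at which indexed columns the updates touch and whether each of those indexes is still needed.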
-
We optimized patient search speed by 90%—here’s how! On our FHIR-powered platform, patient search is used all day. Slow searches = frustrated users. Our initial queries on patient resources were slow, taking over 3 seconds in some cases. Instead of adding a search engine like Elasticsearch or Typesense, we optimized PostgreSQL’s full-text search using `tsvector` and `tsquery`. The result? ✅ Patient name search down from 1284ms to 2.7ms ✅ Identifier search down from 3121ms to 1.39ms ✅ CPU usage reduced by 73% All without adding new dependencies. PostgreSQL’s native capabilities are more powerful than you think! Read the whole blog here: https://lnkd.in/eKYbDN2A
-
Postgres: are you indexing for theoretical flexibility instead of production reality? Here's what optimizing Postgres for the common case actually looks like. At DBOS, we encode workflows directly in Postgres. Some of our production users have workflow tables with millions of rows. The obvious thing to do is add indexes on common columns (workflow status, parent ID, queue metadata, etc.). But durable workflows are write-heavy systems, which means index maintenance becomes costly at scale. Every workflow transition (ENQUEUED -> PENDING -> SUCCESS) updates indexed columns. And once you reach millions of workflows, index maintenance itself becomes a real cost: - More WAL - More bloat - Slower write queries - Higher vacuum cost We're optimizing with partial indexes. A partial index only indexes rows matching a condition. Here are two access patterns we've learned to optimize for using partial indexes: 1. Workflow status Operational queries usually target a small subset of workflows: - failed workflows - currently running workflows A global index on status is expensive because almost every workflow transitions through it. Instead, partial indexes on failed and pending workflows cover these two common paths. 2. Queue dequeue queries Durable queues are our most common use case. Our dequeue path filters and sorts roughly like this: WHERE queue_name = ? AND status IN ('ENQUEUED', 'PENDING') ORDER BY priority, created_at So the index mirrors the exact access pattern: (queue_name, status, priority, created_at) ... and is partial on only ENQUEUED / PENDING rows. The takeaway is that you need to optimize for your common cases. #postgres
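The dequeue index described above can be sketched as DDL plus the matching query. To keep the example runnable it uses Python's built-in sqlite3 module as a stand-in; the CREATE INDEX ... WHERE form shown is also valid PostgreSQL. The table name, the workflow_id column, the index name, and the sample rows are hypothetical; only the four indexed columns and the status values come from the post.

```python
import sqlite3


def dequeue_demo(queue_name="email"):
    """Create a partial index for the dequeue path and return dequeue order."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE workflow_queue (
            workflow_id TEXT PRIMARY KEY,
            queue_name  TEXT NOT NULL,
            status      TEXT NOT NULL,
            priority    INTEGER NOT NULL,
            created_at  INTEGER NOT NULL
        );
        -- Only ENQUEUED / PENDING rows are indexed; finished workflows
        -- drop out of the index and stop adding maintenance cost.
        CREATE INDEX idx_queue_dequeue
            ON workflow_queue (queue_name, status, priority, created_at)
            WHERE status IN ('ENQUEUED', 'PENDING');
        INSERT INTO workflow_queue VALUES
            ('w1', 'email',   'SUCCESS',  1, 100),
            ('w2', 'email',   'ENQUEUED', 2, 101),
            ('w3', 'email',   'PENDING',  1, 102),
            ('w4', 'email',   'ENQUEUED', 1, 103),
            ('w5', 'billing', 'ENQUEUED', 1, 104);
    """)
    rows = conn.execute(
        "SELECT workflow_id FROM workflow_queue "
        "WHERE queue_name = ? AND status IN ('ENQUEUED', 'PENDING') "
        "ORDER BY priority, created_at",
        (queue_name,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


print(dequeue_demo())
```

The query's status filter repeats the index predicate exactly, which is what lets a planner prove the partial index is usable for it; verify the plan on your own database with EXPLAIN.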