How PostgreSQL Implements MVCC

In large-scale systems, latency is not only about APIs. It is also about how fast the database can return the latest visible data.

PostgreSQL follows an append-based MVCC model. An UPDATE does not modify a row in place; it creates a new physical tuple version with a new TID. Older versions remain until background cleanup such as VACUUM removes them.

With frequent updates, multiple versions of the same logical row can accumulate. Indexes may point to different tuple versions, and during reads PostgreSQL must perform visibility checks to identify the valid record for the current snapshot. If these versions are located on different heap pages, the read path can involve additional heap lookups, buffer activity, and random I/O. At scale, this can affect read predictability and tail latency for update-heavy workloads.

This design provides strong concurrency and write throughput, but some large platforms have evaluated alternative storage approaches as their workload patterns evolved. It is one of the factors discussed when parts of high-scale systems move to other database technologies.

There is no single best database architecture. The right choice depends on update frequency, read-write ratio, index overhead, storage layout, and operational tuning.

#SystemDesign #DatabaseInternals #PostgreSQL #MVCC #Scalability #PerformanceEngineering
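To make the version-accumulation point concrete, here is a toy Python model of an append-only heap. It is a sketch under loud assumptions, not PostgreSQL's actual implementation: the names `ToyHeap`, `TupleVersion`, and `slots_per_page` are hypothetical, every transaction is assumed to commit in xid order, and a snapshot simply sees all xids up to its own. It only shows how a read may have to check several versions spread over several pages until cleanup runs.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TupleVersion:
    tid: Tuple[int, int]        # (page, slot), loosely modeled on a heap TID
    key: str
    value: str
    xmin: int                   # transaction that created this version
    xmax: Optional[int] = None  # transaction that superseded it, if any


class ToyHeap:
    """Append-only heap. Assumes every transaction commits, in xid order."""

    def __init__(self, slots_per_page: int = 2) -> None:
        self.slots_per_page = slots_per_page
        self.tuples: List[TupleVersion] = []
        self._next_slot = 0  # slots are never reused in this toy model

    def _append(self, key: str, value: str, xid: int) -> TupleVersion:
        tid = divmod(self._next_slot, self.slots_per_page)
        self._next_slot += 1
        version = TupleVersion(tid, key, value, xmin=xid)
        self.tuples.append(version)
        return version

    def insert(self, key: str, value: str, xid: int) -> TupleVersion:
        return self._append(key, value, xid)

    def update(self, key: str, value: str, xid: int) -> TupleVersion:
        # Mark the live version as superseded, then append a new version.
        for t in self.tuples:
            if t.key == key and t.xmax is None:
                t.xmax = xid
        return self._append(key, value, xid)

    def read(self, key: str, snapshot_xid: int):
        """Return (visible version, versions checked, distinct pages touched)."""
        visible, checked, pages = None, 0, set()
        for t in self.tuples:
            if t.key != key:
                continue
            checked += 1
            pages.add(t.tid[0])
            created = t.xmin <= snapshot_xid
            still_live = t.xmax is None or t.xmax > snapshot_xid
            if created and still_live:
                visible = t
        return visible, checked, len(pages)

    def vacuum(self, oldest_active_xid: int) -> int:
        """Drop versions no snapshot can still see; return how many were removed."""
        before = len(self.tuples)
        self.tuples = [
            t for t in self.tuples
            if t.xmax is None or t.xmax > oldest_active_xid
        ]
        return before - len(self.tuples)


heap = ToyHeap(slots_per_page=2)
heap.insert("acct", "v1", xid=1)
for xid in (2, 3, 4):
    heap.update("acct", f"v{xid}", xid=xid)

print(heap.read("acct", snapshot_xid=4))   # before cleanup
print(heap.vacuum(oldest_active_xid=4))    # dead versions removed
print(heap.read("acct", snapshot_xid=4))   # after cleanup
```

In this model the read before cleanup has to look at every version of the row, while the read after cleanup looks at one. Real PostgreSQL has mechanisms this sketch ignores, so treat it as an illustration of the idea only.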
PostgreSQL MVCC Latency and Scalability
More Relevant Posts
-
Databases aren’t just storage - they’re powerful engines for enforcing logic and integrity. 🚀

A deep dive into PostgreSQL architecture highlights a key shift: moving critical logic into the data layer. From efficient indexing and transaction management to constraints, migrations, and triggers, modern databases ensure consistency, speed, and security at scale.

Key takeaways:
• Use indexes to avoid costly full scans
• Enforce data integrity with constraints & relationships
• Prevent SQL injection with parameterized queries
• Treat your database as an active system, not a passive store

If you’re building backend systems, this mindset can drastically improve reliability and performance.

How much logic do you currently push to your database layer? 👇

#PostgreSQL #BackendDevelopment #DatabaseDesign #SoftwareEngineering #SystemDesign
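Two of these takeaways, parameterized queries and constraints, can be sketched in a few lines of Python. This is a minimal illustration, not the post author's code: it uses the standard-library `sqlite3` module as a stand-in so it runs without a server, and the `users` table and `find_user` helper are hypothetical. With a PostgreSQL driver such as psycopg the idea is the same, though the placeholder is `%s` rather than `?`.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str):
    # The value travels separately from the SQL text, so it is never
    # parsed as SQL, whatever characters it contains.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
# The UNIQUE and NOT NULL constraints make the database itself reject bad rows.
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE)"
)
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

hostile = "alice' OR '1'='1"
print(find_user(conn, "alice"))   # matches the one real row
print(find_user(conn, hostile))   # treated as a literal name, matches nothing

try:
    conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
except sqlite3.IntegrityError as exc:
    print("rejected by constraint:", exc)
```

The hostile string is compared as plain data, and the duplicate insert fails inside the database rather than relying on application checks.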
-
Why do pgvector benchmarks fail in production? Learn to scale Postgres vector queries with proper indexing, tuning, and SQL filtering. By Naina Ananthaswamy, thanks to NetApp Instaclustr
-
We reduced a PostgreSQL database from 20TB to 10TB without breaking performance 🚀

Sounds risky? It was.

Context
On one of my projects, our PostgreSQL database was designed with performance in mind:
• Heavy denormalization
• Relaxed normalization rules
• A hierarchy of tables using inheritance
At the time, it worked great.

The problem
As data kept growing, storage usage exploded. We hit a hard limit: scaling disk further wasn't an option anymore. We had to act. Fast.

The approach
Instead of a full redesign (too risky, too slow), we focused on structural optimization:
✔️ Replaced top-level tables with views
✔️ Kept actual data only in lower-level tables
✔️ Moved indexes closer to the real data

The result
💾 50% storage reduction (20TB → 10TB)
⚡ Almost no performance degradation

What this taught me
Denormalization buys you speed, but you pay with storage, and the bill grows with your data. And sometimes, scaling further means 👉 revisiting and partially undoing earlier design decisions.

Have you ever had to rethink your own architecture because of scale?

#Performance #Scalability #DataEngineering #Optimization #BigData
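One way to picture the "views on top, data and indexes below" idea is the small Python sketch that follows. It is an assumption-heavy illustration: the post does not say how the views were defined, so the `UNION ALL` view, the `events_click` / `events_purchase` tables, and the index names are all hypothetical, and `sqlite3` stands in for PostgreSQL so the example is runnable as-is.

```python
import sqlite3


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Lower-level tables hold the only physical copy of the data.
        CREATE TABLE events_click    (id INTEGER PRIMARY KEY, ts TEXT, payload TEXT);
        CREATE TABLE events_purchase (id INTEGER PRIMARY KEY, ts TEXT, payload TEXT);

        -- Indexes live on the tables that actually store rows.
        CREATE INDEX idx_click_ts    ON events_click (ts);
        CREATE INDEX idx_purchase_ts ON events_purchase (ts);

        -- The former top-level table becomes a view and stores nothing itself.
        CREATE VIEW events AS
            SELECT 'click' AS kind, id, ts, payload FROM events_click
            UNION ALL
            SELECT 'purchase' AS kind, id, ts, payload FROM events_purchase;
        """
    )
    conn.executemany(
        "INSERT INTO events_click (ts, payload) VALUES (?, ?)",
        [("2024-01-01", "a"), ("2024-01-02", "b")],
    )
    conn.execute(
        "INSERT INTO events_purchase (ts, payload) VALUES (?, ?)",
        ("2024-01-02", "c"),
    )
    return conn


conn = build_db()
# Existing readers keep querying the top-level name.
for row in conn.execute(
    "SELECT kind, COUNT(*) FROM events GROUP BY kind ORDER BY kind"
):
    print(row)
```

Queries against the top-level name keep working, while storage is only consumed by the lower-level tables and their indexes.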
-
In PostgreSQL 17 and later, your monitoring can stay "green" while your downstream data sync silently stops. It is easy to skip aligning transaction durability with logical slot synchronization, but in reality they operate on entirely separate control planes. One governs when your application gets a commit acknowledgment (synchronous_standby_names), while the other governs when your logical slots are safe to move (synchronized_standby_slots). The strange thing is that your cluster may appear healthy and performant even after a standby failure, but your failover safety is already gone. Without intentional alignment, you risk a split state where your database is healthy but your data pipeline is frozen. If your architecture relies on logical replication alongside synchronous commits, you can review this post to ensure your setup is actually as robust as it looks: https://lnkd.in/eCnEBev6
-
Most production databases end up doing two things at the same time:
→ operational queries (OLTP)
→ analytical / repetitive queries

Over time, this creates a familiar problem: performance pressure + rising infrastructure cost.

We recently saw this in practice with Hussle. (link in the comments)

Over the course of four weeks, Readyset QueryPilot identified and automatically cached high-impact workloads across ~1700 queries spanning 11 databases.

The impact was significant:
* Reduced peak-season database pressure
* Avoided additional replica scaling
* Simplified operational management
* Preserved application compatibility

And most importantly:
→ database load was reduced to near zero

What’s interesting here is not just the result, but how it was achieved. Instead of scaling reads with more infrastructure, the focus was on reducing redundant work at the database layer.

One approach we’ve been working on at Readyset is to split this into two layers:
→ **Deep Cache**: for mission-critical queries (real-time consistency)
→ **Shallow Cache**: for reducing read load on repetitive queries with minimal complexity

Next Tuesday, we’ll walk through how Shallow Cache works in practice:
* where it sits in the architecture
* how it reduces database load
* and the tradeoffs involved

If you're dealing with read pressure or scaling challenges, it might be useful.

Link to join: https://lnkd.in/dYu_vkP5

In the Hussle case, Readyset drastically reduced database load without needing to scale replicas. On Tuesday we will show in practice how this works.

#scalability #readyset #hussle #data #cache #mysql #postgres #performance
-
The biggest myth about PostgreSQL is that indexing is a silver bullet for query performance.

Many teams believe that simply adding more indexes will lead to better SQL optimization. However, this misconception can lead to bloated databases and slower write operations. Every index has to be maintained on each data modification, so excessive indexing increases maintenance complexity and degrades write performance.

To effectively optimize your PostgreSQL database:
- Analyze query patterns to determine which indexes are truly necessary.
- Implement partial indexes so that only the rows your queries actually target are indexed, which keeps index size and write overhead lower than a full index.
- Utilize connection pooling to manage database connections efficiently and reduce latency.
- Consider sharding your database for improved scalability, especially with high-traffic applications.
- Regularly review and refine your indexing strategy to align with evolving data access patterns.
- Explore replication strategies to enhance read performance and disaster recovery capabilities.

How are you balancing indexing with the need for performance in your PostgreSQL deployments?

Building production-grade automation | CODE AT IT

#PostgreSQL #DatabaseEngineering #SQLOptimization #TechLeadership #SoftwareArchitect
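The partial-index suggestion can be sketched as follows. This is a hedged illustration rather than a tuning recipe: the `orders` table, the index name, and the data distribution are hypothetical, and `sqlite3` is used as a stand-in so the example runs without a server. PostgreSQL accepts the same `CREATE INDEX ... WHERE ...` form, but its planner and plan output differ.

```python
import sqlite3


def build_orders_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " id INTEGER PRIMARY KEY,"
        " customer_id INTEGER NOT NULL,"
        " status TEXT NOT NULL)"
    )
    # Only the small 'pending' subset is indexed, so writes to other
    # rows do not have to maintain this index.
    conn.execute(
        "CREATE INDEX idx_orders_pending_customer"
        " ON orders (customer_id) WHERE status = 'pending'"
    )
    rows = [
        (i % 10, "pending" if i % 50 == 0 else "shipped")
        for i in range(1000)
    ]
    conn.executemany(
        "INSERT INTO orders (customer_id, status) VALUES (?, ?)", rows
    )
    return conn


def pending_orders(conn: sqlite3.Connection, customer_id: int):
    # The query repeats the index predicate so the planner can match it.
    return conn.execute(
        "SELECT id FROM orders"
        " WHERE status = 'pending' AND customer_id = ?"
        " ORDER BY id",
        (customer_id,),
    ).fetchall()


def query_plan(conn: sqlite3.Connection):
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM orders"
        " WHERE status = 'pending' AND customer_id = ?",
        (0,),
    ).fetchall()
    return [row[-1] for row in plan]  # human-readable plan detail


conn = build_orders_db()
print(len(pending_orders(conn, 0)))
print(query_plan(conn))
```

Checking the plan output, rather than assuming the index is used, is the same habit the list above recommends: confirm against real query patterns before keeping an index.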
-
Today I explored a very interesting distributed database: CockroachDB.

CockroachDB is a distributed SQL database inspired by Google Spanner, designed to provide high availability, horizontal scalability, and strong consistency. It uses automatic replication and a distributed consensus protocol (Raft) to ensure data correctness even in failure scenarios.

Key features:
- Automatic data distribution across nodes
- Replication and fault tolerance
- Strong consistency (ACID)
- Horizontal scalability (scale-out)
- PostgreSQL-compatible SQL dialect

Pros:
- High availability by default
- Easy horizontal scaling
- Strong consistency (great for critical systems)
- Reduced need for manual failover

Cons:
- Higher latency compared to single-node databases
- Increased operational complexity
- Higher resource consumption
- Not fully compatible with PostgreSQL

Comparison with PostgreSQL:
While PostgreSQL is a mature, highly reliable relational database optimized for single-node or controlled replication setups, CockroachDB is built from the ground up for distributed environments, prioritizing resilience and global scalability.

In summary:
PostgreSQL → simplicity, maturity, strong local performance
CockroachDB → distribution, resilience, global scale

Every choice comes with trade-offs; understanding the context is key.

#SystemDesign #Database #Backend #CockroachDB #PostgreSQL
-
Something I've been thinking about with Postgres at scale: The optimization work is visible. You can put it on a roadmap. Measure it. Argue about it in sprint planning. The cost of staying is invisible. Query tuning, partition management, autovacuum babysitting. It compounds every month and few teams track it. One team I talked to was burning 20% of their engineering capacity on database maintenance. They didn't know until we asked them to measure it. The current optimizations are working. That's exactly when you can afford to look at what they're actually costing you. https://lnkd.in/g7ruYpwf