Seeing huge Postgres read metrics? That’s usually normal. High read volumes happen as Postgres scans indexes, tables, and TOAST storage. But when query counts don’t match the amount of data being read, it’s often a sign there’s room to optimize and reduce costs. Watch how we break down Postgres reads vs. writes:
More Relevant Posts
-
Interesting project from the pgCache team, focused on solving important limitations around PostgreSQL logical replication. Worth checking out if you work with PostgreSQL, replication, or distributed systems. If this is relevant to you, feel free to reach out to Philip Johnston, PhD 🌍
Most Postgres teams scale reads by adding a read replica. It works, but it's blunt. Most of your data is usually sitting idle; you may be serving only 10% of it at any given time. Yet you're paying for storage, IOPS, and replication overhead on the cold rows along with the hot ones. PgCache is closer to a smart read replica. It solves the same problem: offloading read traffic from your primary. But instead of duplicating the entire database, it caches only the data your application actually reads. CDC keeps it in sync with the primary, so there's no stale window and no TTLs to tune. Cold data stays on the primary, where it belongs. The shape of the cache ends up matching the shape of your traffic, not the shape of your schema. Deployment is a connection string change, and schema changes don’t break the cache. Cold data stays cold, and you stop paying to keep it warm twice. This is #4 in a series about PgCache. Previous post: https://lnkd.in/gCi2EV4r To learn more, check out pgcache.com, or feel free to reach out to me or Philip Johnston, PhD 🌍
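The idea in the post above (fill the cache on demand, then keep the cached rows current from a change feed instead of expiring them) can be illustrated with a toy sketch. This is not PgCache's implementation: `Primary`, `DemandCache`, and the in-memory change feed are hypothetical stand-ins, and changes are applied synchronously for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Hypothetical stand-in for the primary: a dict of rows plus a change feed."""
    rows: dict
    subscribers: list = field(default_factory=list)

    def read(self, key):
        return self.rows.get(key)

    def write(self, key, value):
        self.rows[key] = value
        # Emit a change event to every subscriber (plays the role of CDC here).
        for notify in self.subscribers:
            notify(key, value)


class DemandCache:
    """Caches only rows that were actually read; change events keep them current."""

    def __init__(self, primary):
        self.primary = primary
        self.cached = {}
        self.primary_reads = 0
        primary.subscribers.append(self.on_change)

    def read(self, key):
        if key not in self.cached:
            # Cache miss: fetch from the primary once, then serve from the cache.
            self.primary_reads += 1
            self.cached[key] = self.primary.read(key)
        return self.cached[key]

    def on_change(self, key, value):
        # Update rows that are already cached; cold rows are left on the primary.
        if key in self.cached:
            self.cached[key] = value


primary = Primary(rows={f"user:{i}": f"v{i}" for i in range(10)})
cache = DemandCache(primary)

cache.read("user:1")                   # miss: one read against the primary
cache.read("user:1")                   # hit: served from the cache
primary.write("user:1", "v1-updated")  # hot row: the cached copy is updated
primary.write("user:7", "v7-updated")  # cold row: never enters the cache
latest = cache.read("user:1")
```

Only `user:1` ever occupies cache space, and no TTL is involved: the cached copy changes because the write was observed, not because an entry expired.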
-
Does adding a column actually change your data? Or just your assumptions? Brian Fehrle, Database Administrator at Command Prompt, breaks down how Postgres handles default values without writing them to disk—and why that can throw off your query planner and tank performance. If your queries suddenly slow down after a schema change, this is one of those edge cases worth knowing. Watch the full conversation: https://lnkd.in/d73WTJGU #PostgreSQL #DatabasePerformance #DataEngineering
-
Did you know that the metadata in the Postgres WAL is insufficient to uniquely identify timelines? Mats Kindahl uses TLA+ to prove this. https://lnkd.in/gXhHEvKi
-
TLA+ is a great tool for modeling and verifying distributed systems. Bringing HA to databases isn't just about testing failover by stopping a node. It must be reliable for all real-life scenarios since data loss can be silent. Great work!
-
Indexes in MongoDB are data structures that help queries find documents faster, without scanning the whole collection. This visual covers:
• single-field indexes
• compound indexes
• unique indexes
• a visual way to inspect them in VisuaLeaf
A small concept, but a very important one once your data starts growing. Read more here: https://lnkd.in/dG3UHRUZ
-
A few days back, I ran into an interesting issue in PostgreSQL. The query planner chose a less specific index, even though a more suitable index was clearly available. Why? PostgreSQL estimated that fewer rows would match, so it assumed that plan would be faster. Reality? It turned out to be slower. This is something you rarely notice in local or staging environments. But in production:
- Data distribution is different.
- Statistics can be misleading.
- The query planner doesn’t always behave the way you expect.
Key takeaway: having the right index is not enough. Understanding how the query planner thinks is what actually matters. Production has a way of humbling assumptions. #softwareengineering #database
-
Learning more about SQL optimization and PostgreSQL fundamentals lately. One important thing I noticed while working with large analytical datasets is how much query performance depends on proper filtering, indexing strategy, and avoiding unnecessary full table scans. Using concepts like EXPLAIN ANALYZE, indexing basics, and optimized aggregation logic can significantly improve processing time for analytical workloads and monitoring systems. Currently exploring: • Query optimization • PostgreSQL internals • WAL & VACUUM basics • Performance-aware data workflows • Time-series analytics #SQL #PostgreSQL #DatabaseEngineering #DataEngineering #QueryOptimization
-
Stop scaling Postgres with brute force. Read replicas look like the easy answer to high read traffic... ⚠️ but they come with a hidden cost: you’re duplicating 100% of your data to serve a small fraction of it. Every 100% read replica means:
- More always-on compute
- More storage for mostly cold data
- More operational overhead for you or your team
It’s an expensive way to solve a selective problem. James ran into it a few years ago when he was trying to serve some data at the edge and found his options surprisingly limited. Hence PgCache, and our different approach: we're calling it a Pareto Replica. Instead of cloning your entire database, PgCache sits in front as a demand-driven proxy and caches only what’s actually queried: the hot 20% of data that drives 80% of your workload. No guesswork. No overprovisioning. Just smart design. You get the performance benefits of replicas, without dragging all that cold data along for the ride. So ask yourself: Are you still streaming your entire WAL everywhere… or are you optimizing for what actually matters? 👇 Curious how others are thinking about this. #PostgreSQL #DatabaseArchitecture #SRE #Scaling #PgCache
-
I have been using Postgres for a while now, but I realized I didn't actually know what happens once a SQL command leaves the console and hits the disk. What does the data actually look like in the file system? I wanted to understand the physical reality of how Postgres persists data: the actual files, bytes, and blocks sitting in the /base directory. I spent some time deconstructing the storage layer to see how logical databases and tables map to physical files. A few areas I explored:
- Object Identifiers: how Postgres internally tracks database objects (databases, tables, indexes, types, etc.) and how they are connected to relfilenodes
- Physical Segmentation: how Postgres manages large tables across 1GB heap segments
- A peek into an actual data file on disk
It's a surface-level tour of how Postgres internally organizes files. It's fascinating that most "magical" systems are simply bytes on disk at the end of the day! Read it here: https://lnkd.in/d-iqRttj #postgres #database #storage
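As a rough illustration of the physical segmentation mentioned above, here is a minimal sketch that maps a heap block number to a segment file and byte offset. It assumes the default build settings (8 kB blocks, 1 GB segments) and a relation in the default tablespace; `locate_block` and the OIDs used are hypothetical, chosen only for the example.

```python
BLOCK_SIZE = 8192                                # default block size in bytes
SEGMENT_SIZE = 1024 ** 3                         # default segment size: 1 GB
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE  # 131072 blocks per segment


def locate_block(db_oid, relfilenode, block_number):
    """Return (relative file path, byte offset) for a heap block.

    The first segment is named after the relfilenode alone; later
    segments get a numeric suffix (.1, .2, ...).
    """
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    path = f"base/{db_oid}/{relfilenode}"
    if segment > 0:
        path += f".{segment}"
    return path, block_in_segment * BLOCK_SIZE


# Hypothetical OIDs, for illustration only.
first = locate_block(16384, 16385, 0)
second_segment = locate_block(16384, 16385, 131073)
```

With these defaults, block 131073 is the second block of the second segment file, so it sits 8192 bytes into the file with the `.1` suffix.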