Optimize Postgres for Better Performance | Gadget posted on the topic

3,644 followers

1mo

Seeing huge Postgres read metrics? That’s usually normal. High read volumes happen as Postgres scans indexes, tables, and TOAST storage. But when query counts don’t match the amount of data being read, it’s often a sign there’s room to optimize and reduce costs. Watch how we break down Postgres reads vs. writes:

Transcript

So in the case of bytes read, when the Postgres subsystems do all their fancy stuff to read indexes, to read table data, to read data out of TOAST, which is a Postgres subsystem for large values, all of that builds up to this value. And so these numbers can often be really high. In this case, for this application, Postgres scanned 885 gigabytes of data in the past 24 hours. The good news is the price per scanned GB is super low. These numbers are often high, but that's just normal. That's how much data Postgres has to scan to do work for you. So don't be alarmed when you see many hundreds of gigabytes or terabytes here. That's very normal and it's not that expensive.

We can also see this same value on a query count basis. So as opposed to the gigabytes that Postgres read off of the disks and from the caches, we can see how many times it ran a query, and that usually follows the same sort of pattern. But the difference might be that you can do a large number of read queries that are cheap and don't read very much data, or you can do a small number of queries that do read a lot of data. And so in this case, this app has a difference where, around 8:00 AM here, we did 622,000 read queries. But if we look at the bytes read, there's this one standout here where we're reading a lot more data from this model. And so this is a good indicator that one of the read queries in this application could be optimized to scan less data. Or, if it's super expensive, it could be issued less often to save a bit of money.

The companion over here is the bytes written, and so this is any time your application updates the model or creates a new record, changes a record, or deletes a record. Postgres has to go change the data on disk, both the indexes and the table data itself, to record that that happened. The numbers are often much, much smaller, usually between 100 and 1,000 times smaller than bytes read. And that's just the way that Postgres works. Writes write less data than reads read.
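The comparison described in the transcript, query counts against bytes read, can be sketched as a bytes-per-query check. This is only an illustration: the model names, the numbers, and the `factor` threshold are invented for the example and do not come from the dashboard shown in the video.

```python
from statistics import median


def bytes_per_query(stats):
    """Average bytes read per read query, for each model."""
    return {
        model: s["bytes_read"] / s["read_queries"]
        for model, s in stats.items()
        if s["read_queries"] > 0
    }


def flag_heavy_readers(stats, factor=10.0):
    """Models whose bytes-per-query is far above the median across models.

    `factor` is an arbitrary threshold chosen for this sketch.
    """
    per_query = bytes_per_query(stats)
    if not per_query:
        return []
    typical = median(per_query.values())
    return sorted(m for m, v in per_query.items() if v > factor * typical)


# Hypothetical per-model metrics for one time window (not real data).
sample = {
    "order": {"bytes_read": 2_000_000_000, "read_queries": 400_000},
    "customer": {"bytes_read": 1_000_000_000, "read_queries": 200_000},
    "event": {"bytes_read": 600_000_000_000, "read_queries": 22_000},
}

heavy = flag_heavy_readers(sample)
print(heavy)
```

A model that shows up in `heavy` reads far more data per query than its peers, which is the "standout" pattern the transcript suggests investigating first.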


More Relevant Posts

Shardul B.
2w
Interesting project from the pgCache team, focused on solving important limitations around PostgreSQL logical replication. Worth checking out if you work with PostgreSQL, replication, or distributed systems. If this is relevant to you, feel free to reach out to Philip Johnston, PhD 🌍

James Nelson
2w

Most Postgres teams scale reads by adding a read replica. It works, but it's blunt. Most of your data is usually sitting idle, maybe you’re serving 10% of it at any given time. But, you’re paying storage, IOPS, and replication overhead on the cold rows along with the hot ones. PgCache is closer to a smart read replica. It solves the same problem: offload read traffic from your primary. But instead of duplicating the entire database, it caches only the data your application actually reads. CDC keeps it in sync with the primary, so there's no stale window and no TTLs to tune. Cold data stays on the primary, where it belongs. The shape of the cache ends up matching the shape of your traffic, not the shape of your schema. Deployment is a connection string change, and schema changes don’t break the cache. Cold data stays cold, and you stop paying to keep it warm twice. This is #4 in a series about PgCache. Previous post: https://lnkd.in/gCi2EV4r To learn more, check out pgcache.com, or feel free to reach out to myself or Philip Johnston, PhD 🌍
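The general idea of caching only what is read and keeping it current from a change stream can be sketched as a toy. This is not PgCache's implementation, which the post does not describe. The class, the dict standing in for the primary, and the `write` helper are all hypothetical.

```python
class DemandDrivenCache:
    """Toy read-through cache that holds only rows that were actually read."""

    def __init__(self, primary):
        self.primary = primary      # dict standing in for the primary database
        self.cached = {}            # only "hot" rows end up here
        self.primary_reads = 0      # how often we had to go to the primary

    def read(self, key):
        # First read of a key goes to the primary; later reads hit the cache.
        if key not in self.cached:
            self.primary_reads += 1
            self.cached[key] = self.primary[key]
        return self.cached[key]

    def apply_change(self, key, new_value):
        # Simulated change event: refresh a row only if it is already cached.
        if key in self.cached:
            self.cached[key] = new_value


def write(primary, cache, key, value):
    """Write to the primary, then deliver the change event to the cache."""
    primary[key] = value
    cache.apply_change(key, value)


primary = {1: "a", 2: "b", 3: "c"}
cache = DemandDrivenCache(primary)

cache.read(1)
cache.read(1)                      # served from the cache
write(primary, cache, 1, "z")      # hot row is refreshed in place
write(primary, cache, 2, "y")      # cold row is not pulled into the cache
print(cache.read(1), cache.primary_reads, sorted(cache.cached))
```

In this toy, rows 2 and 3 never occupy cache space unless something reads them, which is the "shape of your traffic" property the post describes.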

Command Prompt, Inc.

2,528 followers
1mo
Does adding a column actually change your data? Or just your assumptions? Brian Fehrle, Database Administrator at Command Prompt, breaks down how Postgres handles default values without writing them to disk—and why that can throw off your query planner and tank performance. If your queries suddenly slow down after a schema change, this is one of those edge cases worth knowing. Watch the full conversation: https://lnkd.in/d73WTJGU #PostgreSQL #DatabasePerformance #DataEngineering
Sugu Sougoumarane
3w
Did you know that the metadata in the Postgres WAL is insufficient to uniquely identify timelines? Mats Kindahl uses TLA+ to prove this. https://lnkd.in/gXhHEvKi

How TLA+ Caught a Silent Data Divergence Bug in Postgres’s pg_rewind | Multigres multigres.com

Franck Pachot
2w
TLA+ is a great tool for modeling and verifying distributed systems. Bringing HA to databases isn't just about testing failover by stopping a node. It must be reliable for all real-life scenarios since data loss can be silent. Great work!

Sugu Sougoumarane

Head of Multigres @ Supabase, Co-Creator @ Vitess
3w

Roxana-Maria Haidiner

Product Marketing Consultant | SaaS & Technical Software
1mo
Indexes in MongoDB are data structures that help queries find documents faster, without scanning the whole collection. This visual covers:
• single-field indexes
• compound indexes
• unique indexes
• a visual way to inspect them in VisuaLeaf
A small concept, but a very important one once your data starts growing. Read more here: https://lnkd.in/dG3UHRUZ
Piyush Tyagi
1mo
A few days back, I ran into an interesting issue in PostgreSQL. The query planner chose a less specific index, even though a more optimal index was clearly available. Why? Because PostgreSQL estimated that fewer rows would match, so it assumed that index would be faster. Reality? It turned out to be slower. This is something you rarely notice in local or staging environments. But in production: Data distribution is different. Statistics can be misleading. And the query planner doesn’t always behave the way you expect. Key takeaway: Having the right index is not enough. Understanding how the query planner thinks is what actually matters. Production has a way of humbling assumptions. #softwareengineering #database
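The failure mode in this post can be shown with a toy model in which a planner picks whichever index it expects to return the fewest rows. This is a deliberate simplification; PostgreSQL's real cost model weighs much more than row counts. The index names and row counts are hypothetical.

```python
def pick_index(row_counts):
    """Pick the index with the fewest rows to visit (toy stand-in for cost)."""
    return min(row_counts, key=row_counts.get)


# Hypothetical planner estimates, based on stale or skewed statistics.
estimated_rows = {"idx_status": 50, "idx_status_created_at": 400}

# Hypothetical rows actually visited in production.
actual_rows = {"idx_status": 90_000, "idx_status_created_at": 400}

chosen = pick_index(estimated_rows)   # what the planner goes with
best = pick_index(actual_rows)        # what would have been cheaper

# How far off the estimate was for the index that got chosen.
misestimate = actual_rows[chosen] / estimated_rows[chosen]

print(chosen, best, misestimate)
```

In practice, a large gap between estimated and actual rows in `EXPLAIN ANALYZE` output is the usual sign of this kind of misestimate.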

Vivek Taral
2w
Learning more about SQL optimization and PostgreSQL fundamentals lately. One important thing I noticed while working with large analytical datasets is how much query performance depends on proper filtering, indexing strategy, and avoiding unnecessary full table scans. Using concepts like EXPLAIN ANALYZE, indexing basics, and optimized aggregation logic can significantly improve processing time for analytical workloads and monitoring systems. Currently exploring:
• Query optimization
• PostgreSQL internals
• WAL & VACUUM basics
• Performance-aware data workflows
• Time-series analytics
#SQL #PostgreSQL #DatabaseEngineering #DataEngineering #QueryOptimization
Philip Johnston, PhD 🌍
3w
Stop scaling Postgres with brute force. Read replicas look like the easy answer to high read traffic ... ⚠️ but they come with a hidden cost: You’re duplicating 100% of your data to serve a small fraction of it. Every 100% read replica means:
- More always-on compute
- More storage for mostly cold data
- More operational overhead for you or your team
It’s an expensive way to solve a selective problem. James ran into it a few years ago when he was trying to serve some data at the edge, but found his options were surprisingly limited. Hence PgCache, and our different approach: We're calling it a Pareto Replica. Instead of cloning your entire database, PgCache sits in front as a demand-driven proxy and only caches what’s actually queried. The hot 20% of data that drives 80% of your workload. No guesswork. No overprovisioning. Just smart design. You get the performance benefits of replicas, without dragging all that cold data along for the ride. So ask yourself: Are you still streaming your entire WAL everywhere… or are you optimizing for what actually matters? 👇 Curious how others are thinking about this. #PostgreSQL #DatabaseArchitecture #SRE #Scaling #PgCache
Ajinkya Dhomne
1mo
I have been using Postgres for a while now, but I realized I didn't actually know what happens once a SQL command leaves the console and hits the disk. What does the data actually look like in the file system? I found myself wanting to understand the physical reality of how Postgres persists data: the actual files, bytes, and blocks sitting in the /base directory. I spent some time deconstructing the storage layer to see how logical databases and tables map to physical files. A few areas I explored:
- Object Identifiers - How Postgres internally tracks database objects (databases, tables, indexes, types, etc.) and how they are connected to relfilenodes
- Physical Segmentation - How Postgres manages large tables across 1GB heap segments
- A peek into an actual data file on disk
It's a surface-level tour of how Postgres internally organizes files. It's fascinating that most "magical" systems are simply bytes on disk at the end of the day! Read it here - https://lnkd.in/d-iqRttj #postgres #database #storage
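The 1GB segmentation mentioned above can be sketched as block-number arithmetic. This assumes PostgreSQL's default build settings of 8 kB blocks and 1 GB segments, where segment files are named after the relfilenode with `.1`, `.2`, and so on appended. The relfilenode value used here is made up.

```python
BLOCK_SIZE = 8192                                  # default block size (8 kB)
SEGMENT_SIZE = 1024 ** 3                           # default segment size (1 GB)
BLOCKS_PER_SEGMENT = SEGMENT_SIZE // BLOCK_SIZE    # 131072 blocks per file


def locate_block(relfilenode, block_number):
    """Return (segment file name, byte offset in that file) for a heap block."""
    segment, block_in_segment = divmod(block_number, BLOCKS_PER_SEGMENT)
    # The first segment has no suffix; later ones are "<relfilenode>.<n>".
    name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return name, block_in_segment * BLOCK_SIZE


# 16384 is a hypothetical relfilenode used only for illustration.
first = locate_block(16384, 0)
boundary = locate_block(16384, BLOCKS_PER_SEGMENT)
after_boundary = locate_block(16384, BLOCKS_PER_SEGMENT + 1)
print(first, boundary, after_boundary)
```

The returned file names are relative to the database's directory under `base/`, so block 131072 of this hypothetical table would be the first block of the file `16384.1`.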
