Abhishek Choudhary’s Post

4mo

I sharded our database. Best decision ever. Until it wasn't.

Last year, our write throughput was dying. Every user action (updates, state changes, activity logs) was slamming the same tables. Classic bottleneck. The solution felt obvious: shard by user_id. And honestly? It worked beautifully 💯 Write throughput went up 3x. Contention basically disappeared. Latency dropped hard. The team could breathe again. I thought we'd solved it.

Then product evolved. Suddenly we needed users to collaborate. Share resources. Enforce global rules. Run workflows that touched multiple accounts at once. And that's when everything started breaking in weird ways. Simple queries now hit multiple shards. Transactions that used to be atomic became "eventually consistent" nightmares. I was writing compensating logic, retry handlers, and background jobs to "fix up" state after the fact.

That's when I realized: we didn't have a scale problem. We had a data model problem. I'd sharded MutableState. Not events.

What I changed:
- Instead of storing "current state" in shards, I made writes append-only. Every change became an event. Current state became a derived view, rebuilt from events.
- This let me keep events in sharded storage (they have clear owners), run async pipelines for side effects, and centralize only what needed to be global.

The difference was night and day. Write spikes stopped hurting. Correctness got simpler. New features didn't need architecture rewrites.

Sharding isn't a database trick. It's a domain decision.
- If your boundaries are artificial (like "let's just split by user_id"), the system will expose that, eventually.
- High write load ≠ shardable workload.
- Fix the model first. Scale second.

Yeah, event sourcing added complexity: schemas, replays, eventual consistency. But those problems were manageable. Cross-shard transactions? Not so much.
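To make the "append-only events, derived state" idea concrete, here is a minimal Python sketch. It is not the author's implementation: the `Event` fields, the "deposit"/"withdraw" event kinds, the modulo shard routing, and the `current_balance` fold are all assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    owner_id: int   # the user who owns the event; also used as the shard key
    kind: str       # hypothetical event kinds: "deposit" or "withdraw"
    amount: int


class ShardedEventLog:
    """Append-only log split across in-memory shards by owner_id."""

    def __init__(self, num_shards: int):
        self.shards = [[] for _ in range(num_shards)]

    def append(self, event: Event) -> None:
        # Writes only ever append; nothing is updated in place.
        self.shards[event.owner_id % len(self.shards)].append(event)

    def events_for(self, owner_id: int) -> list:
        shard = self.shards[owner_id % len(self.shards)]
        return [e for e in shard if e.owner_id == owner_id]


def current_balance(events) -> int:
    """Derive current state by folding over the owner's events in order."""
    balance = 0
    for event in events:
        if event.kind == "deposit":
            balance += event.amount
        elif event.kind == "withdraw":
            balance -= event.amount
    return balance


log = ShardedEventLog(num_shards=4)
log.append(Event(owner_id=7, kind="deposit", amount=100))
log.append(Event(owner_id=7, kind="withdraw", amount=30))
log.append(Event(owner_id=3, kind="deposit", amount=5))
balance_of_7 = current_balance(log.events_for(7))
```

The point of the sketch is that each shard only stores facts with a clear owner, while "current state" is a view that can be rebuilt (or rebuilt differently) at any time from the log.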

5 Comments

Clarence Tauro, Ph.D 4mo

This is exactly where CockroachDB shines: strong consistency with scalable writes, without forcing you into complex cross-shard transactions. Model cleanly, scale safely. 🙂

4 Reactions

Avinash Anand 4mo

The line 'I'd sharded MutableState. Not events' should be on a poster in every engineering room. We’ve all been in that 'eventually consistent' nightmare where you're basically writing a custom distributed transaction manager just to fix a simple race condition. Kudos for recognizing it was a data model issue rather than just throwing more infrastructure at it!

1 Reaction

Jagadeesh Dasari 4mo

Thanks for sharing. What caught my attention in your experience was the realisation that it wasn't a scaling issue; it was a modeling issue that hadn't been anticipated. Things worked great until a change came in and global rules showed up. The system silently raised a simple question: "Are your boundaries real?" My learning: models are the first truth; databases follow.

1 Reaction

Aditya Soni 4mo

Interesting post! I am curious: can you point to a reference on sharding by mutable state? Thanks!


More Relevant Posts

Niyazi Garagashli
3mo
My info bubble almost never talks about Onehouse for some reason. I can sort of understand it for local and CIS countries, since we're not that into cloud for data, but I don't see much from my international communities either. Personally, I always enjoy reading their blog on how they engineer their platform. I'm not necessarily a fan of some of their decisions or focus areas, like their early bet on Hudi over Iceberg, but I actively follow their journey to get insight into the engineering mindset behind those decisions.

This fresh article is a great example: serving rapid-fire, low-latency point lookups to AI agents and humans alike, while remaining the single source of truth for overall enterprise analytics, is a very serious and real issue that most of us struggle with daily. OLAP systems were invented because large amounts of data needed to be analyzed without bringing down the operational systems. Then, when the results of analytics needed to be operationalized back, a new set of problems appeared. Do we reverse-ETL them back into core systems? Do we spin up intermediate databases and services, forcing ourselves to build more and more brittle integrations? Anyone who has tried doing something like ID lookups on Iceberg knows that simply providing an API or JDBC is not going to be enough.

This is Onehouse's take on the issue. I don't exactly agree that StarRocks favors local storage (it has been exactly the opposite for at least a year now), but I see their point. Remember, all systems go in cycles: you combine everything into one platform for convenience and maintenance <-> you separate everything into specialty tools to do specific tasks better. Right about now we're in desperate need of simplification. Maybe Onehouse is the one to crack it, maybe not, but I'm curious to keep watching their journey. https://lnkd.in/dTFDkqCE

Announcing Onehouse LakeBase™: database speeds finally on the lakehouse onehouse.ai

Anvansh Singh
3mo
𝗪𝗵𝘆 𝗬𝗼𝘂𝗿 𝗕𝗮𝗰𝗸𝗲𝗻𝗱 𝗜𝘀 𝗦𝗹𝗼𝘄 (𝗔𝗻𝗱 𝗜𝘁’𝘀 𝗡𝗼𝘁 𝘁𝗵𝗲 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲!!!!)

We have a habit of blaming the vessel rather than what we pour into it. When a system crawls, the immediate instinct is to point at the database, obsessing over indices and locking strategies. But more often than not, the database is just doing exactly what it was told to do, quietly carrying the weight of decisions made layers above it.

The slowness isn't usually in the storage; it is in the space between things. The silence between the notes is where the music drags. We build architectures that chat too much, creating a cacophony of network hops that no amount of SQL tuning can fix. We treat remote calls like local variables, forgetting that physics and distance still apply to code. The friction rarely lives in the disk I/O; it lives in the architectural complexity we introduced under the guise of modernization. We serialize, deserialize, handshake, and wait, creating a sluggishness that feels like a heavy load but is actually just wasted motion.

This is rarely a technical oversight; it is almost always a psychological one. We fragment our systems because we crave clear boundaries and ownership, but we forget that every boundary requires a bridge. When we prioritize our own developer experience, our need to work in isolated silos, over the cohesiveness of the request, the system suffers. We trade the discomfort of a large codebase for the invisible tax of latency, forcing the user to wait while our separated services introduce themselves to one another over and over again.

There is elegance in distributed systems, but there is also a hidden cost to decoupling. The database is often the most honest component in the stack: it stores truth. The application layer, however, is where we interpret that truth, and that interpretation is where the inefficiency breeds.

I spent a lot of time staring at slow query logs before I realized I was looking at the symptom, not the disease. I was trying to optimize the retrieval of data when I should have been questioning why we needed to ask for it six distinct times in a single user interaction.

Before you add another read replica or blame the infrastructure, look at the conversation your services are having. Is it a concise exchange of necessary information, or is it a sprawling, bureaucratic meeting where nothing gets decided until the very end? How much of your latency is actually just the sound of your architecture talking to itself?
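One way to picture the "asked for it six distinct times" problem is a request-scoped cache that collapses repeated lookups within a single user interaction. The sketch below is only an illustration under assumptions: `RequestScope`, the `fetch` callable, and the `"user:1"` key are hypothetical names, not anything from the post.

```python
class RequestScope:
    """Memoizes remote lookups for the lifetime of one user request."""

    def __init__(self, fetch):
        self._fetch = fetch       # hypothetical remote call: key -> value
        self._cache = {}
        self.remote_calls = 0     # counts how often we actually cross the network

    def get(self, key):
        if key not in self._cache:
            self.remote_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


def fake_remote_fetch(key):
    # Stand-in for a service or database round trip.
    return {"id": key}


scope = RequestScope(fake_remote_fetch)
for _ in range(6):                # six layers all asking for the same record
    profile = scope.get("user:1")
```

Six callers still get their answer, but only one round trip is paid. A cache like this should be discarded at the end of the request so it never serves stale data to the next one.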
Anirban Nag
3mo
𝐃𝐚𝐭𝐚𝐛𝐚𝐬𝐞 𝐒𝐜𝐚𝐥𝐢𝐧𝐠 𝐓𝐞𝐜𝐡𝐧𝐢𝐪𝐮𝐞𝐬 - used daily by data engineers, depending on the requirement

1️⃣ Denormalization → Reduce joins by storing redundant data
Example: Store customer_name with orders

2️⃣ Vertical Partitioning → Split wide tables into multiple parts
Example:
• Users_Core (user_id, name)
• Users_Profile (user_id, dob, address, profile_pic)

3️⃣ Sharding → Split data by key (like user_id)
Example:
• A–F → Shard 1
• G–S → Shard 2
• T–Z → Shard 3

4️⃣ Replication → Copy data to multiple nodes for read scaling
Example:
• Primary → Replica A, Replica B

5️⃣ Indexing → Speed up queries using B-Tree or Hash indexes
Use wisely: too many indexes hurt writes.

6️⃣ Query Optimization → Use EXPLAIN plans
→ Rewrite slow joins, avoid SELECT *

7️⃣ Vertical Scaling → Upgrade CPU, RAM, Disk on the same server
Simple but limited

8️⃣ Connection Pooling → Reuse DB connections across clients
Improves throughput & reduces overhead

9️⃣ Caching → Store frequently accessed data in memory
Tools: Redis, Memcached

🔟 Materialized Views → Pre-computed results for expensive queries
Fast response, periodic refresh
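The A–F / G–S / T–Z sharding example above can be sketched as a small range router. This is a minimal illustration, assuming the shard key is the first letter of a name; `UPPER_BOUNDS` and `shard_for` are hypothetical names introduced here.

```python
import bisect

# Inclusive upper bound of each range from the example: A-F, G-S, T-Z.
UPPER_BOUNDS = ["F", "S", "Z"]


def shard_for(name: str) -> int:
    """Return the 1-based shard number for a name, routed by its first letter."""
    first = name[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"cannot route key {name!r}: expected a leading A-Z letter")
    # bisect_left finds the first range whose upper bound is >= the letter.
    return bisect.bisect_left(UPPER_BOUNDS, first) + 1


shard_of_alice = shard_for("alice")
shard_of_greta = shard_for("Greta")
shard_of_tom = shard_for("tom")
```

Range routing keeps neighbouring keys together, which helps range scans, but it can produce uneven shards if names cluster in one range; hashing the key is the usual alternative when even distribution matters more.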
Ramu G
3mo
🚀 𝗗𝗮𝘁𝗮𝗯𝗮𝘀𝗲𝘀 𝗱𝗼𝗻’𝘁 𝗳𝗮𝗶𝗹 𝗮𝘁 𝘀𝗰𝗮𝗹𝗲, 𝗱𝗲𝘀𝗶𝗴𝗻𝘀 𝗱𝗼.

Everyone talks about “scaling the backend.” Very few truly understand how databases scale in the real world. Here are 𝟭𝟬 𝗱𝗮𝘁𝗮𝗯𝗮𝘀𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 𝘁𝗲𝗰𝗵𝗻𝗶𝗾𝘂𝗲𝘀 used behind high-traffic production systems 👇

🧠 The techniques every serious builder should understand:

1️⃣ Indexing: Speeds up reads by reducing full table scans. Powerful, but over-indexing can hurt write performance.
2️⃣ Vertical Scaling: Upgrade the machine (more CPU, RAM, disk). Quick to implement, but eventually hits a hard ceiling.
3️⃣ Caching: Serve frequently accessed data from memory instead of the database. Tools like Redis or Memcached protect systems from traffic spikes.
4️⃣ Sharding: Distribute data across multiple databases (by user_id, region, etc.). Complex, but critical at massive scale.
5️⃣ Replication: One primary, multiple read replicas. Improves availability and read throughput.
6️⃣ Query Optimization: Bad queries can cripple great infrastructure. Use EXPLAIN plans; don’t guess.
7️⃣ Connection Pooling: Reuse database connections instead of constantly opening new ones. Small implementation change, big performance impact.
8️⃣ Vertical Partitioning: Split tables by columns (core data vs profile/extended data). Cleaner schemas, faster queries.
9️⃣ Denormalization: Duplicate selected data to optimize reads. Perfect for read-heavy systems.
🔟 Materialized Views: Precompute expensive queries. Ideal for dashboards and analytics workloads.

💡 Scaling isn’t about choosing one technique. It’s about knowing when and why to apply each. Most real-world production systems combine multiple strategies:
• Caching + Replication
• Sharding + Indexing
• Denormalization + Materialized Views

That’s how systems survive real traffic. What scaling challenges have you faced in production?

🙏 And a big thanks to @w3schools.com, where many of us learned SQL basics before designing systems that handle millions of queries.
#BackendEngineering #DatabaseDesign #SystemDesign #Scalability #SoftwareArchitecture
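As an illustration of the caching technique listed above, here is a minimal cache-aside sketch in Python. It uses a plain dict rather than Redis or Memcached, and the `CacheAside` class, the `load_from_db` callable, and the TTL value are assumptions made up for this example.

```python
import time


class CacheAside:
    """Cache-aside reads: check memory first, fall back to the database on a miss."""

    def __init__(self, load_from_db, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load_from_db    # hypothetical loader: key -> value
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (value, expires_at)
        self.db_reads = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]          # cache hit: the database is not touched
        self.db_reads += 1           # miss or expired: go to the database
        value = self._load(key)
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call after a write so the next read reloads fresh data.
        self._entries.pop(key, None)


fake_now = [0.0]
cache = CacheAside(lambda key: key.upper(), ttl_seconds=10.0, clock=lambda: fake_now[0])
first = cache.get("a")
second = cache.get("a")
reads_after_two_gets = cache.db_reads
```

The trade-off is staleness: between a write and the TTL expiry (or an explicit `invalidate`), readers may see the old value.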
David Bressler
3mo Edited
Shattering all #distributedSQL cluster testing records... Cockroach Labs #CockroachDB now supports 300-node clusters. These are the largest clusters tested by any vendor in the space, continuing to demonstrate why we're the distributed SQL category leader. A few things jump out at me from these tests: 1. We don't only test raw performance, but performance under adversity. Our 300-node clusters perform when things fail and deliver on our resilience promises - even through grey failures like heavy background traffic or disk stalls. 2. Performance scales linearly. There's no drop off as the cluster gets larger. This is really important, and not an obvious result. 3. We're doing this testing so it's not our customers who bear the burden of testing our systems at scale. Remember, it's not just whether or not the cluster serves data... but do observability / debugging / UI & experience all hold up under scale? Learn more about our testing methodology and results here: https://lnkd.in/edXUFd-V

300-Node Clusters Now Supported in CockroachDB cockroachlabs.com

Andrew Madson
3mo
Your data lake is lying to you. And you won't know until it's too late. Concurrent writes silently corrupt your table state. Readers see half-applied changes. You get flexibility without correctness. Databases solved this decades ago with ACID. Data lakes did not. 𝐄𝐧𝐭𝐞𝐫 𝐀𝐩𝐚𝐜𝐡𝐞 𝐈𝐜𝐞𝐛𝐞𝐫𝐠. Iceberg adds a transactional metadata layer on top of your existing data files. It doesn't replace your lake — it makes it behave like a real table. 𝐓𝐡𝐞 𝐬𝐞𝐜𝐫𝐞𝐭? One atomic metadata pointer swap. Every table change produces a new snapshot. Writers build new metadata, then atomically swap a pointer from old to new. If the swap fails, nothing changes. That single mechanism delivers all four ACID properties: → 𝐀𝐭𝐨𝐦𝐢𝐜𝐢𝐭𝐲: New files are invisible until the pointer updates. All or nothing. → 𝐂𝐨𝐧𝐬𝐢𝐬𝐭𝐞𝐧𝐜𝐲: Writers never mutate existing metadata. If two writers race, the loser retries against the winner's committed state. → 𝐈𝐬𝐨𝐥𝐚𝐭𝐢𝐨𝐧: Readers load a snapshot and read against it — no locks needed. Writers use optimistic concurrency control. → 𝐃𝐮𝐫𝐚𝐛𝐢𝐥𝐢𝐭𝐲: Committed snapshots persist in durable storage. Time travel and rollback come free. 𝐖𝐡𝐚𝐭 𝐲𝐨𝐮 𝐤𝐞𝐞𝐩: ✓ Open file formats (Parquet) ✓ Engine interoperability ✓ Separation of storage and compute ✓ Elastic scalability 𝐖𝐡𝐚𝐭 𝐲𝐨𝐮 𝐠𝐚𝐢𝐧: ✓ Correctness — the thing you assumed you had all along. ⚠️ One caveat: Your catalog must provide a true atomic compare-and-swap. Without that, you have nicely organized files — not transactional tables. If you're building on a data lake without transactional guarantees, you're accepting data quality risk that's entirely avoidable. What's your current approach to handling concurrent writes on your lake? #ApacheIceberg #DataLake #DataEngineering
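The pointer-swap mechanism described above can be sketched in a few lines of Python. This is a toy in-memory model, not the Iceberg API: `Catalog`, `Table`, `commit_append`, and the `"root"` snapshot id are hypothetical names, and real commits also re-validate conflicts rather than blindly rebasing.

```python
import threading
import uuid


class Catalog:
    """Stand-in for a catalog that offers an atomic compare-and-swap."""

    def __init__(self, initial_snapshot_id):
        self._lock = threading.Lock()
        self._current = initial_snapshot_id

    def current(self):
        with self._lock:
            return self._current

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._current != expected:
                return False         # someone else committed first
            self._current = new
            return True


class Table:
    def __init__(self):
        self.snapshots = {"root": ()}    # snapshot id -> immutable tuple of data files
        self.catalog = Catalog("root")

    def read(self):
        # Readers pin one snapshot; later commits never change it.
        return self.snapshots[self.catalog.current()]

    def commit_append(self, new_files, max_retries=10):
        for _ in range(max_retries):
            base_id = self.catalog.current()
            new_id = uuid.uuid4().hex
            # Build new metadata on the side; nothing existing is mutated.
            self.snapshots[new_id] = self.snapshots[base_id] + tuple(new_files)
            if self.catalog.compare_and_swap(base_id, new_id):
                return new_id            # the swap made the new files visible
            del self.snapshots[new_id]   # lost the race: retry on the winner's state
        raise RuntimeError("commit failed after repeated conflicts")


table = Table()
first_id = table.commit_append(["a.parquet"])
stale_swap_ok = table.catalog.compare_and_swap("root", "bogus")
table.commit_append(["b.parquet"])
```

Because old snapshots stay in `snapshots`, reading `table.snapshots[first_id]` after the second commit still returns the earlier table state, which is the basis for time travel and rollback.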
Mehertej Chettu
3mo
Stop Scaling Clusters. Start Scaling Efficiency.

I often see teams trying to solve performance issues by "throwing more compute at it." In 2026, that isn't a strategy; it’s technical debt with a high interest rate. Real scale isn't about how big your cluster is; it’s about the Unit Cost of Data. To keep that cost sustainable, we have to move beyond "making it work" and start enforcing Resource Governance at the SQL level. If you want to build systems that last, you have to master the Big Three of SQL Optimization:

1. Predicate Pushdown (Physical Design)
We shouldn't be moving bytes that we don't need.
The Insight: If your compute layer is filtering data that your storage layer could have ignored, you are leaking resources. By pushing filters to the "edge," we minimize network I/O and eliminate unnecessary shuffle overhead.

2. Projection (Interface Contracts)
SELECT * is more than just a performance killer; it’s a violation of a clean interface.
The Insight: In a columnar world (Snowflake, Parquet, Iceberg), every column is a physical dependency. By explicitly defining projections, we reduce physical I/O by up to 90% and ensure downstream consumers only depend on what is strictly necessary.

3. Partition Pruning (Logical Governance)
A full table scan is a failure of metadata governance.
The Insight: Physical storage layout must mirror access patterns. If a query doesn't anchor to a partition key, it shouldn't even be able to start. Pruning allows us to skip 99% of the data, turning a "Search" into "Direct Access."

The 2026 Shift
The era of "accidental data" is over. In 2026, data isn't just something that happens after an app runs; it’s an intentional product that requires its own specification. Efficiency isn't a "nice-to-have" optimization anymore. It is a core part of the product's specification. When we build with efficiency by design, we aren't just writing code; we are managing the financial and operational health of the data ecosystem.
How are you building "Efficiency by Design" into your 2026 roadmap? Let’s talk strategy in the comments. 👇 #DataEngineering #SystemsDesign #DataStrategy #BigData #SQL #CloudCost
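To show how the three techniques in this post fit together, here is a toy Python scan over date-partitioned rows. It is a sketch under assumptions, not how any particular engine works: the `TABLE` layout, the column names, and the `scan` function are invented for illustration.

```python
# Hypothetical table: partition key (event date) -> rows stored in that partition.
TABLE = {
    "2026-01-01": [
        {"user_id": 1, "country": "DE", "amount": 10},
        {"user_id": 2, "country": "US", "amount": 25},
    ],
    "2026-01-02": [
        {"user_id": 1, "country": "DE", "amount": 40},
        {"user_id": 3, "country": "DE", "amount": 5},
    ],
    "2026-01-03": [
        {"user_id": 2, "country": "US", "amount": 60},
    ],
}


def scan(table, partition_keys, columns, predicate):
    """Return (matching rows, number of rows physically read)."""
    if not partition_keys:
        # Governance rule from the post: no partition key, no query.
        raise ValueError("query must name at least one partition key")
    rows_read = 0
    result = []
    for key in partition_keys:                    # partition pruning: skip other dates
        for row in table.get(key, []):
            rows_read += 1
            if predicate(row):                    # predicate applied at the scan
                result.append({c: row[c] for c in columns})  # projection: named columns only
    return result, rows_read


rows, rows_read = scan(
    TABLE,
    partition_keys=["2026-01-02"],
    columns=["user_id", "amount"],
    predicate=lambda row: row["country"] == "DE",
)
```

Only the two rows of the requested partition are read, and only the two requested columns leave the scan; the other three rows in the table are never touched.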

Malik Farhan Ahmed
3mo
🚀 Databases don’t fail at scale, designs do.

Everyone talks about “scaling the backend.” Very few actually understand how databases scale in the real world. This visual breaks down the Top 10 Database Scaling Techniques, the exact tools used behind high-traffic systems 👇

🧠 The 10 techniques every serious builder should know

1️⃣ Indexing: Speed up reads by reducing full table scans. 👉 Powerful, but over-indexing can slow writes.
2️⃣ Vertical Scaling: Bigger machine = more CPU, RAM, disk. 👉 Fast to implement, but hits a hard limit.
3️⃣ Caching: Serve data from memory instead of the database. 👉 Redis / Memcached save databases from traffic spikes.
4️⃣ Sharding: Split data across multiple databases by key (A–Z, user_id, region). 👉 Complex, but essential at massive scale.
5️⃣ Replication: One primary → many replicas for read-heavy workloads. 👉 Improves availability and read throughput.
6️⃣ Query Optimization: Bad queries kill good databases. 👉 EXPLAIN plans > guessing.
7️⃣ Connection Pooling: Reuse DB connections instead of opening new ones. 👉 Small change, huge performance win.
8️⃣ Vertical Partitioning: Split tables by columns (core vs profile data). 👉 Faster queries, cleaner schemas.
9️⃣ Denormalization: Trade storage for speed by duplicating data. 👉 Read-heavy systems love this.
🔟 Materialized Views: Precompute expensive queries. 👉 Perfect for dashboards and analytics.

💡 Scaling isn’t about picking ONE technique. It’s about knowing WHEN and WHY to use each. Most production systems use multiple techniques together:
• Caching + Replication
• Sharding + Indexing
• Denormalization + Materialized Views

That’s how real-world systems survive traffic.

🙏 And a big thanks to w3schools.com, where many of us learned SQL basics before designing systems that handle millions of queries.
👉 Follow Malik Farhan Ahmed for practical system design, backend scaling, and real-world engineering breakdowns 💬 Comment “SYSTEM DESIGN” if you want a deep dive on any of these techniques #SystemDesign #DatabaseScaling #BackendEngineering #SoftwareArchitecture #SQL #DataEngineering #HighPerformance #ScalableSystems #TechLeadership #BuildInPublic
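Connection pooling, one of the techniques listed in this post, can be sketched with the standard library alone. The example assumes SQLite in-memory databases purely so it runs anywhere; `ConnectionPool` and its `opened` counter are hypothetical names, and production systems would normally rely on the pool that ships with their driver or framework.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool: connections are opened once and handed out repeatedly."""

    def __init__(self, size, database=":memory:"):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(sqlite3.connect(database))
            self.opened += 1

    @contextmanager
    def connection(self, timeout=5.0):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            self._idle.put(conn)     # always return the connection to the pool


pool = ConnectionPool(size=2)
results = []
for _ in range(10):                  # ten queries, but only two connections ever opened
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone())
```

Bounding the pool also acts as back-pressure: when every connection is busy, callers wait instead of opening more connections than the database can serve.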
Subhan Ali
3mo
🚀 Databases don’t fail at scale, designs do.

Everyone talks about “scaling the backend.” Very few actually understand how databases scale in the real world. This visual breaks down the Top 10 Database Scaling Techniques, the exact tools used behind high-traffic systems 👇

🧠 The 10 techniques every serious builder should know

1️⃣ Indexing: Speed up reads by reducing full table scans. 👉 Powerful, but over-indexing can slow writes.
2️⃣ Vertical Scaling: Bigger machine = more CPU, RAM, disk. 👉 Fast to implement, but hits a hard limit.
3️⃣ Caching: Serve data from memory instead of the database. 👉 Redis / Memcached save databases from traffic spikes.
4️⃣ Sharding: Split data across multiple databases by key (A–Z, user_id, region). 👉 Complex, but essential at massive scale.
5️⃣ Replication: One primary → many replicas for read-heavy workloads. 👉 Improves availability and read throughput.
6️⃣ Query Optimization: Bad queries kill good databases. 👉 EXPLAIN plans > guessing.
7️⃣ Connection Pooling: Reuse DB connections instead of opening new ones. 👉 Small change, huge performance win.
8️⃣ Vertical Partitioning: Split tables by columns (core vs profile data). 👉 Faster queries, cleaner schemas.
9️⃣ Denormalization: Trade storage for speed by duplicating data. 👉 Read-heavy systems love this.
🔟 Materialized Views: Precompute expensive queries. 👉 Perfect for dashboards and analytics.

💡 Scaling isn’t about picking ONE technique. It’s about knowing WHEN and WHY to use each. Most production systems use multiple techniques together:
• Caching + Replication
• Sharding + Indexing
• Denormalization + Materialized Views

That’s how real-world systems survive traffic.

🙏 And a big thanks to w3schools.com, where many of us learned SQL basics before designing systems that handle millions of queries.
👉 Follow Subhan Ali for practical system design, backend scaling, and real-world engineering breakdowns 💬 Comment “SYSTEM DESIGN” if you want a deep dive on any of these techniques #SystemDesign #DatabaseScaling #BackendEngineering #SoftwareArchitecture #SQL #DataEngineering #HighPerformance #ScalableSystems #TechLeadership #BuildInPublic
