Abhishek Choudhary’s Post

I sharded our database. Best decision ever. Until it wasn't. Last year, our write throughput was dying. Every user action—updates, state changes, activity logs—was slamming the same tables. Classic bottleneck. The solution felt obvious: shard by user_id. And honestly? It worked beautifully 💯 Write throughput went up 3x. Contention basically disappeared. Latency dropped hard. The team could breathe again. I thought we'd solved it. Then product evolved. Suddenly we needed users to collaborate. Share resources. Enforce global rules. Run workflows that touched multiple accounts at once. And that's when everything started breaking in weird ways. Simple queries now hit multiple shards. Transactions that used to be atomic became "eventually consistent" nightmares. I was writing compensating logic, retry handlers, background jobs to "fix up" state after the fact. That's when I realized , We didn't have a scale problem. We had Data Model problem. I'd sharded MutableSttate. Not events. I changed: - Instead of storing "current state" in shards, I made writes append-only. Every change became an event. Current state became a derived view—rebuilt from events. - This let me keep events in sharded storage (they have clear owners), run async pipelines for side-effects, and centralize only what needed to be global. The difference was night and day. Write spikes stopped hurting. Correctness got simpler. New features didn't need architecture rewrites. Sharding isn't a database trick. It's a domain decision. - If your boundaries are artificial (like "let's just split by user_id"), the system will expose that—eventually. - High write load ≠ shardable workload. - Fix the model first. Scale second. Yeah, event sourcing added complexity—schemas, replays, eventual consistency. But those problems were manageable. Cross-shard transactions? Not so much.

This is exactly where CockroachDB shines.. strong consistency with scalable writes, without forcing you into complex cross-shard transactions. Model cleanly, scale safely. 🙂

The line 'I'd sharded MutableState. Not events' should be on a poster in every engineering room. We’ve all been in that 'eventually consistent' nightmare where you're basically writing a custom distributed transaction manager just to fix a simple race condition. Kudos for recognizing it was a data model issue rather than just throwing more infrastructure at it!

Thanks for sharing. What caught me in your experience was realising it wasn't a scaling issue - it was a modeling issue that hadn't been anticipated. Things worked great until a change came in, and global rules showed up. Systems silently raised a simple question- "Are your boundaries real?". My learnings: Models are the first truth Databases follow

Interesting post! I am curious, Can you point to a reference wrt shared by mutablestate? Thanks!

Like
Reply
See more comments

To view or add a comment, sign in

Explore content categories