Best Practices for Low-Latency Database Management

Explore top LinkedIn content from expert professionals.

Summary

Low-latency database management means designing systems so data can be accessed and updated very quickly, ensuring that users experience minimal delays when interacting with applications. To reliably keep wait times low, it's important to combine smart data processing, careful storage strategies, and a thoughtful approach to how and when work is done.

  • Materialize computed values: Calculate derived or default values at the time you write data, not when you read it, so your database can serve requests in milliseconds instead of wasting time recalculating on every read.
  • Adopt hedged requests: When waiting on a data read, send a second request to a backup source after a brief delay and use whichever response comes first, reducing the risk of slow responses for your users.
  • Defer non-urgent work: Break down your system so only the most essential tasks run in real-time, pushing things like logging or extra checks to background processes to keep your fast path as lean as possible.
Summarized by AI based on LinkedIn member posts
  • Yan Cui

    Independent Consultant | AWS Serverless Hero

    49,785 followers

    Every senior engineer should know this concurrency pattern from Google for improving tail latency.

    While most engineers focus on average latency, experienced engineers know it's all about tail latency (e.g. p95, p99), because those percentiles measure actual user experience, and the outliers are what ruin users' experience with your app.

    A great pattern for minimizing tail latency is "hedged requests", made famous at Google by Jeff Dean and Luiz André Barroso in their paper "The Tail at Scale" (https://lnkd.in/eBmdVNNM). The idea is simple:
    1️⃣ Send your request to the primary target.
    2️⃣ If there's no reply after a short delay, send it to a second target.
    3️⃣ Whichever responds first wins; the other is discarded.

    It works better than a naive fallback because:
    1️⃣ It handles slow-but-not-failing cases better by allowing more time for the primary request to complete.
    2️⃣ Sequential fallback adds delay, while hedging allows the work to overlap. In short: MAX(primary, hedge) < SUM(primary, fallback).

    Nowadays, you can easily implement this with Rx's (Reactive Extensions) raceWith operator. For example:

    const primary = ... // fetch data from the primary target
    const hedge = of(null).pipe(
      delay(100),          // wait 100ms...
      switchMap(() => ...) // ...and then send the hedge request
    )
    return primary
      .pipe(raceWith(hedge))
      .subscribe({
        next: result => ..., // handle whichever response arrives first
        error: err => ...
      })

    The classic pattern raced servers against their replicas, but with serverless the machines are abstracted away. It can still work in some practical ways. For example:
    • Multi-region active/active read endpoints. Call the nearest region as the primary; after a small delay, call the other region and accept the first 2xx response. This is great for negating cold starts, noisy neighbours, or transient regional problems.
    • Reads from DynamoDB Global Tables. Try the local region first, then hedge to a replica region after a delay.
    • Reads from primary/backup data sources.
    • Hedging across two third-party providers (assuming cost is comparable). Useful when you can't change the upstream but can choose where to get the data from.

    I have focused on reads here because they are simpler, but the same pattern can work for writes too, although that requires idempotency control to ensure side effects are not duplicated.

    The slowest 1% of requests is what users remember. Yes, hedging costs a few extra calls, but it buys back users' time at the edge of your SLO, and that is well worth the trade-off.
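    Outside of Rx, the same race can be sketched with plain promises. A minimal sketch, assuming `fetchPrimary` and `fetchHedge` are hypothetical stand-ins for your two data sources (e.g. two regions or two replicas):

    ```javascript
    // Hedged read: fire the primary request, and if it has not settled after
    // `hedgeDelayMs`, fire a second request; the first response wins.
    function hedgedRequest(fetchPrimary, fetchHedge, hedgeDelayMs) {
      const primary = fetchPrimary();
      const hedge = new Promise((resolve, reject) => {
        const timer = setTimeout(() => fetchHedge().then(resolve, reject), hedgeDelayMs);
        // If the primary settles first, skip the hedge entirely. A production
        // version would also cancel the losing in-flight request.
        primary.finally(() => clearTimeout(timer)).catch(() => {});
      });
      return Promise.race([primary, hedge]);
    }

    // Usage: a slow primary (300 ms) raced against a hedge fired after 50 ms.
    const slowPrimary = () => new Promise(res => setTimeout(() => res('primary'), 300));
    const fastHedge = () => Promise.resolve('hedge');
    hedgedRequest(slowPrimary, fastHedge, 50).then(winner => console.log(winner)); // logs "hedge"
    ```

    Note that when the primary responds within the delay, the hedge request is never sent, so the extra cost is only paid for the slow tail.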

  • Raul Junco

    Simplifying System Design

    137,020 followers

    If you can, do the work when you write. Low latency loves precomputation.

    Here's the situation Maya faced 👇

    Her user profiles table had optional fields: display_name, timezone, and bio. Half the rows were NULL. Reads had to stay under 20 ms at 10k QPS, and every service needed to see a consistent "effective" value; no NULLs, ever.

    These were her choices:

    A → COALESCE at read time. Keep the columns nullable and use COALESCE(column, default) in every query. Simple, until you realize every query now computes on the hot path. Inconsistent logic across services. Unindexable. Slow.

    B → Materialize effective_* columns at write time. Compute once during writes or via CDC, and store effective_display_name, effective_timezone, etc. Reads stay fast. Defaults stay consistent. A little more write work, but predictable, cacheable, and observable.

    C → DB defaults + NOT NULL migration. Feels clean. Declarative. But changing a live table with millions of rows is a minefield, defaults only fix new rows, not legacy ones, and complex default rules don't belong in SQL.

    D → Let every consumer handle defaults. No schema change, but each service redefines what "default" means. Soon you have five versions of the truth, and none of them match.

    Maya picked B. Because at scale, reads dominate writes. Every millisecond saved per query compounds. Computing once at write time beats recomputing 10,000 times a second.

    Trade-offs she accepted:
    – Slightly higher write latency.
    – The need for idempotency and concurrency control.
    – CDC lag if async updates are used.
    – A backfill job to fix existing data.

    Anyone can chase performance. Few can make it reliable. Never forget: latency loves precomputation.

    There are a couple of option "E"s you could suggest here. Any ideas?
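    Option B can be sketched in a few lines. This is a toy illustration, not Maya's actual schema: the `DEFAULTS` values and the in-memory `rows` map are assumptions standing in for the real table and its write path:

    ```javascript
    // Assumed default values for the nullable profile columns.
    const DEFAULTS = { display_name: 'Anonymous', timezone: 'UTC', bio: '' };

    // In-memory stand-in for the user profiles table.
    const rows = new Map();

    function writeProfile(userId, profile) {
      // Materialize effective_* columns during the write, not on the hot read path.
      const row = { ...profile };
      for (const [col, fallback] of Object.entries(DEFAULTS)) {
        row[`effective_${col}`] = profile[col] ?? fallback;
      }
      rows.set(userId, row);
    }

    function readProfile(userId) {
      // Reads are a plain lookup: no per-query COALESCE logic, fully indexable.
      return rows.get(userId);
    }

    writeProfile(1, { display_name: null, timezone: 'America/New_York', bio: null });
    console.log(readProfile(1).effective_display_name); // "Anonymous": computed once, at write time
    ```

    The same shape applies whether the computation happens in the application's write path, a trigger, or a CDC consumer; the point is that reads never re-derive defaults.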

  • Benjamin Cane

    Distinguished Engineer @ American Express | Slaying Latency & Building Reliable Card Payment Platforms since 2011

    4,838 followers

    When building low-latency, high-scale systems, a key strategy of mine is simple: “Push as much processing as possible to later.”

    Why does it matter? 🤔

    In many systems—checkout, login, trade execution—latency matters because someone (or something) is waiting:
    - A customer at a point of sale
    - A user at a login screen
    - A system waiting on a transaction confirmation

    Platforms that support these scenarios must respond in milliseconds. If not, requests will fail, and user experiences will suffer.

    My Approach 🧠

    I typically divide these platforms into two sub-platforms to optimize for speed and scale.
    🏎️ Real-Time Platform: Optimized for scale and speed, performing only what is essential before responding to the request.
    📥 Event-Driven Platform (sometimes Batch): Handles processing deferred from the real-time platform. It is still built for scale, but for seconds, not milliseconds.

    Deciding What Belongs Where 🗃

    I break processing down into steps, and for each step I ask: “Does this step need to happen before we respond, or after?”
    ✅ If it MUST be performed before the response, it goes on the real-time path.
    ⏭ If it can wait until after, it goes on the event-driven path.

    Things that tend to follow the event-driven path:
    - Audit logging
    - Downstream asynchronous notifications
    - Enrichment and transformations
    - Checks that trigger out-of-band tasks

    These are not slow, but they don’t need to be “blocking.”

    Final Thoughts ✍️

    The key message is that the more you do on the real-time path, the slower it is. This pattern is a good way to reduce the real-time workload, but the trick is finding a reliable and fast way to move work from the real-time system to the event-driven one. Pub/Sub and gRPC streams are two of my go-to options.

    What is your favorite way to connect real-time and event-driven platforms? #Bengineering 🧐
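    The split described above can be sketched as follows. The in-memory `eventQueue` and the handler names are assumptions standing in for a real broker such as Pub/Sub or a gRPC stream:

    ```javascript
    // Event-driven platform's inbox: work deferred off the real-time path.
    const eventQueue = [];

    function handleCheckout(order) {
      // Real-time path: only what must happen before responding.
      const confirmation = { orderId: order.id, status: 'confirmed' };

      // Defer everything else (audit logging, notifications) to later.
      eventQueue.push({ type: 'audit-log', order });
      eventQueue.push({ type: 'notify-downstream', order });

      return confirmation; // respond in milliseconds
    }

    function drainEvents(handlers) {
      // Event-driven path: processed on a seconds, not milliseconds, budget.
      while (eventQueue.length > 0) {
        const event = eventQueue.shift();
        handlers[event.type](event.order);
      }
    }
    ```

    The real-time handler returns as soon as the essential step is done; the queue consumer picks up the deferred events on its own schedule.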

  • Yasith Wimukthi

    Software Engineer at IFS | MSc in Big Data Analytics (Reading)| Full Stack Developer | Java Developer | Blogger | Tech Enthusiast

    14,412 followers

    Today, let's look at how to design a system so that latency stays low. Latency arises, simply put, from the distance data has to travel between devices, and when that data crosses several networks, routers add processing time of their own. Ways to reduce latency:

    1. Optimize Network Requests: Reduce the number of requests and the size of each request's payload to cut the time spent sending and receiving data. Bundling several requests into one, and using optimized formats such as compressed JSON or binary formats, reduces network overhead and transmission time.

    2. Implement Caching Strategies: Store frequently accessed data close to the user, or in intermediate layers such as edge servers, reverse proxies, and CDN nodes. This reduces the time it takes to fetch data.

    3. Fine-Tune Database Performance: Optimize database queries, use indexes, and apply many other techniques.

    4. Adopt Asynchronous Processing: Async processing lets us run tasks in the background without blocking other operations. For example, a time-consuming task like a file upload can run in the background while the other processes continue without interruption.

    5. Refine Application Code: Reduce algorithmic complexity and eliminate redundant computations to optimize the code and cut processing time.

    6. Leverage In-Memory Data Stores: In-memory data stores such as Redis and Memcached keep data in server RAM, which is much faster than retrieving it from disk-based storage. This is great for frequently accessed data and computation-heavy results.

    7. Implement Load Balancing: A load balancer distributes incoming traffic across several servers, preventing any single server from being overwhelmed and its response time from increasing.

    8. Optimize Data Serialization: Data serialization converts data structures or objects into a format that can be transmitted over a network. Efficient serialization formats such as Protobuf (Protocol Buffers) and MessagePack reduce the latency that comes with XML and JSON.

    9. Utilize Hardware Acceleration: Specialized hardware such as GPUs, FPGAs, and ASICs can offload specific computational tasks from the CPU. For example, dedicated hardware can handle complex encryption or machine learning tasks.

    10. Apply Predictive Prefetching: Use the user's behaviour and patterns to load the data future requests will need. For example, while a user browses an e-commerce site, you can prefetch the details of a few items they are likely to view next.

    I've put links to some related posts in the comments section; check those out too. #SystemDesign #Latency #Optimization
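    Points 2 and 6 above can be illustrated with a minimal in-memory cache that expires entries. The `TtlCache` name and the injected clock are assumptions made so the sketch stays deterministic:

    ```javascript
    // Minimal TTL cache: keep hot data in memory, but expire entries so
    // stale values are not served forever.
    class TtlCache {
      constructor(ttlMs) {
        this.ttlMs = ttlMs;
        this.entries = new Map();
      }
      set(key, value, now = Date.now()) {
        this.entries.set(key, { value, expiresAt: now + this.ttlMs });
      }
      get(key, now = Date.now()) {
        const entry = this.entries.get(key);
        if (!entry) return undefined;
        if (now >= entry.expiresAt) { // expired: drop the entry and report a miss
          this.entries.delete(key);
          return undefined;
        }
        return entry.value;
      }
    }
    ```

    A store like Redis provides the same expiration behaviour as a built-in feature (per-key TTLs), but the miss-on-expiry logic is the same idea.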

  • Nina Fernanda Durán

    AI Architect · Ship AI to production, here’s how

    58,512 followers

    5 Key Database Caching Strategies Developers Should Know

    1/ Cache-Aside → The cache is checked first. If the data is missing, it's fetched from the database and cached for future use. Great for read-heavy apps with infrequent data updates.
    2/ Read-Through → The cache itself fetches missing data from the database, ensuring availability. Perfect for high-traffic scenarios where latency must be minimized.
    3/ Write-Around → Data is written to the database, bypassing the cache. Works well when fresh data isn't critical for immediate reads, reducing cache usage.
    4/ Write-Through → Data updates both the cache and the database simultaneously, ensuring consistency. Ideal when reliability matters more than write speed.
    5/ Write-Behind → Data is written to the cache first and synced to the database later. Best for write-heavy applications where speed matters and eventual consistency suffices.

    Choosing the Right Strategy for Your Application

    The best caching strategy depends on your application's unique requirements:
    • Read-heavy workloads: If your app prioritizes quick reads, consider Cache-Aside or Read-Through for optimal performance. These approaches minimize database calls while keeping commonly accessed data readily available.
    • Write-heavy workloads: For applications with frequent writes, Write-Behind can offer better performance by reducing the immediate burden on the database. However, it's important to evaluate the trade-off with data consistency.
    • Consistency vs. performance: In scenarios where data consistency is paramount (e.g. financial systems), Write-Through ensures synchronization at the cost of slower writes. Write-Around may suit applications that don't require instant data caching.

    Best Practices for Implementing Cache Strategies

    • Set expiration policies: Avoid stale data by configuring a TTL (time-to-live) for your cache. This keeps data fresh and reduces the risk of serving outdated information.
    • Monitor cache performance: Regularly track hit/miss ratios and tune cache size and configuration to match application demands.
    • Leverage hybrid approaches: Combine strategies where needed; for instance, use Write-Through for critical data and Write-Behind for less time-sensitive operations.

    How do you balance performance and consistency in your caching setup?

    Visualizing software engineering concepts through easy-to-understand Sketech. I'm Nina, software engineer & project manager. The Sketech Newsletter now has a LinkedIn Page. Join me! ❤️ #database #caching #devs #technology
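    As one concrete illustration, strategy 1 (Cache-Aside) fits in a few lines. `makeCacheAsideReader` and its `db` lookup function are hypothetical names for this sketch, not a specific library API:

    ```javascript
    // Cache-aside: check the cache first; on a miss, fetch from the database
    // and populate the cache for future reads.
    function makeCacheAsideReader(db) {
      const cache = new Map();
      let misses = 0;
      return {
        get(key) {
          if (cache.has(key)) return cache.get(key); // hit: no DB call
          misses += 1;
          const value = db(key);  // miss: go to the database...
          cache.set(key, value);  // ...and cache the result
          return value;
        },
        invalidate(key) {
          cache.delete(key);      // call this on writes to avoid stale reads
        },
        get missCount() { return misses; }, // the hit/miss ratio to monitor
      };
    }
    ```

    The `missCount` counter is there because of the "monitor cache performance" practice above: the hit/miss ratio is the first number to watch when tuning cache size and TTLs.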

  • Fahim ul Haq

    Co-Founder & CEO at Educative | Software Engineer

    24,835 followers

    7 ways to slash latency in System Design:

    When I worked on Meta's distributed data store, managing massive data wasn't the most challenging part – fighting latency was. After years of trial and error, I'm sharing 7 proven best practices for building low-latency systems.

    1. Choose the correct architecture: Swap slower monolithic architectures for modular, scalable systems that cut latency by operating independently.
    2. Optimize data management: Choose SQL for structured data or NoSQL for flexibility, and use indexing, sharding, and replication to speed up data retrieval.
    3. Improve network design: Reduce data travel with CDNs, load balancers, and edge caching.
    4. Use efficient communication protocols: Fast protocols like HTTP/2, WebSockets, and gRPC streamline communication and cut round-trip times.
    5. Optimize your code: Use efficient algorithms, data structures, parallel processing, and I/O reduction to speed up execution and reduce bottlenecks.
    6. Use high-performance hardware: Opt for SSDs over HDDs and low-latency cloud services like AWS Global Accelerator for faster performance.
    7. Implement smart caching strategies: Cache at multiple layers with LRU/LFU eviction, and sync updates to avoid stale data.

    Read the full blog for a closer look at these 7 practices and how to implement them. (Your system—and your users—will thank you.) https://educat.tv/3B5Dh0N #SystemDesign #SoftwareEngineer
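    Practice 7 mentions LRU eviction. A compact sketch of the idea, relying on the fact that a JavaScript Map preserves insertion order (the `LruCache` name is assumed for illustration):

    ```javascript
    // LRU cache: re-inserting a key on access keeps recently used keys at the
    // end of the Map, so the first key is always the least recently used.
    class LruCache {
      constructor(capacity) {
        this.capacity = capacity;
        this.map = new Map();
      }
      get(key) {
        if (!this.map.has(key)) return undefined;
        const value = this.map.get(key);
        this.map.delete(key); // move the key to the most-recent position
        this.map.set(key, value);
        return value;
      }
      set(key, value) {
        if (this.map.has(key)) this.map.delete(key);
        this.map.set(key, value);
        if (this.map.size > this.capacity) {
          const oldest = this.map.keys().next().value; // least recently used
          this.map.delete(oldest);
        }
      }
    }
    ```

    LFU works the same way at the call sites but evicts by access count instead of recency; which policy wins depends on whether your hot set is stable or shifting.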

  • 🚀 The 9 Golden Rules of Data Ingestion (v2.0)

    Rule 1 — Batch First, Streaming Second: Don't default to real-time. Unless the business requires sub-minute latency, avoid the complexity of state management and streaming backfills. "Right-time" > "Real-time."
    Rule 2 — The Reconciliation Loop: Streaming pipelines are prone to drift. Always schedule a daily batch sync (Lambda Architecture) to catch late-arriving data and correct any stream inaccuracies.
    Rule 3 — Isolate Analytical Loads: Never touch the primary DB. Always ingest from read replicas. If the volume is massive, use log-based CDC (Change Data Capture) to minimize impact on the source system.
    Rule 4 — Respect the Consistency Gap (T-2 Logic): Data sources are rarely instantly consistent. Use a T-2 (time minus 2) offset, loading data from 2 hours or days ago, to allow for replication lag and transaction settlement.
    Rule 5 — Bronze Layer Immunity (Ingest as String): Protect your extraction layer. Ingest raw data into the Bronze layer as strings (or JSON/Parquet). Push strict type enforcement to the Silver layer to prevent fragile pipeline failures on source schema drift.
    Rule 6 — Parameterized Backfills: Hard-coded dates are the enemy. Design pipelines to accept dynamic start_date and end_date parameters so you can effortlessly replay historical data without code changes.
    Rule 7 — Decouple Compute & Storage: Use ephemeral compute for ingestion. The combo of Airflow (orchestration) + Spark on serverless (EMR/Databricks) lets you scale up for heavy loads and scale to zero to save costs.
    Rule 8 — Observability > Hope: Don't just assume success. Implement automated data quality (DQ) checks at the end of the day. If Source_Count != Target_Count (within a threshold), block downstream models.
    Rule 9 — Strict Idempotency: A pipeline run multiple times must yield the same result. Use MERGE (upsert) or OVERWRITE PARTITION logic. Never APPEND blindly.
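    Rule 9 can be shown in miniature. A toy sketch in which a Map stands in for the target table and `mergeLoad` is a hypothetical upsert-by-key loader:

    ```javascript
    // Idempotent load: upsert each record by its id. Running the same batch
    // twice yields the same target state, unlike a blind append.
    function mergeLoad(target, batch) {
      for (const record of batch) {
        target.set(record.id, record); // MERGE semantics: same id overwrites, never duplicates
      }
      return target;
    }

    // A blind append, for contrast: replaying the batch doubles the rows.
    function appendLoad(target, batch) {
      for (const record of batch) {
        target.push(record);
      }
      return target;
    }
    ```

    The same contrast is why replayable pipelines (rule 6) and reconciliation loops (rule 2) pair naturally with MERGE or OVERWRITE PARTITION: a rerun repairs the target instead of corrupting it.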

  • Alexandre Zajac

    SDE & AI @Amazon | Building Hungry Minds to 1M+ | Daily Posts on Software Engineering, System Design, and AI ⚡

    154,489 followers

    Uber's database handles 40M reads/s at 4.1 ms p99.9 latency. Explained:

    Uber's Docstore is a massive distributed database essential to the company's data needs, processing more than 30 million requests per second. As demand grew, the need for low-latency, high-throughput solutions became critical. Traditional disk-based storage, even with optimizations like NVMe SSDs, faced limitations in scalability, cost, and latency. To address these challenges, Uber developed CacheFront, an integrated caching solution designed to reduce latency, improve scalability, and lower costs without compromising data consistency or developer productivity.

    The key challenges:
    0. Latency and scalability: Disk-based databases have inherent latency and scalability limits.
    1. Cost: Scaling up or horizontally adds significant costs.
    2. Operational complexity: Managing more partitions and ensuring data durability is complex.
    3. Request imbalance: High read-request volumes can overwhelm storage nodes.

    CacheFront's solution had to implement cached reads with disk fallback where necessary. It also had to handle cache invalidation with a change-data-capture system, and remain adaptable on a per-database, per-table, or per-request basis.

    Implementation highlights:
    0. Incremental build: Started by caching the most common query patterns.
    1. High-level architecture: Separates caching from storage, allowing independent scaling.
    2. Cache invalidation: Uses change data capture to maintain consistency.
    3. Cache warming and sharding: Ensures high availability and fault tolerance across geographical regions.
    4. Circuit breakers and adaptive timeouts: Enhance system resilience and optimize latency.

    Results and impact:
    0. P75 latency decreased by 75%, and P99.9 latency by over 67%.
    1. Achieved a 99% cache hit rate for some of the largest use cases, significantly reducing the load on the storage engine.
    2. Reduced the need for approximately 60K CPU cores to just 3K Redis cores for certain use cases.
    3. Supports over 40 million requests per second across all instances, with proven success in failover scenarios.

    Actionable learnings:
    0. Integrated caching: An integrated caching layer can dramatically improve database read performance while reducing costs and operational complexity.
    1. Cache invalidation: A robust cache invalidation strategy is crucial for maintaining data consistency, especially in systems requiring high throughput and low-latency reads.
    2. Adaptability and scalability: Systems should adapt to varying workloads and scale components independently to ensure reliability and performance.

    What do you think about it? #softwareengineering #systemdesign #programming
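    The cached-read-with-fallback plus CDC-invalidation combination can be sketched at toy scale. `makeCachedStore` and its `onChange` hook are assumptions for illustration, not Uber's actual API:

    ```javascript
    // Reads go through the cache with a database fallback; a change-data-capture
    // hook invalidates the cached entry when the underlying row changes.
    function makeCachedStore(db) {
      const cache = new Map();
      return {
        read(key) {
          if (cache.has(key)) return cache.get(key); // cached read
          const value = db.get(key);                  // fallback to the storage engine
          cache.set(key, value);
          return value;
        },
        // Simulates the CDC consumer: on a captured write, update the source
        // of truth and drop the now-stale cache entry.
        onChange(key, value) {
          db.set(key, value);
          cache.delete(key);
        },
      };
    }
    ```

    Invalidation-by-delete keeps the cache simple: the next read repopulates the entry from the database, so readers never observe the stale value after the change event lands.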
