
Conversation

Contributor

@haacked haacked commented Oct 31, 2025

This PR targets the haacked/redis-client-ztsd branch as it needs those changes to be merged first.

Problem

The feature flags service has a lot of places where it reads from the cache and, on a cache miss, writes to the cache. But it isn't consistent about how it does this, and the current approach has problems such as attempting cache writes while Redis is timing out.

We need a consistent approach for caching that:

  1. Handles Redis degradation gracefully - When Redis is under load and returns timeouts/network errors, we shouldn't treat these the same as cache misses or attempt writes that add to the load
  2. Supports good observability - Distinguish between different cache scenarios (hit, miss, corruption, Redis unavailable) for better metrics and debugging
  3. Prevents cache poisoning - Handle corrupted cache entries by replacing them with fresh data
  4. Reduces data source load - Implement negative caching for non-existent keys to prevent repeated queries to databases, APIs, or other data sources

Currently, our cache handling is scattered across services with inconsistent error handling and limited observability.

Solution: ReadThroughCache in common-cache

A new generic read-through cache abstraction that encapsulates cache patterns and provides rich metadata for observability. The cache works with any async loader function (databases, HTTP APIs, file systems, computations, etc.), making it reusable across all PostHog services. I separated this out into its own crate to make it easier to review.

Key Components

pub enum CacheSource {
    // Value found scenarios
    PositiveCache,                      // Redis hit (valid data)
    NegativeCache,                      // In-memory hit (key doesn't exist)
    LoaderCacheMiss,                    // Loaded from source (Redis miss)
    LoaderCacheCorrupted,               // Loaded from source (Redis had bad data)
    LoaderRedisUnavailable,             // Loaded from source (Redis timeout/error)

    // Not found scenarios
    LoaderNotFoundCacheMiss,            // Not found in source or Redis
    LoaderNotFoundCacheCorrupted,       // Not found in source (Redis had bad data)
    LoaderNotFoundRedisUnavailable,     // Not found in source (Redis timeout/error)
}

pub struct CacheResult<V> {
    pub value: Option<V>,
    pub source: CacheSource,
}

impl<V> CacheResult<V> {
    pub fn was_cached(&self) -> bool;
    pub fn invoked_loader(&self) -> bool;
    pub fn had_cache_problem(&self) -> bool;
}
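
A plausible mapping of these helpers onto CacheSource — an illustrative sketch, not the exact implementation, which may group variants differently:

impl<V> CacheResult<V> {
    pub fn was_cached(&self) -> bool {
        // Served from Redis or from the in-memory negative cache.
        matches!(
            self.source,
            CacheSource::PositiveCache | CacheSource::NegativeCache
        )
    }

    pub fn invoked_loader(&self) -> bool {
        // Every Loader* variant means the loader function actually ran.
        !self.was_cached()
    }

    pub fn had_cache_problem(&self) -> bool {
        // Corruption or Redis unavailability, whether or not a value was found.
        matches!(
            self.source,
            CacheSource::LoaderCacheCorrupted
                | CacheSource::LoaderRedisUnavailable
                | CacheSource::LoaderNotFoundCacheCorrupted
                | CacheSource::LoaderNotFoundRedisUnavailable
        )
    }
}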

Usage Examples

Basic Usage

use common_cache::{CacheConfig, ReadThroughCache, CacheResult, CacheSource};
use std::sync::Arc;

// 1. Create cache configuration
let cache_config = CacheConfig::new(
    "posthog:1:team_token:",  // Redis key prefix
    Some(3600)                // TTL in seconds
);

// 2. Create cache instance
let cache = Arc::new(ReadThroughCache::new(
    redis_reader,
    redis_writer,
    cache_config,
    None,  // Optional negative cache
));

// 3. Use with get_or_load
let result: CacheResult<Team> = cache
    .get_or_load(&token, |token| async move {
        match Team::from_pg(pg_client, token).await {
            Ok(team) => Ok(Some(team)),           // Found
            Err(FlagError::RowNotFound) => Ok(None),  // Not found (triggers negative cache)
            Err(e) => Err(e),                     // Database error
        }
    })
    .await?;

// 4. Use CacheSource for observability
match result.source {
    CacheSource::PositiveCache => {
        // Cache hit with valid data
        tracing::debug!("Team loaded from Redis cache");
    }
    CacheSource::LoaderCacheMiss => {
        // Cache miss, loaded from source (database in this example)
        metrics::inc("db_queries");
    }
    CacheSource::LoaderCacheCorrupted => {
        // Redis had corrupted data, replaced with fresh data from source
        tracing::warn!("Corrupted cache entry replaced");
        metrics::inc("cache_corruption");
    }
    CacheSource::LoaderRedisUnavailable => {
        // Redis timeout/error, loaded from source
        tracing::warn!("Redis unavailable, using database");
        metrics::inc("redis_unavailable");
    }
    // ... handle other sources
}

// 5. Use the value
if let Some(team) = result.value {
    // Use team
}

With Negative Caching

use common_cache::NegativeCache;

// Create negative cache with LRU and TTL
let negative_cache = Arc::new(NegativeCache::new(
    1000,  // Max 1000 entries
    300    // 5 minute TTL
));

let cache = Arc::new(ReadThroughCache::new(
    redis_reader,
    redis_writer,
    cache_config,
    Some(negative_cache),  // Enable negative caching
));

// When loader returns Ok(None), key is added to negative cache
let result = cache.get_or_load(&invalid_token, |token| async move {
    match Team::from_pg(pg_client, token).await {
        Ok(team) => Ok(Some(team)),
        Err(FlagError::RowNotFound) => Ok(None),  // Cached in negative cache
        Err(e) => Err(e),
    }
}).await?;

// Subsequent lookups with same invalid_token return instantly
// result.source == CacheSource::NegativeCache

CacheResult Helper Methods

The CacheResult provides convenience methods for common checks:

let result = cache.get_or_load(&key, loader).await?;

// Check if we hit the cache (Redis or negative cache)
if result.was_cached() {
    metrics::inc("cache_hits");
} else {
    metrics::inc("cache_misses");
}

// Check if we had to invoke the loader function
if result.invoked_loader() {
    tracing::debug!("Loader was invoked for key: {}", key);
    metrics::inc("loader_invocations");
}

// Check if there were cache infrastructure problems
if result.had_cache_problem() {
    tracing::warn!("Cache problem detected");
    metrics::inc("cache_problems");
    // This includes: corruption or Redis unavailability
}

Cache Write Behavior

The cache writes to Redis only on a true cache miss or after a parse error (corrupted entry):

| Scenario | Redis Read | Loader Invoked | Redis Write | Negative Cache |
| --- | --- | --- | --- | --- |
| Redis hit (valid) | Success | No | Skip | Skip |
| Redis miss | NotFound | Yes | If found | If not found |
| Redis corrupted | ParseError | Yes | If found | If not found |
| Redis timeout | Timeout | Yes | Skip | Skip |
| Redis unavailable | Network error | Yes | Skip | Skip |

Important

The goal: when Redis is struggling, we don't add to its load with writes.
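
A minimal sketch of the gating decision implied by the table above, using an illustrative `ReadOutcome` enum (the cache's real internal types may differ):

enum ReadOutcome<V> {
    Hit(V),        // valid cached value
    NotFound,      // key absent in Redis
    ParseError,    // key present but payload corrupted
    Unavailable,   // timeout or network error
}

// Only a NotFound or ParseError read, combined with the loader actually finding a
// value, results in a Redis write; Unavailable skips the write entirely so we don't
// pile writes onto a struggling Redis.
fn should_write_back<V>(outcome: &ReadOutcome<V>, loader_found_value: bool) -> bool {
    loader_found_value && matches!(outcome, ReadOutcome::NotFound | ReadOutcome::ParseError)
}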

Implementation Details

  • Error handling: Distinguishes between infrastructure issues (timeout, network) vs data issues (not found, corrupted)
  • Negative caching: In-memory Moka cache with LRU eviction and TTL (doesn't poison Redis); see the sketch after this list
  • Type safety: Generic CacheResult<V> works with any serializable type
  • Testability: Rich CacheSource enum makes it easy to verify behavior in tests
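
A plausible shape for that negative cache, assuming it wraps moka::sync::Cache (the crate's exact field and method names may differ):

use std::time::Duration;
use moka::sync::Cache;

pub struct NegativeCache {
    entries: Cache<String, ()>,
}

impl NegativeCache {
    pub fn new(max_entries: u64, ttl_secs: u64) -> Self {
        Self {
            entries: Cache::builder()
                .max_capacity(max_entries)                   // capacity bound for eviction
                .time_to_live(Duration::from_secs(ttl_secs)) // entries expire after the TTL
                .build(),
        }
    }

    pub fn insert(&self, key: &str) {
        // Record that this key is known not to exist in the data source.
        self.entries.insert(key.to_string(), ());
    }

    pub fn contains(&self, key: &str) -> bool {
        self.entries.contains_key(key)
    }
}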

How did you test this code?

cargo test -p common-cache    # Cache library tests
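
For reference, a sketch of the kind of assertion those tests can make, based on the usage above (build_test_cache is a hypothetical helper that wires up a mock Redis reader/writer, TestError is a stand-in loader error type, and CacheSource is assumed to derive PartialEq):

#[derive(Debug)]
struct TestError;

#[tokio::test]
async fn reports_loader_miss_then_positive_cache() {
    let cache = build_test_cache();

    // First lookup misses Redis and invokes the loader.
    let first: CacheResult<u32> = cache
        .get_or_load(&"key".to_string(), |_k| async move {
            Ok::<_, TestError>(Some(42))
        })
        .await
        .unwrap();
    assert_eq!(first.source, CacheSource::LoaderCacheMiss);

    // Second lookup should be served from Redis.
    let second: CacheResult<u32> = cache
        .get_or_load(&"key".to_string(), |_k| async move {
            Ok::<_, TestError>(Some(42))
        })
        .await
        .unwrap();
    assert_eq!(second.source, CacheSource::PositiveCache);
}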

Next Steps

Future PRs will migrate feature-flags service to use ReadThroughCache for:

  • Team token validation
  • Feature flag lookups
  • Team secret token validation

This infrastructure can also be adopted by other services (capture, data-stack, etc.) for consistent caching behavior across PostHog.

@haacked haacked requested a review from Copilot October 31, 2025 16:51
Contributor

Copilot AI left a comment


Pull Request Overview

This PR introduces a new generic read-through cache crate (common-cache) for PostHog services with Redis-based positive caching and optional in-memory negative caching. It also fixes a lifetime elision issue in the MockRedisClient.

  • Adds a comprehensive read-through cache implementation with Redis and Moka-based negative caching
  • Provides rich observability through CacheSource enum to track cache hits, misses, and infrastructure issues
  • Includes a minor fix to explicitly specify the lifetime parameter in MockRedisClient's lock_calls method (see the sketch below)
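
The fix presumably has this shape — a sketch; the surrounding struct and field names in MockRedisClient are assumptions for illustration:

use std::sync::{Mutex, MutexGuard};

struct MockRedisClient {
    calls: Mutex<Vec<String>>, // recorded calls; element type assumed
}

impl MockRedisClient {
    // Before: `-> MutexGuard<Vec<String>>` relied on an elided lifetime that newer
    // lints flag; after: the anonymous lifetime `'_` is spelled out explicitly.
    fn lock_calls(&self) -> MutexGuard<'_, Vec<String>> {
        self.calls.lock().unwrap()
    }
}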

Reviewed Changes

Copilot reviewed 7 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| rust/common/redis/src/lib.rs | Fixes lifetime elision by explicitly adding '_ to the MutexGuard return type |
| rust/common/cache/src/read_through.rs | Core read-through cache implementation with Redis positive caching and fallback logic |
| rust/common/cache/src/negative_cache.rs | In-memory negative cache using Moka for preventing repeated DB queries for missing keys |
| rust/common/cache/src/lib.rs | Module exports and crate-level documentation |
| rust/common/cache/src/config.rs | Cache configuration and result types with helper methods for observability |
| rust/common/cache/Cargo.toml | Package definition for the new common-cache crate |
| rust/Cargo.toml | Adds common-cache to workspace members |
| rust/Cargo.lock | Lock file updates for new dependencies |


@haacked haacked force-pushed the haacked/read-through-cache branch 2 times, most recently from a14c2d2 to 9cbb68a Compare October 31, 2025 17:45
@haacked haacked force-pushed the haacked/flags-redis-ttl branch from 452143a to d0277d4 Compare October 31, 2025 22:20
Base automatically changed from haacked/flags-redis-ttl to master October 31, 2025 22:29
@haacked haacked force-pushed the haacked/read-through-cache branch from 9cbb68a to 81f1181 Compare October 31, 2025 23:43
@haacked haacked changed the base branch from master to haacked/redis-client-ztsd October 31, 2025 23:44
We have a lot of places where we attempt to read a Redis cache entry and, if the entry doesn't exist, fetch the data from the database and then write the entry back through a writer `RedisClient`.

The new `ReadThroughCache` struct encapsulates and improves on that pattern.

For example, our existing code:

1. Treats `TimeoutError` and other errors as cache misses, causing us to attempt to write to the cache when it might be under load.
2. Doesn't evict existing entries when the DB fetch produces no results.
3. Doesn't distinguish between corrupt data and missing data.
  4. Doesn't cache negative entries, so if a project API key doesn't exist, requests with it always hit the DB.

The `ReadThroughCache` (combined with `NegativeCache`) provides the structure to fix all these issues.
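
To make the contrast concrete, here is a sketch of the hand-rolled pattern being replaced (the client trait, method names, and types are illustrative, not the actual feature-flags code):

async fn get_team_cached(
    redis_reader: &impl Client,   // assumed read-only Redis client
    redis_writer: &impl Client,   // assumed writer Redis client
    pg_client: &PgClient,         // hypothetical database handle
    token: &str,
) -> Result<Option<Team>, FlagError> {
    let key = format!("posthog:1:team_token:{token}");

    // Every read failure -- missing key, corrupt payload, timeout, network error --
    // looks the same here, so they all fall through to Postgres.
    if let Ok(raw) = redis_reader.get(key.clone()).await {
        if let Ok(team) = serde_json::from_str::<Team>(&raw) {
            return Ok(Some(team));
        }
    }

    match Team::from_pg(pg_client, token).await {
        Ok(team) => {
            // The write-back happens unconditionally, even when Redis is overloaded,
            // and nothing evicts a stale entry if a later fetch returns nothing.
            if let Ok(serialized) = serde_json::to_string(&team) {
                let _ = redis_writer.set(key, serialized).await;
            }
            Ok(Some(team))
        }
        // No negative entry is recorded, so a nonexistent project API key hits the DB
        // on every request.
        Err(FlagError::RowNotFound) => Ok(None),
        Err(e) => Err(e),
    }
}
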
@haacked haacked force-pushed the haacked/read-through-cache branch from 81f1181 to ceddacc Compare October 31, 2025 23:46
@haacked haacked marked this pull request as ready for review October 31, 2025 23:47
@haacked haacked requested review from a team and oliverb123 October 31, 2025 23:49
@posthog-project-board-bot posthog-project-board-bot bot moved this to In Review in Feature Flags Oct 31, 2025
Contributor

@greptile-apps greptile-apps bot left a comment


7 files reviewed, no comments

