@codeflash-ai codeflash-ai bot commented Dec 30, 2025

⚡️ This pull request contains optimizations for PR #10820

If you approve this dependent PR, these changes will be merged into the original PR branch cz/add-logs-feature.

This PR will be automatically closed if the original PR is merged.


📄 44% (0.44x) speedup for _sanitize_list in src/backend/base/langflow/services/database/models/transactions/model.py

⏱️ Runtime: 3.65 milliseconds → 2.53 milliseconds (best of 77 runs)

📝 Explanation and details

The optimized code achieves a 44% speedup by introducing a cached version of the _is_sensitive_key() function using @lru_cache(maxsize=512).

Key optimization:

The critical change is wrapping _is_sensitive_key() with an LRU cache through the new _is_sensitive_key_cached() function. The line profiler data reveals the impact:

  • Original code: Line calling _is_sensitive_key(key) took 11.4ms (33.8% of _sanitize_dict time)
  • Optimized code: Line calling _is_sensitive_key_cached(key) took 2.9ms (11.7% of _sanitize_dict time)

This represents a ~74% reduction in time spent on sensitivity checking within _sanitize_dict.
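The two percentages quoted so far use different baselines, which is easy to miss. The arithmetic below is a quick check using only the figures reported above. It assumes the "speedup" is measured relative to the optimized runtime, which is consistent with the numbers but is not stated explicitly in the report.

```python
# Figures quoted in the report above.
original_ms, optimized_ms = 3.65, 2.53      # end-to-end runtime of _sanitize_list
line_before_ms, line_after_ms = 11.4, 2.9   # profiled sensitivity-check line

# Speedup: how much faster the optimized code runs, relative to its own runtime
# (assumed definition; it reproduces the reported 44%).
speedup = original_ms / optimized_ms - 1

# Reduction: time saved, relative to the original time.
runtime_reduction = 1 - optimized_ms / original_ms
line_reduction = 1 - line_after_ms / line_before_ms

print(f"speedup: {speedup:.0%}")
print(f"runtime reduction: {runtime_reduction:.0%}")
print(f"line-time reduction: {line_reduction:.1%}")
```

Under that reading, a 44% speedup corresponds to roughly a 31% shorter end-to-end runtime, while the profiled line itself shrinks by about three quarters.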

Why this works:

The _is_sensitive_key() function performs:

  1. String lowercasing (key.lower())
  2. Frozenset lookup in SENSITIVE_KEY_NAMES
  3. Regex pattern matching via SENSITIVE_KEYS_PATTERN.match()

These operations, the regex match in particular, are comparatively expensive when they run for every key of every record. In typical usage, dictionaries share the same keys across many records (e.g., "api_key", "password", "username"). The cache with maxsize=512 stores previously computed results, so a repeated key costs one hash-based cache lookup instead of repeating the three steps above.

Performance characteristics:

The test results show consistent speedups across all test cases, particularly:

  • Tests with repeated keys benefit most (e.g., test_large_list_of_dicts with 500 identical structures)
  • Tests with nested structures see compounding benefits as the cache warms up during recursive calls
  • Even single-pass tests benefit whenever a key such as "api_key", "password", or "token" occurs more than once in the input: the first occurrence populates the cache, and every later occurrence is a cache hit

The maxsize=512 is well-sized for typical workloads—most applications have far fewer than 512 unique key names, ensuring high cache hit rates without excessive memory overhead.
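To illustrate what happens when the number of distinct keys exceeds `maxsize`, here is a small sketch with a deliberately tiny cache. The function name and the set of sensitive names are hypothetical; only the `lru_cache` behavior is the point.

```python
from functools import lru_cache

calls = []  # records every time the underlying check actually runs


@lru_cache(maxsize=2)  # deliberately tiny; the PR uses maxsize=512
def is_sensitive(key: str) -> bool:
    calls.append(key)
    return key.lower() in {"api_key", "password", "token"}  # hypothetical name set


# Three distinct keys cycle through a cache that holds only two entries.
for key in ["api_key", "name", "api_key", "age", "name"]:
    is_sensitive(key)

print(calls)
print(is_sensitive.cache_info())
```

Once the working set of keys is larger than the cache, least-recently-used entries are evicted and recomputed on their next use, so the benefit depends on the number of distinct key names staying below `maxsize`.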

Workload impact:

Without function_references, the specific call context is unclear. However, given this is in a database transaction model for sanitizing logs/data, this optimization is particularly valuable for:

  • High-throughput logging scenarios where the same dict structures are sanitized repeatedly
  • Batch processing of database records with consistent schemas
  • API request/response sanitization where field names are predictable and repetitive
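As a rough illustration of the consistent-schema case, the sketch below counts cache hits over a batch of records that share one set of keys. The check itself is a hypothetical stand-in for `_is_sensitive_key_cached()`, and the record shape is invented for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=512)
def is_sensitive_cached(key: str) -> bool:
    # Hypothetical check; stands in for _is_sensitive_key_cached.
    return key.lower() in {"api_key", "password", "token"}


# 500 records sharing one schema, as in a batch of log rows.
records = [{"api_key": f"key{i}", "name": f"user{i}", "age": i} for i in range(500)]

for record in records:
    for key in record:
        is_sensitive_cached(key)

info = is_sensitive_cached.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
print(f"{info.misses} misses, {info.hits} hits, hit rate {hit_rate:.1%}")
```

With three distinct key names, only the first record pays for the full check; every later lookup is served from the cache.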

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 77 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 100.0% |
🌀 Generated Regression Tests
import re
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.services.database.models.transactions.model import _sanitize_list

# unit tests

# ----------- BASIC TEST CASES -----------

def test_empty_list_returns_empty():
    """Sanitize an empty list should return an empty list."""
    codeflash_output = _sanitize_list([])

def test_list_of_non_sensitive_scalars():
    """Sanitize a list of non-sensitive scalar values."""
    data = [1, "hello", 3.14, None, True]
    codeflash_output = _sanitize_list(data)

def test_list_of_dicts_with_no_sensitive_keys():
    """Sanitize a list of dicts with no sensitive keys."""
    data = [{"foo": "bar"}, {"baz": 123}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_short_value():
    """Sensitive key with value shorter than MIN_LENGTH_FOR_PARTIAL_MASK is fully redacted."""
    data = [{"api_key": "shortkey"}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_long_value():
    """Sensitive key with value longer than MIN_LENGTH_FOR_PARTIAL_MASK is partially masked."""
    value = "A" * 16  # 16 chars
    data = [{"api_key": value}]
    expected = [{"api_key": "AAAA...AAAA"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_empty_string():
    """Sensitive key with empty string value is fully redacted."""
    data = [{"password": ""}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_non_string_value():
    """Sensitive key with non-string value is fully redacted."""
    data = [{"token": None}, {"secret": 12345}]
    expected = [{"token": "***REDACTED***"}, {"secret": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_mixed_sensitive_and_non_sensitive_keys():
    """Sanitize dicts in list with both sensitive and non-sensitive keys."""
    data = [
        {"api_key": "123456789012", "foo": "bar"},
        {"password": "letmein", "baz": 42},
    ]
    expected = [
        {"api_key": "***REDACTED***", "foo": "bar"},
        {"password": "***REDACTED***", "baz": 42},
    ]
    codeflash_output = _sanitize_list(data)

def test_nested_dicts_in_list():
    """Sanitize nested dicts within list."""
    data = [
        {
            "user": {
                "name": "Alice",
                "token": "abcdef1234567890"
            }
        }
    ]
    expected = [
        {
            "user": {
                "name": "Alice",
                "token": "abcd...7890"
            }
        }
    ]
    codeflash_output = _sanitize_list(data)

def test_nested_lists_in_list():
    """Sanitize nested lists within list."""
    data = [
        [
            {"api_key": "A" * 20},
            {"foo": "bar"}
        ],
        "baz"
    ]
    expected = [
        [
            {"api_key": "AAAA...AAAA"},
            {"foo": "bar"}
        ],
        "baz"
    ]
    codeflash_output = _sanitize_list(data)

# ----------- EDGE TEST CASES -----------

def test_sensitive_key_case_insensitivity():
    """Sensitive key matching should be case-insensitive."""
    data = [{"API_KEY": "abcdef1234567890"}]
    expected = [{"API_KEY": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_suffix():
    """Sensitive key pattern matches keys with sensitive suffixes."""
    data = [{"my_api_key": "abcdef1234567890"}]
    expected = [{"my_api_key": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_dash_suffix():
    """Sensitive key pattern matches keys ending with dash and sensitive word."""
    data = [{"user-token": "abcdef1234567890"}]
    expected = [{"user-token": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_mixed_case_and_underscore():
    """Sensitive key matching with mixed case and underscore."""
    data = [{"Access_Token": "abcdef1234567890"}]
    expected = [{"Access_Token": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_non_string_value_edge():
    """Sensitive key with unusual non-string values (list, dict)."""
    data = [{"secret": ["should", "be", "redacted"]}, {"token": {"inner": "value"}}]
    expected = [{"secret": "***REDACTED***"}, {"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_excluded_key_is_removed():
    """Key in EXCLUDED_KEYS should be removed from output."""
    data = [{"code": "print('hello')", "foo": "bar"}]
    expected = [{"foo": "bar"}]
    codeflash_output = _sanitize_list(data)

def test_dict_with_only_excluded_key():
    """Dict with only excluded key should become empty dict."""
    data = [{"code": "something"}]
    expected = [{}]
    codeflash_output = _sanitize_list(data)

def test_list_with_empty_dict():
    """Empty dict in list should remain unchanged."""
    data = [{}]
    expected = [{}]
    codeflash_output = _sanitize_list(data)

def test_list_with_empty_list():
    """Empty list in list should remain unchanged."""
    data = [[]]
    expected = [[]]
    codeflash_output = _sanitize_list(data)

def test_list_with_none_value():
    """None value in list should remain unchanged."""
    data = [None]
    expected = [None]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_none_value():
    """Sensitive key with None value should be redacted."""
    data = [{"password": None}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_false_value():
    """Sensitive key with False value should be redacted."""
    data = [{"token": False}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_list_value():
    """Sensitive key with empty list value should be redacted."""
    data = [{"secret": []}]
    expected = [{"secret": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_dict_value():
    """Sensitive key with empty dict value should be redacted."""
    data = [{"secret": {}}]
    expected = [{"secret": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_integer_zero():
    """Sensitive key with integer zero value should be redacted."""
    data = [{"token": 0}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_float_zero():
    """Sensitive key with float zero value should be redacted."""
    data = [{"token": 0.0}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_boolean_true():
    """Sensitive key with boolean True value should be redacted."""
    data = [{"token": True}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_boolean_false():
    """Sensitive key with boolean False value should be redacted."""
    data = [{"token": False}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_string():
    """Sensitive key with empty string value should be redacted."""
    data = [{"token": ""}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_whitespace_string():
    """Sensitive key with whitespace string value, less than min length, should be redacted."""
    data = [{"token": "   "}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_whitespace_long_string():
    """Sensitive key with whitespace string value, longer than min length, should be partially masked."""
    value = " " * 16
    data = [{"token": value}]
    expected = [{"token": "    ...    "}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_unicode_value():
    """Sensitive key with unicode string value should be masked correctly."""
    value = "秘密のパスワード12345678"
    data = [{"password": value}]
    # Length > 12, so partial mask
    expected = [{"password": f"{value[:4]}...{value[-4:]}"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_bytes_value():
    """Sensitive key with bytes value should be redacted."""
    data = [{"token": b"mytoken"}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_list_of_strings():
    """Sensitive key with list value should be redacted (not recursed)."""
    data = [{"token": ["a", "b", "c"]}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_dict_value():
    """Sensitive key with dict value should be redacted (not recursed)."""
    data = [{"token": {"inner": "value"}}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_tuple_value():
    """Sensitive key with tuple value should be redacted."""
    data = [{"token": ("a", "b")}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_set_value():
    """Sensitive key with set value should be redacted."""
    data = [{"token": {"a", "b"}}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_falsy_value():
    """Sensitive key with falsy value (empty string, 0, False, None) should be redacted."""
    data = [{"token": ""}, {"token": 0}, {"token": False}, {"token": None}]
    expected = [{"token": "***REDACTED***"}, {"token": "***REDACTED***"}, {"token": "***REDACTED***"}, {"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_truthy_value():
    """Sensitive key with truthy value (non-empty string, non-zero number, True) should be masked/redacted appropriately."""
    data = [{"token": "a" * 16}, {"token": 1}, {"token": True}]
    expected = [{"token": "aaaa...aaaa"}, {"token": "***REDACTED***"}, {"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_list_of_dicts_with_sensitive_keys():
    """Sanitize a large list of dicts with sensitive keys."""
    N = 500
    data = [{"api_key": f"key{i:04d}abcdefghij"} for i in range(N)]
    expected = [{"api_key": f"key{i:04d}...ghij"} for i in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_mixed_dicts():
    """Sanitize a large list of dicts with mixed sensitive and non-sensitive keys."""
    N = 500
    data = [
        {"api_key": f"key{i:04d}abcdefghij", "foo": f"bar{i}", "baz": i}
        for i in range(N)
    ]
    expected = [
        {"api_key": f"key{i:04d}...ghij", "foo": f"bar{i}", "baz": i}
        for i in range(N)
    ]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_lists():
    """Sanitize a large list of lists containing dicts with sensitive keys."""
    N = 100
    data = [
        [
            {"password": f"pass{i:04d}abcdefghij"},
            {"token": f"tok{i:04d}abcdefghij"},
            {"foo": f"bar{i}"}
        ]
        for i in range(N)
    ]
    expected = [
        [
            {"password": f"pass{i:04d}...ghij"},
            {"token": f"tok{i:04d}...ghij"},
            {"foo": f"bar{i}"}
        ]
        for i in range(N)
    ]
    codeflash_output = _sanitize_list(data)

def test_deeply_nested_large_structure():
    """Sanitize a deeply nested structure with sensitive keys."""
    N = 50
    data = [
        [
            [
                {"api_key": f"key{i:04d}abcdefghij", "nested": {"token": f"tok{i:04d}abcdefghij"}}
                for i in range(N)
            ]
        ]
    ]
    expected = [
        [
            [
                {"api_key": f"key{i:04d}...ghij", "nested": {"token": f"tok{i:04d}...ghij"}}
                for i in range(N)
            ]
        ]
    ]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_excluded_key():
    """Sanitize a large list of dicts with excluded keys."""
    N = 300
    data = [{"code": f"print({i})", "foo": f"bar{i}"} for i in range(N)]
    expected = [{"foo": f"bar{i}"} for i in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_scalars():
    """Sanitize a large list of scalar values (should remain unchanged)."""
    N = 1000
    data = [i for i in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_empty_dicts():
    """Sanitize a large list of empty dicts (should remain unchanged)."""
    N = 500
    data = [{} for _ in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_empty_lists():
    """Sanitize a large list of empty lists (should remain unchanged)."""
    N = 500
    data = [[] for _ in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_various_types():
    """Sanitize a large list with mixed types, including dicts, lists, scalars."""
    N = 200
    data = []
    expected = []
    for i in range(N):
        if i % 4 == 0:
            data.append({"api_key": f"key{i:04d}abcdefghij"})
            expected.append({"api_key": f"key{i:04d}...ghij"})
        elif i % 4 == 1:
            data.append([{"token": f"tok{i:04d}abcdefghij"}])
            expected.append([{"token": f"tok{i:04d}...ghij"}])
        elif i % 4 == 2:
            data.append(i)
            expected.append(i)
        else:
            data.append(None)
            expected.append(None)
    codeflash_output = _sanitize_list(data)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.services.database.models.transactions.model import _sanitize_list

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_empty_list_returns_empty():
    # Test that an empty list returns an empty list
    codeflash_output = _sanitize_list([])

def test_list_of_ints_and_strings():
    # Non-dict, non-list items should be unchanged
    data = [1, "hello", 3.14, None]
    codeflash_output = _sanitize_list(data)

def test_list_of_simple_dicts():
    # Dicts with non-sensitive keys should be unchanged
    data = [{"name": "Alice"}, {"age": 30}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_dict_key_short_value():
    # Sensitive key with short value should be fully redacted
    data = [{"api_key": "123456"}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_dict_key_long_value():
    # Sensitive key with long value should be partially masked
    value = "A" * 20
    data = [{"api_key": value}]
    expected = [{"api_key": f"{value[:4]}...{value[-4:]}"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_non_string_value():
    # Sensitive key with non-string value should be fully redacted
    data = [{"password": 123456}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_string():
    # Sensitive key with empty string should be fully redacted
    data = [{"token": ""}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_excluded_key():
    # Dict with excluded key should drop that key
    data = [{"code": "print('hi')", "name": "Alice"}]
    expected = [{"name": "Alice"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_pattern_match():
    # Key that matches the sensitive pattern should be masked
    data = [{"my_api_key": "abcdef123456"}]
    expected = [{"my_api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_mixed_sensitive_and_non_sensitive_keys():
    # Only sensitive keys should be masked
    data = [{"api_key": "123456789012", "username": "bob"}]
    expected = [{"api_key": "***REDACTED***", "username": "bob"}]
    codeflash_output = _sanitize_list(data)

# -------------------- EDGE TEST CASES --------------------

def test_list_with_nested_dicts():
    # Nested dicts should be sanitized recursively
    data = [{"user": {"api_key": "abcdef123456", "name": "Alice"}}]
    expected = [{"user": {"api_key": "***REDACTED***", "name": "Alice"}}]
    codeflash_output = _sanitize_list(data)

def test_list_with_nested_lists():
    # Nested lists should be sanitized recursively
    data = [[{"api_key": "abcdef123456"}, {"name": "Alice"}]]
    expected = [[{"api_key": "***REDACTED***"}, {"name": "Alice"}]]
    codeflash_output = _sanitize_list(data)

def test_list_with_deeply_nested_structures():
    # Deeply nested dicts and lists should be sanitized at all levels
    data = [
        {
            "users": [
                {"api_key": "abcdef123456", "profile": {"password": "mypassword"}},
                {"name": "Bob", "token": "tok1234567890"},
            ]
        }
    ]
    expected = [
        {
            "users": [
                {"api_key": "***REDACTED***", "profile": {"password": "***REDACTED***"}},
                {"name": "Bob", "token": "***REDACTED***"},
            ]
        }
    ]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_case_insensitivity():
    # Sensitive key matching should be case-insensitive
    data = [{"API_KEY": "abcdef123456"}]
    expected = [{"API_KEY": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_pattern_variants():
    # Sensitive key pattern should match variants
    data = [
        {"my-api-key": "abcdef123456"},
        {"my_password": "secretpass"},
        {"authToken": "tokenvalue"},
        {"bearer_token": "bearertoken"},
        {"private-key": "privatekeyval"},
        {"access_key": "accesskeyval"},
    ]
    expected = [
        {"my-api-key": "***REDACTED***"},
        {"my_password": "***REDACTED***"},
        {"authToken": "***REDACTED***"},
        {"bearer_token": "***REDACTED***"},
        {"private-key": "***REDACTED***"},
        {"access_key": "***REDACTED***"},
    ]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_none_value():
    # Sensitive key with None value should be redacted
    data = [{"api_key": None}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_bool_value():
    # Sensitive key with boolean value should be redacted
    data = [{"password": True}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)


def test_list_with_dicts_having_only_excluded_keys():
    # Dicts with only excluded keys should become empty dicts
    data = [{"code": "print('hi')"}, {"code": "foo"}]
    expected = [{}, {}]
    codeflash_output = _sanitize_list(data)

def test_list_with_empty_dicts_and_lists():
    # Empty dicts and lists should be preserved
    data = [{} , []]
    expected = [{} , []]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_dict_value():
    # Sensitive key with empty dict value should be redacted
    data = [{"api_key": {}}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_list_value():
    # Sensitive key with list value should be redacted
    data = [{"api_key": [1,2,3]}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_large_list_of_dicts():
    # Large list of dicts with sensitive and non-sensitive keys
    data = [{"api_key": f"key{i}", "name": f"user{i}"} for i in range(500)]
    expected = [{"api_key": "***REDACTED***", "name": f"user{i}"} for i in range(500)]
    codeflash_output = _sanitize_list(data)

def test_large_nested_list():
    # Large nested list structure
    data = [[{"password": f"pass{i}", "age": i} for i in range(100)] for _ in range(5)]
    expected = [[{"password": "***REDACTED***", "age": i} for i in range(100)] for _ in range(5)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_various_types():
    # Large list with mixed types, including dicts, lists, and primitives
    data = []
    for i in range(300):
        if i % 3 == 0:
            data.append({"token": f"tok{i}", "value": i})
        elif i % 3 == 1:
            data.append([{"api_key": f"key{i}"}])
        else:
            data.append(i)
    expected = []
    for i in range(300):
        if i % 3 == 0:
            expected.append({"token": "***REDACTED***", "value": i})
        elif i % 3 == 1:
            expected.append([{"api_key": "***REDACTED***"}])
        else:
            expected.append(i)
    codeflash_output = _sanitize_list(data)

def test_large_list_of_empty_dicts_and_lists():
    # Large list of empty dicts and lists
    data = [{} for _ in range(200)] + [[] for _ in range(200)]
    expected = [{} for _ in range(200)] + [[] for _ in range(200)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_excluded_keys():
    # Large list with dicts containing excluded keys
    data = [{"code": f"print({i})", "api_key": f"key{i}"} for i in range(400)]
    expected = [{"api_key": "***REDACTED***"} for i in range(400)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_long_sensitive_values():
    # Large list with sensitive values longer than MIN_LENGTH_FOR_PARTIAL_MASK
    long_val = "X" * 30
    data = [{"api_key": long_val, "other": i} for i in range(100)]
    expected = [{"api_key": f"{long_val[:4]}...{long_val[-4:]}", "other": i} for i in range(100)]
    codeflash_output = _sanitize_list(data)

# -------------------- DETERMINISM / ORDER PRESERVATION --------------------

def test_list_order_is_preserved():
    # The sanitized list should preserve the original order
    data = [
        {"api_key": "key1"},
        {"name": "Alice"},
        [1, 2, 3],
        {"password": "pass"},
        {"code": "should be removed"},
        {"token": "tok"},
    ]
    expected = [
        {"api_key": "***REDACTED***"},
        {"name": "Alice"},
        [1, 2, 3],
        {"password": "***REDACTED***"},
        {},
        {"token": "***REDACTED***"},
    ]
    codeflash_output = _sanitize_list(data)

def test_list_with_duplicate_sensitive_keys():
    # Multiple dicts with the same sensitive key should all be masked
    data = [{"api_key": "key1"}, {"api_key": "key2"}, {"api_key": "key3"}]
    expected = [{"api_key": "***REDACTED***"} for _ in range(3)]
    codeflash_output = _sanitize_list(data)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
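
For readers without the source at hand, the behavior these generated tests describe can be approximated by the sketch below. Everything in it is an assumption inferred from the test docstrings: the constant values, the suffix-based sensitivity check, and the function names are illustrative and are not the actual code in `model.py`.

```python
from typing import Any

# All names and values below are assumptions inferred from the test docstrings.
REDACTED = "***REDACTED***"
MIN_LENGTH_FOR_PARTIAL_MASK = 12
EXCLUDED_KEYS = frozenset({"code"})
SENSITIVE_WORDS = ("api_key", "api-key", "password", "token", "secret")


def is_sensitive(key: str) -> bool:
    """Case-insensitive check for a key ending in a sensitive word."""
    return key.lower().endswith(SENSITIVE_WORDS)


def mask(value: Any) -> str:
    """Partially mask long strings; fully redact everything else."""
    if isinstance(value, str) and len(value) > MIN_LENGTH_FOR_PARTIAL_MASK:
        return f"{value[:4]}...{value[-4:]}"
    return REDACTED


def sanitize(value: Any) -> Any:
    """Recurse through dicts and lists, dropping excluded keys and masking sensitive ones."""
    if isinstance(value, dict):
        return {
            k: mask(v) if isinstance(k, str) and is_sensitive(k) else sanitize(v)
            for k, v in value.items()
            if k not in EXCLUDED_KEYS
        }
    if isinstance(value, list):
        return [sanitize(item) for item in value]
    return value
```

Note that a sensitive key whose value is a dict or list is redacted as a whole rather than recursed into, matching the "not recursed" docstrings above.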

To edit these changes git checkout codeflash/optimize-pr10820-2025-12-30T19.01.21 and push.

@codeflash-ai codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 30, 2025

coderabbitai bot commented Dec 30, 2025

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot added the community Pull Request from an external contributor label Dec 30, 2025

codecov bot commented Dec 30, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.36%. Comparing base (79c1e5c) to head (0c58205).

Additional details and impacted files

@@                   Coverage Diff                   @@
##           cz/add-logs-feature   #11171      +/-   ##
=======================================================
+ Coverage                33.34%   33.36%   +0.02%     
=======================================================
  Files                     1399     1399              
  Lines                    66226    66230       +4     
  Branches                  9785     9785              
=======================================================
+ Hits                     22080    22095      +15     
+ Misses                   43021    43011      -10     
+ Partials                  1125     1124       -1     
| Flag | Coverage Δ |
| --- | --- |
| backend | 52.86% <100.00%> (+0.06%) ⬆️ |
| lfx | 39.50% <ø> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
| --- | --- |
| ...low/services/database/models/transactions/model.py | 92.72% <100.00%> (+0.27%) ⬆️ |

... and 5 files with indirect coverage changes
