⚡️ Speed up function `_sanitize_list` by 44% in PR #10820 (`cz/add-logs-feature`) #11171

codeflash-ai · 2025-12-30T19:01:27Z

⚡️ This pull request contains optimizations for PR #10820

If you approve this dependent PR, these changes will be merged into the original PR branch cz/add-logs-feature.

This PR will be automatically closed if the original PR is merged.

📄 44% (0.44x) speedup for `_sanitize_list` in `src/backend/base/langflow/services/database/models/transactions/model.py`

⏱️ Runtime : 3.65 milliseconds → 2.53 milliseconds (best of 77 runs)

📝 Explanation and details

The optimized code achieves a 44% speedup by introducing a cached version of the _is_sensitive_key() function using @lru_cache(maxsize=512).

Key optimization:

The critical change is wrapping _is_sensitive_key() with an LRU cache through the new _is_sensitive_key_cached() function. The line profiler data reveals the impact:

- Original code: the line calling _is_sensitive_key(key) took 11.4ms (33.8% of _sanitize_dict time)
- Optimized code: the line calling _is_sensitive_key_cached(key) took 2.9ms (11.7% of _sanitize_dict time)

This represents a ~74% reduction in time spent on sensitivity checking within _sanitize_dict.
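
The headline 44% and the ~74% above are different metrics: the first is a speedup (old runtime divided by new runtime, minus one), the second is a reduction in time spent on one line (one minus new over old). A minimal sketch of the arithmetic using the figures reported above; the helper names `speedup` and `reduction` are illustrative and not part of Codeflash:

```python
def speedup(old: float, new: float) -> float:
    """Relative speedup: old runtime over new runtime, minus one."""
    return old / new - 1.0


def reduction(old: float, new: float) -> float:
    """Fraction of the original time that was eliminated."""
    return 1.0 - new / old


# Overall runtime of _sanitize_list: 3.65 ms -> 2.53 ms
overall = speedup(3.65, 2.53)

# Time on the sensitivity-check line in _sanitize_dict: 11.4 ms -> 2.9 ms
key_check = reduction(11.4, 2.9)

print(f"{overall:.0%} speedup, {key_check:.1%} reduction")
# prints "44% speedup, 74.6% reduction"
```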

Why this works:

The _is_sensitive_key() function performs:

1. String lowercasing (key.lower())
2. Frozenset lookup in SENSITIVE_KEY_NAMES
3. Regex pattern matching via SENSITIVE_KEYS_PATTERN.match()

These operations, especially the regex match, are comparatively expensive when repeated for every key of every record. In typical usage, dictionaries share the same keys across many records (e.g., "api_key", "password", "username"). The cache with maxsize=512 stores the result for each distinct key string, so a repeated key costs a single hash lookup instead of a lowercase, a set lookup, and a regex match.

Performance characteristics:

The test results show consistent speedups across all test cases, particularly:

- Tests with repeated keys benefit most (e.g., test_large_list_of_dicts with 500 identical structures)
- Tests with nested structures also benefit, because the cache is shared across recursive calls and warms up as nesting levels are visited
- Even small tests benefit whenever a key such as "api_key", "password", or "token" occurs more than once: the first occurrence populates the cache and every later occurrence is a cache hit

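The maxsize=512 should be ample for typical workloads, assuming the number of distinct key names stays well below 512; in that case hit rates are high and memory overhead is small. Workloads with many dynamically generated keys (for example, IDs used as dictionary keys) would evict entries and see lower hit rates.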
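
Whether 512 entries is enough for a given workload can be checked with the `cache_info()` method that `functools.lru_cache` attaches to the wrapped function. A small sketch; `is_sensitive` is a placeholder check, not the function from `model.py`:

```python
from functools import lru_cache


@lru_cache(maxsize=512)
def is_sensitive(key: str) -> bool:
    # Placeholder check standing in for the real sensitivity test.
    return key.lower() in {"api_key", "password", "token"}


# 500 records sharing the same three key names.
records = [{"api_key": "k", "name": "n", "age": i} for i in range(500)]
for record in records:
    for key in record:
        is_sensitive(key)

info = is_sensitive.cache_info()
print(info)
# prints "CacheInfo(hits=1497, misses=3, maxsize=512, currsize=3)"
```

Only the first occurrence of each key name is a miss; a `currsize` that sits at `maxsize` with a growing miss count would suggest the cache is too small for the key set.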

Workload impact:

The call sites of _sanitize_list were not available for this analysis (no function_references), so the real-world impact is an estimate. However, given this is in a database transaction model for sanitizing logs/data, this optimization is particularly valuable for:

- High-throughput logging scenarios where the same dict structures are sanitized repeatedly
- Batch processing of database records with consistent schemas
- API request/response sanitization where field names are predictable and repetitive
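
One way to check whether a particular workload benefits is a quick `timeit` comparison on a representative key mix. This is a sketch under stated assumptions: the regex and the key list are hypothetical, and the timings depend on the machine, so no expected numbers are given:

```python
import re
import timeit
from functools import lru_cache

# Hypothetical pattern; the real SENSITIVE_KEYS_PATTERN is not shown in this PR.
PATTERN = re.compile(r".*(api[-_]?key|password|secret|token)$", re.IGNORECASE)


def check(key: str) -> bool:
    """Uncached regex-based sensitivity check."""
    return bool(PATTERN.match(key))


# Same function behind an LRU cache, as in the optimization.
cached_check = lru_cache(maxsize=512)(check)

# A repetitive key mix, as in logs with a consistent schema.
keys = ["api_key", "username", "request_id", "user-token"] * 1000

uncached_s = timeit.timeit(lambda: [check(k) for k in keys], number=20)
cached_s = timeit.timeit(lambda: [cached_check(k) for k in keys], number=20)
print(f"uncached: {uncached_s:.4f}s  cached: {cached_s:.4f}s")
```

With mostly unique keys the cached variant would lose its advantage, since every call would be a miss plus the cache bookkeeping.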

✅ Correctness verification report:

Test	Status
⚙️ Existing Unit Tests	🔘 None Found
🌀 Generated Regression Tests	✅ 77 Passed
⏪ Replay Tests	🔘 None Found
🔎 Concolic Coverage Tests	🔘 None Found
📊 Tests Coverage	100.0%

🌀 Generated Regression Tests

import re
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.services.database.models.transactions.model import _sanitize_list

# unit tests

# ----------- BASIC TEST CASES -----------

def test_empty_list_returns_empty():
    """Sanitize an empty list should return an empty list."""
    codeflash_output = _sanitize_list([])

def test_list_of_non_sensitive_scalars():
    """Sanitize a list of non-sensitive scalar values."""
    data = [1, "hello", 3.14, None, True]
    codeflash_output = _sanitize_list(data)

def test_list_of_dicts_with_no_sensitive_keys():
    """Sanitize a list of dicts with no sensitive keys."""
    data = [{"foo": "bar"}, {"baz": 123}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_short_value():
    """Sensitive key with value shorter than MIN_LENGTH_FOR_PARTIAL_MASK is fully redacted."""
    data = [{"api_key": "shortkey"}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_long_value():
    """Sensitive key with value longer than MIN_LENGTH_FOR_PARTIAL_MASK is partially masked."""
    value = "A" * 16  # 16 chars
    data = [{"api_key": value}]
    expected = [{"api_key": "AAAA...AAAA"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_empty_string():
    """Sensitive key with empty string value is fully redacted."""
    data = [{"password": ""}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_non_string_value():
    """Sensitive key with non-string value is fully redacted."""
    data = [{"token": None}, {"secret": 12345}]
    expected = [{"token": "***REDACTED***"}, {"secret": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_mixed_sensitive_and_non_sensitive_keys():
    """Sanitize dicts in list with both sensitive and non-sensitive keys."""
    data = [
        {"api_key": "123456789012", "foo": "bar"},
        {"password": "letmein", "baz": 42},
    ]
    expected = [
        {"api_key": "***REDACTED***", "foo": "bar"},
        {"password": "***REDACTED***", "baz": 42},
    ]
    codeflash_output = _sanitize_list(data)

def test_nested_dicts_in_list():
    """Sanitize nested dicts within list."""
    data = [
        {
            "user": {
                "name": "Alice",
                "token": "abcdef1234567890"
            }
        }
    ]
    expected = [
        {
            "user": {
                "name": "Alice",
                "token": "abcd...7890"
            }
        }
    ]
    codeflash_output = _sanitize_list(data)

def test_nested_lists_in_list():
    """Sanitize nested lists within list."""
    data = [
        [
            {"api_key": "A" * 20},
            {"foo": "bar"}
        ],
        "baz"
    ]
    expected = [
        [
            {"api_key": "AAAA...AAAA"},
            {"foo": "bar"}
        ],
        "baz"
    ]
    codeflash_output = _sanitize_list(data)

# ----------- EDGE TEST CASES -----------

def test_sensitive_key_case_insensitivity():
    """Sensitive key matching should be case-insensitive."""
    data = [{"API_KEY": "abcdef1234567890"}]
    expected = [{"API_KEY": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_suffix():
    """Sensitive key pattern matches keys with sensitive suffixes."""
    data = [{"my_api_key": "abcdef1234567890"}]
    expected = [{"my_api_key": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_dash_suffix():
    """Sensitive key pattern matches keys ending with dash and sensitive word."""
    data = [{"user-token": "abcdef1234567890"}]
    expected = [{"user-token": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_mixed_case_and_underscore():
    """Sensitive key matching with mixed case and underscore."""
    data = [{"Access_Token": "abcdef1234567890"}]
    expected = [{"Access_Token": "abcd...7890"}]
    codeflash_output = _sanitize_list(data)

def test_sensitive_key_with_non_string_value_edge():
    """Sensitive key with unusual non-string values (list, dict)."""
    data = [{"secret": ["should", "be", "redacted"]}, {"token": {"inner": "value"}}]
    expected = [{"secret": "***REDACTED***"}, {"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_excluded_key_is_removed():
    """Key in EXCLUDED_KEYS should be removed from output."""
    data = [{"code": "print('hello')", "foo": "bar"}]
    expected = [{"foo": "bar"}]
    codeflash_output = _sanitize_list(data)

def test_dict_with_only_excluded_key():
    """Dict with only excluded key should become empty dict."""
    data = [{"code": "something"}]
    expected = [{}]
    codeflash_output = _sanitize_list(data)

def test_list_with_empty_dict():
    """Empty dict in list should remain unchanged."""
    data = [{}]
    expected = [{}]
    codeflash_output = _sanitize_list(data)

def test_list_with_empty_list():
    """Empty list in list should remain unchanged."""
    data = [[]]
    expected = [[]]
    codeflash_output = _sanitize_list(data)

def test_list_with_none_value():
    """None value in list should remain unchanged."""
    data = [None]
    expected = [None]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_none_value():
    """Sensitive key with None value should be redacted."""
    data = [{"password": None}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_false_value():
    """Sensitive key with False value should be redacted."""
    data = [{"token": False}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_list_value():
    """Sensitive key with empty list value should be redacted."""
    data = [{"secret": []}]
    expected = [{"secret": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_dict_value():
    """Sensitive key with empty dict value should be redacted."""
    data = [{"secret": {}}]
    expected = [{"secret": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_integer_zero():
    """Sensitive key with integer zero value should be redacted."""
    data = [{"token": 0}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_float_zero():
    """Sensitive key with float zero value should be redacted."""
    data = [{"token": 0.0}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_boolean_true():
    """Sensitive key with boolean True value should be redacted."""
    data = [{"token": True}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_boolean_false():
    """Sensitive key with boolean False value should be redacted."""
    data = [{"token": False}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_string():
    """Sensitive key with empty string value should be redacted."""
    data = [{"token": ""}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_whitespace_string():
    """Sensitive key with whitespace string value, less than min length, should be redacted."""
    data = [{"token": "   "}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_whitespace_long_string():
    """Sensitive key with whitespace string value, longer than min length, should be partially masked."""
    value = " " * 16
    data = [{"token": value}]
    expected = [{"token": "    ...    "}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_unicode_value():
    """Sensitive key with unicode string value should be masked correctly."""
    value = "秘密のパスワード12345678"
    data = [{"password": value}]
    # Length > 12, so partial mask
    expected = [{"password": f"{value[:4]}...{value[-4:]}"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_bytes_value():
    """Sensitive key with bytes value should be redacted."""
    data = [{"token": b"mytoken"}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_list_of_strings():
    """Sensitive key with list value should be redacted (not recursed)."""
    data = [{"token": ["a", "b", "c"]}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_dict_value():
    """Sensitive key with dict value should be redacted (not recursed)."""
    data = [{"token": {"inner": "value"}}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_tuple_value():
    """Sensitive key with tuple value should be redacted."""
    data = [{"token": ("a", "b")}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_set_value():
    """Sensitive key with set value should be redacted."""
    data = [{"token": {"a", "b"}}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_falsy_value():
    """Sensitive key with falsy value (empty string, 0, False, None) should be redacted."""
    data = [{"token": ""}, {"token": 0}, {"token": False}, {"token": None}]
    expected = [{"token": "***REDACTED***"}, {"token": "***REDACTED***"}, {"token": "***REDACTED***"}, {"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_truthy_value():
    """Sensitive key with truthy value (non-empty string, non-zero number, True) should be masked/redacted appropriately."""
    data = [{"token": "a" * 16}, {"token": 1}, {"token": True}]
    expected = [{"token": "aaaa...aaaa"}, {"token": "***REDACTED***"}, {"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

# ----------- LARGE SCALE TEST CASES -----------

def test_large_list_of_dicts_with_sensitive_keys():
    """Sanitize a large list of dicts with sensitive keys."""
    N = 500
    data = [{"api_key": f"key{i:04d}abcdefghij"} for i in range(N)]
    expected = [{"api_key": "key0...ghij"} for _ in range(N)]  # first 4 chars + "..." + last 4 chars
    codeflash_output = _sanitize_list(data)

def test_large_list_of_mixed_dicts():
    """Sanitize a large list of dicts with mixed sensitive and non-sensitive keys."""
    N = 500
    data = [
        {"api_key": f"key{i:04d}abcdefghij", "foo": f"bar{i}", "baz": i}
        for i in range(N)
    ]
    expected = [
        {"api_key": "key0...ghij", "foo": f"bar{i}", "baz": i}
        for i in range(N)
    ]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_lists():
    """Sanitize a large list of lists containing dicts with sensitive keys."""
    N = 100
    data = [
        [
            {"password": f"pass{i:04d}abcdefghij"},
            {"token": f"tok{i:04d}abcdefghij"},
            {"foo": f"bar{i}"}
        ]
        for i in range(N)
    ]
    expected = [
        [
            {"password": "pass...ghij"},
            {"token": "tok0...ghij"},
            {"foo": f"bar{i}"}
        ]
        for i in range(N)
    ]
    codeflash_output = _sanitize_list(data)

def test_deeply_nested_large_structure():
    """Sanitize a deeply nested structure with sensitive keys."""
    N = 50
    data = [
        [
            [
                {"api_key": f"key{i:04d}abcdefghij", "nested": {"token": f"tok{i:04d}abcdefghij"}}
                for i in range(N)
            ]
        ]
    ]
    expected = [
        [
            [
                {"api_key": "key0...ghij", "nested": {"token": "tok0...ghij"}}
                for i in range(N)
            ]
        ]
    ]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_excluded_key():
    """Sanitize a large list of dicts with excluded keys."""
    N = 300
    data = [{"code": f"print({i})", "foo": f"bar{i}"} for i in range(N)]
    expected = [{"foo": f"bar{i}"} for i in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_scalars():
    """Sanitize a large list of scalar values (should remain unchanged)."""
    N = 1000
    data = [i for i in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_empty_dicts():
    """Sanitize a large list of empty dicts (should remain unchanged)."""
    N = 500
    data = [{} for _ in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_of_empty_lists():
    """Sanitize a large list of empty lists (should remain unchanged)."""
    N = 500
    data = [[] for _ in range(N)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_various_types():
    """Sanitize a large list with mixed types, including dicts, lists, scalars."""
    N = 200
    data = []
    expected = []
    for i in range(N):
        if i % 4 == 0:
            data.append({"api_key": f"key{i:04d}abcdefghij"})
            expected.append({"api_key": "key0...ghij"})
        elif i % 4 == 1:
            data.append([{"token": f"tok{i:04d}abcdefghij"}])
            expected.append([{"token": "tok0...ghij"}])
        elif i % 4 == 2:
            data.append(i)
            expected.append(i)
        else:
            data.append(None)
            expected.append(None)
    codeflash_output = _sanitize_list(data)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
#------------------------------------------------
import re
from typing import Any

# imports
import pytest  # used for our unit tests
from langflow.services.database.models.transactions.model import _sanitize_list

# unit tests

# -------------------- BASIC TEST CASES --------------------

def test_empty_list_returns_empty():
    # Test that an empty list returns an empty list
    codeflash_output = _sanitize_list([])

def test_list_of_ints_and_strings():
    # Non-dict, non-list items should be unchanged
    data = [1, "hello", 3.14, None]
    codeflash_output = _sanitize_list(data)

def test_list_of_simple_dicts():
    # Dicts with non-sensitive keys should be unchanged
    data = [{"name": "Alice"}, {"age": 30}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_dict_key_short_value():
    # Sensitive key with short value should be fully redacted
    data = [{"api_key": "123456"}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_dict_key_long_value():
    # Sensitive key with long value should be partially masked
    value = "A" * 20
    data = [{"api_key": value}]
    expected = [{"api_key": f"{value[:4]}...{value[-4:]}"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_non_string_value():
    # Sensitive key with non-string value should be fully redacted
    data = [{"password": 123456}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_string():
    # Sensitive key with empty string should be fully redacted
    data = [{"token": ""}]
    expected = [{"token": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_excluded_key():
    # Dict with excluded key should drop that key
    data = [{"code": "print('hi')", "name": "Alice"}]
    expected = [{"name": "Alice"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_pattern_match():
    # Key that matches the sensitive pattern should be masked
    data = [{"my_api_key": "abcdef123456"}]
    expected = [{"my_api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_mixed_sensitive_and_non_sensitive_keys():
    # Only sensitive keys should be masked
    data = [{"api_key": "123456789012", "username": "bob"}]
    expected = [{"api_key": "***REDACTED***", "username": "bob"}]
    codeflash_output = _sanitize_list(data)

# -------------------- EDGE TEST CASES --------------------

def test_list_with_nested_dicts():
    # Nested dicts should be sanitized recursively
    data = [{"user": {"api_key": "abcdef123456", "name": "Alice"}}]
    expected = [{"user": {"api_key": "***REDACTED***", "name": "Alice"}}]
    codeflash_output = _sanitize_list(data)

def test_list_with_nested_lists():
    # Nested lists should be sanitized recursively
    data = [[{"api_key": "abcdef123456"}, {"name": "Alice"}]]
    expected = [[{"api_key": "***REDACTED***"}, {"name": "Alice"}]]
    codeflash_output = _sanitize_list(data)

def test_list_with_deeply_nested_structures():
    # Deeply nested dicts and lists should be sanitized at all levels
    data = [
        {
            "users": [
                {"api_key": "abcdef123456", "profile": {"password": "mypassword"}},
                {"name": "Bob", "token": "tok1234567890"},
            ]
        }
    ]
    expected = [
        {
            "users": [
                {"api_key": "***REDACTED***", "profile": {"password": "***REDACTED***"}},
                {"name": "Bob", "token": "***REDACTED***"},
            ]
        }
    ]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_case_insensitivity():
    # Sensitive key matching should be case-insensitive
    data = [{"API_KEY": "abcdef123456"}]
    expected = [{"API_KEY": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_pattern_variants():
    # Sensitive key pattern should match variants
    data = [
        {"my-api-key": "abcdef123456"},
        {"my_password": "secretpass"},
        {"authToken": "tokenvalue"},
        {"bearer_token": "bearertoken"},
        {"private-key": "privatekeyval"},
        {"access_key": "accesskeyval"},
    ]
    expected = [
        {"my-api-key": "***REDACTED***"},
        {"my_password": "***REDACTED***"},
        {"authToken": "***REDACTED***"},
        {"bearer_token": "***REDACTED***"},
        {"private-key": "***REDACTED***"},
        {"access_key": "***REDACTED***"},
    ]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_none_value():
    # Sensitive key with None value should be redacted
    data = [{"api_key": None}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_bool_value():
    # Sensitive key with boolean value should be redacted
    data = [{"password": True}]
    expected = [{"password": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)


def test_list_with_dicts_having_only_excluded_keys():
    # Dicts with only excluded keys should become empty dicts
    data = [{"code": "print('hi')"}, {"code": "foo"}]
    expected = [{}, {}]
    codeflash_output = _sanitize_list(data)

def test_list_with_empty_dicts_and_lists():
    # Empty dicts and lists should be preserved
    data = [{} , []]
    expected = [{} , []]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_empty_dict_value():
    # Sensitive key with empty dict value should be redacted
    data = [{"api_key": {}}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

def test_list_with_sensitive_key_and_list_value():
    # Sensitive key with list value should be redacted
    data = [{"api_key": [1,2,3]}]
    expected = [{"api_key": "***REDACTED***"}]
    codeflash_output = _sanitize_list(data)

# -------------------- LARGE SCALE TEST CASES --------------------

def test_large_list_of_dicts():
    # Large list of dicts with sensitive and non-sensitive keys
    data = [{"api_key": f"key{i}", "name": f"user{i}"} for i in range(500)]
    expected = [{"api_key": "***REDACTED***", "name": f"user{i}"} for i in range(500)]
    codeflash_output = _sanitize_list(data)

def test_large_nested_list():
    # Large nested list structure
    data = [[{"password": f"pass{i}", "age": i} for i in range(100)] for _ in range(5)]
    expected = [[{"password": "***REDACTED***", "age": i} for i in range(100)] for _ in range(5)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_various_types():
    # Large list with mixed types, including dicts, lists, and primitives
    data = []
    for i in range(300):
        if i % 3 == 0:
            data.append({"token": f"tok{i}", "value": i})
        elif i % 3 == 1:
            data.append([{"api_key": f"key{i}"}])
        else:
            data.append(i)
    expected = []
    for i in range(300):
        if i % 3 == 0:
            expected.append({"token": "***REDACTED***", "value": i})
        elif i % 3 == 1:
            expected.append([{"api_key": "***REDACTED***"}])
        else:
            expected.append(i)
    codeflash_output = _sanitize_list(data)

def test_large_list_of_empty_dicts_and_lists():
    # Large list of empty dicts and lists
    data = [{} for _ in range(200)] + [[] for _ in range(200)]
    expected = [{} for _ in range(200)] + [[] for _ in range(200)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_excluded_keys():
    # Large list with dicts containing excluded keys
    data = [{"code": f"print({i})", "api_key": f"key{i}"} for i in range(400)]
    expected = [{"api_key": "***REDACTED***"} for i in range(400)]
    codeflash_output = _sanitize_list(data)

def test_large_list_with_long_sensitive_values():
    # Large list with sensitive values longer than MIN_LENGTH_FOR_PARTIAL_MASK
    long_val = "X" * 30
    data = [{"api_key": long_val, "other": i} for i in range(100)]
    expected = [{"api_key": f"{long_val[:4]}...{long_val[-4:]}", "other": i} for i in range(100)]
    codeflash_output = _sanitize_list(data)

# -------------------- DETERMINISM / ORDER PRESERVATION --------------------

def test_list_order_is_preserved():
    # The sanitized list should preserve the original order
    data = [
        {"api_key": "key1"},
        {"name": "Alice"},
        [1, 2, 3],
        {"password": "pass"},
        {"code": "should be removed"},
        {"token": "tok"},
    ]
    expected = [
        {"api_key": "***REDACTED***"},
        {"name": "Alice"},
        [1, 2, 3],
        {"password": "***REDACTED***"},
        {},
        {"token": "***REDACTED***"},
    ]
    codeflash_output = _sanitize_list(data)

def test_list_with_duplicate_sensitive_keys():
    # Multiple dicts with the same sensitive key should all be masked
    data = [{"api_key": "key1"}, {"api_key": "key2"}, {"api_key": "key3"}]
    expected = [{"api_key": "***REDACTED***"} for _ in range(3)]
    codeflash_output = _sanitize_list(data)
# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
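
The behaviour these generated tests expect can be summarised in a short sketch. This is a hypothetical reimplementation inferred only from the test cases above, not the code in `model.py`: the threshold of 12, the `"code"` exclusion, the suffix-based `is_sensitive`, and all names here are assumptions.

```python
from typing import Any

REDACTED = "***REDACTED***"
MIN_LENGTH_FOR_PARTIAL_MASK = 12  # assumed: 12-char values are redacted, 16-char values are masked
EXCLUDED_KEYS = frozenset({"code"})  # assumed from the excluded-key tests
SENSITIVE_SUFFIXES = ("api_key", "password", "token", "secret")  # illustrative subset


def is_sensitive(key: str) -> bool:
    """Simplified stand-in for the real key check."""
    return key.lower().endswith(SENSITIVE_SUFFIXES)


def mask_value(value: Any) -> str:
    """Long strings keep their first and last 4 chars; everything else is redacted."""
    if isinstance(value, str) and len(value) > MIN_LENGTH_FOR_PARTIAL_MASK:
        return f"{value[:4]}...{value[-4:]}"
    return REDACTED


def sanitize(item: Any) -> Any:
    """Recursively sanitize dicts and lists; scalars pass through unchanged."""
    if isinstance(item, dict):
        return {
            key: mask_value(value) if is_sensitive(key) else sanitize(value)
            for key, value in item.items()
            if key not in EXCLUDED_KEYS
        }
    if isinstance(item, list):
        return [sanitize(element) for element in item]
    return item


print(sanitize([{"api_key": "A" * 16, "code": "print(1)", "user": {"token": "short"}}]))
# prints "[{'api_key': 'AAAA...AAAA', 'user': {'token': '***REDACTED***'}}]"
```

Values under a sensitive key are masked without recursing into them, which matches the tests where a list or dict under "token" becomes a single redaction marker.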

To edit these changes, run `git checkout codeflash/optimize-pr10820-2025-12-30T19.01.21`, commit your changes, and push.


coderabbitai · 2025-12-30T19:01:35Z

Important

Review skipped

Bot user detected.

To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Comment @coderabbitai help to get the list of available commands and usage tips.

codecov · 2025-12-30T19:05:17Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 33.36%. Comparing base (79c1e5c) to head (0c58205).

Additional details and impacted files

@@                   Coverage Diff                   @@
##           cz/add-logs-feature   #11171      +/-   ##
=======================================================
+ Coverage                33.34%   33.36%   +0.02%     
=======================================================
  Files                     1399     1399              
  Lines                    66226    66230       +4     
  Branches                  9785     9785              
=======================================================
+ Hits                     22080    22095      +15     
+ Misses                   43021    43011      -10     
+ Partials                  1125     1124       -1

Flag	Coverage Δ
backend	`52.86% <100.00%> (+0.06%)`	⬆️
lfx	`39.50% <ø> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
...low/services/database/models/transactions/model.py	`92.72% <100.00%> (+0.27%)`	⬆️

... and 5 files with indirect coverage changes


codeflash-ai bot added the ⚡️ codeflash Optimization PR opened by Codeflash AI label Dec 30, 2025

github-actions bot added the community Pull Request from an external contributor label Dec 30, 2025
