⚡️ Speed up function sanitize_data by 94% in PR #10820 (cz/add-logs-feature)
#11169
+9 −1
⚡️ This pull request contains optimizations for PR #10820
If you approve this dependent PR, these changes will be merged into the original PR branch
cz/add-logs-feature.

📄 94% (0.94x) speedup for sanitize_data in src/backend/base/langflow/services/database/models/transactions/model.py

⏱️ Runtime: 2.36 milliseconds → 1.21 milliseconds (best of 105 runs)

📝 Explanation and details
The optimized code achieves a 94% speedup (from 2.36ms to 1.21ms) by introducing a regex pattern cache that eliminates redundant regex searches on dictionary keys.
What Changed:
- Added a _pattern_cache dictionary to store regex match results, keyed by the string being searched.
- Modified _sanitize_dict to check the cache before performing the expensive SENSITIVE_KEYS_PATTERN.search(key) operation.

Why This Works:
The line profiler reveals that SENSITIVE_KEYS_PATTERN.search(key) in the original code consumed 20.7% of total execution time (3.06ms out of 14.8ms). Regex operations are computationally expensive, involving pattern compilation, backtracking, and string matching. In typical workloads with repeated key names (e.g., "password", "api_key", "username" appearing across multiple dictionaries), the same regex search is performed repeatedly.

The cache converts this O(n) regex operation into an O(1) dictionary lookup after the first occurrence. With 3,884 key checks in the profiled run and only 1,758 unique keys, the optimization saves 2,126 redundant regex searches (approximately 55% hit rate).
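The same memoization can also be expressed with the standard library's functools.lru_cache, which keeps per-key hit/miss statistics for free. The pattern below is a hypothetical stand-in, not the one from the PR.

```python
import re
from functools import lru_cache

# Hypothetical pattern; the real SENSITIVE_KEYS_PATTERN is defined in the model module.
SENSITIVE_KEYS_PATTERN = re.compile(r"password|api_key|token|secret", re.IGNORECASE)


@lru_cache(maxsize=None)
def is_sensitive(key: str) -> bool:
    # First call per key pays the regex cost; repeats are O(1) cache hits.
    return SENSITIVE_KEYS_PATTERN.search(key) is not None


is_sensitive("password")
is_sensitive("password")
print(is_sensitive.cache_info())  # hits=1, misses=1 after the two calls above
```

A plain dict cache (as in the PR's description) avoids lru_cache's call overhead, which can matter at this scale; lru_cache trades a little speed for bookkeeping and an eviction policy.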
Performance Characteristics:
Gains are largest in scenarios with heavy key repetition (e.g., test_large_nested_dict with 100 identical user objects, or test_large_performance with 1000 entries using the same "api_key_*" pattern). These scenarios maximize cache hits.

Test Results:
The annotated tests confirm correctness is preserved across all edge cases (empty dicts, nested structures, non-dict inputs, excluded keys) while showing consistent performance gains, particularly in large-scale scenarios where key repetition is common.
✅ Correctness verification report:
⚙️ Existing Unit Tests
🌀 Generated Regression Tests
To edit these changes, run git checkout codeflash/optimize-pr10820-2025-12-30T18.27.25 and push.