feat(loaders): validate OHLC invariants at the loader boundary #274
Merged
Loaders only dropped NaN rows, so structurally invalid bars (high<low, a non-positive price, or a high/low that fails to bracket open/close) passed straight into the backtest and surfaced downstream as NaN/inf metrics that break the strict allow_nan=False JSON serializers.

Add a shared validate_ohlc() helper in loaders/base.py and call it after the existing dropna in the yfinance and local loaders. The default drops offending rows and logs a count, matching the loaders' existing silent-clean convention; "warn" and "raise" strategies are available for stricter callers.

Signed-off-by: Lanre Shittu <136805224+Shizoqua@users.noreply.github.com>
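To make the failure mode concrete, here is a minimal sketch of how a non-finite metric trips a strict serializer. The `serialize_metrics` function and the metric names and values are hypothetical, chosen only for illustration; the one library behaviour relied on is that `json.dumps(..., allow_nan=False)` raises `ValueError` for NaN and infinity.

```python
import json
import math


def serialize_metrics(metrics: dict) -> str:
    """Serialize metrics the way a strict serializer would: reject NaN/inf."""
    return json.dumps(metrics, allow_nan=False)


# Hypothetical metric values, for illustration only.
clean = {"total_return": 0.12, "sharpe": 1.4}
corrupt = {"total_return": math.inf, "sharpe": math.nan}

print(serialize_metrics(clean))

try:
    serialize_metrics(corrupt)
except ValueError as exc:
    # Non-finite floats are not valid JSON, so the strict mode refuses them.
    print(f"rejected: {exc}")
```

With the default `allow_nan=True`, `json.dumps` would instead emit `Infinity` and `NaN`, which are not valid JSON, so filtering the bad bars at the source is the more robust fix.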
Wire validate_ohlc into the runner's single fetch-convergence point so auto, single-source, runtime-fallback, and any future loader are guarded uniformly — not just the two loaders wired at source. Adds _sanitize_data_map(data_map) in runner.py, applied right after the data map is assembled and before the empty-data check, plus a test that a dirty bar from an unwired loader is dropped before it reaches the backtest.
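The central guard described in this commit could look roughly like the sketch below. It assumes the data map is a dict of symbol to pandas DataFrame, which the PR does not confirm, and it uses a deliberately reduced stand-in for `validate_ohlc` (only the high >= low check) so the block runs on its own; the actual code in runner.py may differ.

```python
from typing import Dict

import pandas as pd


def validate_ohlc(frame: pd.DataFrame, strategy: str = "drop") -> pd.DataFrame:
    # Stand-in for the shared helper: checks only high >= low and always drops.
    return frame.loc[frame["high"] >= frame["low"]]


def _sanitize_data_map(data_map: Dict[str, pd.DataFrame]) -> Dict[str, pd.DataFrame]:
    """Run every fetched frame through validate_ohlc, whichever loader produced it."""
    return {symbol: validate_ohlc(frame) for symbol, frame in data_map.items()}


# Hypothetical data map, as if assembled from a loader that does no validation.
data_map = {
    "AAA": pd.DataFrame(
        {"open": [10.0, 10.0], "high": [11.0, 9.0], "low": [9.0, 10.5], "close": [10.5, 10.0]}
    ),
    "BBB": pd.DataFrame(
        {"open": [20.0], "high": [21.0], "low": [19.0], "close": [20.5]}
    ),
}

sanitized = _sanitize_data_map(data_map)
```

Running the guard before the empty-data check means a symbol whose bars are all invalid ends up as an empty frame and is handled by the existing check rather than reaching the backtest.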
Merged, thanks @Shizoqua! 🙏 Before merging I added one follow-up commit so it guards every source, not just local + yfinance. Instead of wiring the call into all ~18 loaders one by one, it now also runs once centrally in …
Problem
The loaders only drop NaN rows. Bars that are present but structurally invalid (high < low, a non-positive price, or a high/low that does not bracket open/close) pass straight through into the backtest, where they silently corrupt downstream metrics and can yield non-finite values that break strict JSON serialization.

Change
- Add a validate_ohlc(frame, strategy="drop") helper in agent/backtest/loaders/base.py (next to validate_date_range).
- Call it after the existing dropna in the yfinance and local loaders.
- The default drops offending rows and logs a count; "warn" and "raise" strategies are available for stricter callers.

Tests
agent/tests/test_ohlc_validation.py covers drop/warn/raise behaviour, pass-through of valid frames (including flat/doji bars), and an end-to-end check that a dirty bar in a local CSV is dropped by the loader.
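For readers who want a feel for the helper and the behaviours the tests exercise, here is a minimal sketch. It is not the merged implementation: the lowercase column names, the log levels, and the error message are assumptions, and only the three invariants and the three strategies come from the description above.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Assumed column names; the real loaders may use different ones.
PRICE_COLUMNS = ["open", "high", "low", "close"]


def validate_ohlc(frame: pd.DataFrame, strategy: str = "drop") -> pd.DataFrame:
    """Apply the OHLC invariants and handle violations according to `strategy`."""
    if strategy not in ("drop", "warn", "raise"):
        raise ValueError(f"unknown strategy: {strategy!r}")

    prices = frame[PRICE_COLUMNS]
    body = frame[["open", "close"]]
    invalid = (
        (frame["high"] < frame["low"])        # inverted range
        | (prices <= 0).any(axis=1)           # non-positive price
        | (frame["high"] < body.max(axis=1))  # high below open or close
        | (frame["low"] > body.min(axis=1))   # low above open or close
    )
    n_bad = int(invalid.sum())
    if n_bad == 0:
        return frame

    if strategy == "raise":
        raise ValueError(f"{n_bad} bar(s) violate OHLC invariants")
    if strategy == "warn":
        logger.warning("%d bar(s) violate OHLC invariants; keeping them", n_bad)
        return frame
    logger.info("dropped %d bar(s) that violate OHLC invariants", n_bad)
    return frame.loc[~invalid]


# Rows: valid bar, high < low, flat/doji bar, non-positive low.
bars = pd.DataFrame(
    {
        "open": [10.0, 10.0, 5.0, 1.0],
        "high": [11.0, 9.0, 5.0, 2.0],
        "low": [9.0, 10.5, 5.0, -2.0],
        "close": [10.5, 10.0, 5.0, 1.5],
    }
)
cleaned = validate_ohlc(bars)  # default strategy="drop"
```

Because every comparison is strict, a flat bar where open, high, low, and close are all equal passes through untouched, which matches the doji case the tests call out.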