
feat(loaders): validate OHLC invariants at the loader boundary #274

Merged: warren618 merged 2 commits into HKUDS:main from Shizoqua:feat/loader-ohlc-validation on Jun 20, 2026

Conversation

@Shizoqua

Contributor

Problem

The loaders only drop NaN rows. Bars that are present but structurally invalid — high < low, a non-positive price, or a high/low that does not bracket open/close — pass straight through into the backtest, where they silently corrupt downstream metrics (and can yield non-finite values that break strict JSON serialization).
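
To make the serialization failure concrete, here is a minimal sketch using only the standard library. The `simple_return` helper and the `mean_return` metric are hypothetical stand-ins, not code from this repository; the helper returns infinity on a zero price to mimic what vectorised division would produce.

```python
import json
import math


def simple_return(prev_close: float, close: float) -> float:
    """Hypothetical helper: simple return between two closes.

    Returns infinity for a zero previous close instead of raising,
    which mirrors the behaviour of vectorised float division.
    """
    if prev_close == 0:
        return math.inf
    return close / prev_close - 1.0


def serializes_strictly(payload: dict) -> bool:
    """True if the payload survives json.dumps with allow_nan=False."""
    try:
        json.dumps(payload, allow_nan=False)
        return True
    except ValueError:
        return False


# The middle bar carries a non-positive price and is not NaN,
# so a dropna-only loader would let it through.
closes = [100.0, 0.0, 101.0]
returns = [simple_return(a, b) for a, b in zip(closes, closes[1:])]

# One non-finite return is enough to poison an aggregate metric.
metrics = {"mean_return": sum(returns) / len(returns)}
```

With this input, `serializes_strictly(metrics)` is false because the aggregate is infinite, whereas a payload of finite floats serializes without error.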

Change

  • Add a reusable validate_ohlc(frame, strategy="drop") helper in agent/backtest/loaders/base.py (next to validate_date_range).
  • Call it after the existing dropna in the yfinance and local loaders.
  • The default strategy, "drop", removes offending rows and logs a count (matching the loaders' existing silent-clean convention); "warn" and "raise" strategies are available for stricter callers.
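
The helper described above might look roughly like the sketch below. This is an illustration, not the merged implementation: the lowercase column names, the log levels, the exception type, and the assumption that "warn" keeps the offending rows are all guesses.

```python
import logging

import pandas as pd

logger = logging.getLogger(__name__)

# Assumed column names; the real loaders may use different ones.
PRICE_COLUMNS = ["open", "high", "low", "close"]


def validate_ohlc(frame: pd.DataFrame, strategy: str = "drop") -> pd.DataFrame:
    """Sketch of an OHLC invariant check with drop/warn/raise strategies."""
    if strategy not in {"drop", "warn", "raise"}:
        raise ValueError(f"unknown strategy: {strategy!r}")

    prices = frame[PRICE_COLUMNS]
    body = frame[["open", "close"]]
    bad = (
        (prices <= 0).any(axis=1)                # non-positive price
        | (frame["high"] < frame["low"])         # inverted range
        | (frame["high"] < body.max(axis=1))     # high does not bracket open/close
        | (frame["low"] > body.min(axis=1))      # low does not bracket open/close
    )
    n_bad = int(bad.sum())
    if n_bad == 0:
        return frame

    if strategy == "raise":
        raise ValueError(f"{n_bad} invalid OHLC row(s)")
    if strategy == "warn":
        # Assumption: "warn" reports the rows but leaves the frame untouched.
        logger.warning("%d invalid OHLC row(s) kept", n_bad)
        return frame
    logger.info("dropped %d invalid OHLC row(s)", n_bad)
    return frame.loc[~bad]


frame = pd.DataFrame(
    {
        "open": [10, 5, 10, 0],
        "high": [12, 5, 9, 1],
        "low": [9, 5, 11, 0],
        "close": [11, 5, 10, 1],
    }
)
clean = validate_ohlc(frame)  # keeps the valid bar and the flat (doji) bar
```

Comparisons against NaN evaluate to false, so this sketch relies on running after the loaders' existing dropna, as the PR does.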

Tests

agent/tests/test_ohlc_validation.py covers drop/warn/raise behaviour, pass-through of valid frames (including flat/doji bars), and an end-to-end check that a dirty bar in a local CSV is dropped by the loader.

python -m pytest agent/tests/test_ohlc_validation.py agent/tests/test_local_loader.py -q
# 7 passed, 1 skipped
Shizoqua added 2 commits June 19, 2026 21:33
Loaders only dropped NaN rows, so structurally invalid bars (high<low, a
non-positive price, or a high/low that fails to bracket open/close) passed
straight into the backtest and surfaced downstream as NaN/inf metrics that
break the strict allow_nan=False JSON serializers.

Add a shared validate_ohlc() helper in loaders/base.py and call it after the
existing dropna in the yfinance and local loaders. The default drops offending
rows and logs a count, matching the loaders' existing silent-clean convention;
"warn" and "raise" strategies are available for stricter callers.

Signed-off-by: Lanre Shittu <136805224+Shizoqua@users.noreply.github.com>
Wire validate_ohlc into the runner's single fetch-convergence point so
auto, single-source, runtime-fallback, and any future loader are guarded
uniformly — not just the two loaders wired at source. Adds
_sanitize_data_map(data_map) in runner.py, applied right after the data
map is assembled and before the empty-data check, plus a test that a dirty
bar from an unwired loader is dropped before it reaches the backtest.
warren618 merged commit adfbe7d into HKUDS:main on Jun 20, 2026
1 check passed
@warren618

Collaborator

Merged — thanks @Shizoqua! 🙏 The validate_ohlc helper with its drop/warn/raise strategies is exactly the right boundary check.

Before merging I added one follow-up commit so it guards every source, not just local + yfinance. Instead of wiring the call into all ~18 loaders one by one, it now also runs once centrally in runner.py (_sanitize_data_map) at the single point every fetch path converges — auto / single-source / runtime-fallback — so any current or future loader is covered. Your per-loader calls stay (they also guard the get_market_data path). Added a test for the central pass; full suite + CI green.
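
A rough sketch of what such a central pass could look like follows. It assumes the data map is a dict of symbol to DataFrame with lowercase OHLC columns, and it uses a simplified stand-in (`_drop_invalid_bars`) in place of the real validate_ohlc; `has_data` is a hypothetical version of the runner's empty-data check.

```python
from typing import Dict

import pandas as pd


def _drop_invalid_bars(frame: pd.DataFrame) -> pd.DataFrame:
    """Simplified stand-in for validate_ohlc(frame, strategy="drop").

    Only checks positive prices and high >= low.
    """
    prices = frame[["open", "high", "low", "close"]]
    ok = (prices > 0).all(axis=1) & (frame["high"] >= frame["low"])
    return frame.loc[ok]


def _sanitize_data_map(data_map: Dict[str, pd.DataFrame]) -> Dict[str, pd.DataFrame]:
    """Validate every frame once, regardless of which loader produced it."""
    return {symbol: _drop_invalid_bars(frame) for symbol, frame in data_map.items()}


def has_data(data_map: Dict[str, pd.DataFrame]) -> bool:
    """Hypothetical empty-data check, run after sanitizing."""
    return any(not frame.empty for frame in data_map.values())


data_map = {
    # Second bar has high < low.
    "AAA": pd.DataFrame(
        {"open": [10.0, 10.0], "high": [12.0, 9.0], "low": [9.0, 11.0], "close": [11.0, 10.0]}
    ),
    # Only bar has a zero price, so the symbol ends up empty.
    "BBB": pd.DataFrame({"open": [0.0], "high": [1.0], "low": [0.0], "close": [1.0]}),
}
clean = _sanitize_data_map(data_map)
```

Sanitizing before the empty-data check means a symbol whose bars are all invalid is treated as having no data, rather than reaching the backtest.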
