Realistic data simulator for ML system testing with time-compressed scenarios and controlled drift
SIMTOM is an extensible data generation platform that creates realistic streaming data for machine learning model training and testing. Features include configurable arrival patterns, noise injection, drift simulation, and time compression for accelerated development cycles.
Production Endpoint: https://simtom-production.up.railway.app
```bash
# Quick test
curl https://simtom-production.up.railway.app/generators

# Stream sample data
curl -X POST https://simtom-production.up.railway.app/stream/bnpl \
  -H "Content-Type: application/json" \
  -d '{"rate_per_second": 2.0, "total_records": 3}'
```

- 🎯 Realistic Traffic Patterns: Uniform, Poisson, NHPP, and Burst arrival patterns
- 📊 Rich Data Generation: BNPL transactions with risk scoring and customer profiles
- ⏱️ Time Compression: Simulate days or weeks of data in minutes
- 🔧 Plugin Architecture: Easy extension with custom generators
- 📡 Real-time Streaming: Server-sent events with configurable rates
- 🧪 ML-Ready: Built-in noise, drift, and deterministic seeding
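Because records arrive as server-sent events, a client has to pull the JSON payload out of each `data:` line. The sketch below is a minimal, stdlib-only parser. It assumes each event carries one JSON object on a single `data:` line, which this README does not confirm, and `parse_sse_lines` is a hypothetical helper name rather than part of SIMTOM.

```python
import json
from typing import Iterable, Iterator


def parse_sse_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per `data:` line of a server-sent event stream.

    Assumption: each event is a single-line JSON object. Comment lines,
    blank separators, and other SSE fields are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload:
            yield json.loads(payload)


# Canned lines shaped like an SSE response body (illustrative values only).
sample = [
    ": keep-alive comment",
    'data: {"transaction_id": "txn_00000001", "amount": 485.61}',
    "",
    'data: {"transaction_id": "txn_00000002", "amount": 12.5}',
]
records = list(parse_sse_lines(sample))
print(len(records), records[0]["transaction_id"])
```

Against a live server, the same function could be fed the decoded lines of a streaming HTTP response; that wiring is left out here so the sketch runs offline.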
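```bash
# Install
git clone https://github.com/whitehackr/simtom.git
cd simtom
poetry install

# Start the server
poetry run python scripts/run_server.py

# In a second terminal, list the available generators
curl http://localhost:8000/generators
```

Uniform: fixed intervals, predictable for testing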
```bash
curl -X POST http://localhost:8000/stream/bnpl \
  -H "Content-Type: application/json" \
  -d '{
    "rate_per_second": 2.0,
    "arrival_pattern": "uniform"
  }'
```

Poisson: random intervals with realistic variability
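```bash
curl -X POST http://localhost:8000/stream/bnpl \
  -H "Content-Type: application/json" \
  -d '{
    "rate_per_second": 2.0,
    "arrival_pattern": "poisson"
  }'
```

NHPP (non-homogeneous Poisson process): daily traffic patterns with peak hours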
```bash
curl -X POST http://localhost:8000/stream/bnpl \
  -H "Content-Type: application/json" \
  -d '{
    "rate_per_second": 1.0,
    "arrival_pattern": "nhpp",
    "peak_hours": [12, 19],
    "time_compression": 24.0
  }'
```

Burst: flash-sale and event-driven spikes
```bash
curl -X POST http://localhost:8000/stream/bnpl \
  -H "Content-Type: application/json" \
  -d '{
    "rate_per_second": 2.0,
    "arrival_pattern": "burst",
    "burst_intensity": 3.0,
    "burst_probability": 0.6
  }'
```

BNPL transactions include 40+ fields. An abbreviated example:
```json
{
  "transaction_id": "txn_00000001",
  "customer_id": "cust_000001",
  "amount": 485.61,
  "risk_score": 0.85,
  "risk_level": "high",
  "installment_count": 4,
  "customer_age_bracket": "25-34",
  "product_category": "electronics",
  "device_type": "mobile",
  "payment_provider": "afterpay"
}
```

| Parameter | Description | Default |
|---|---|---|
| `rate_per_second` | Arrival rate (0.1-1000) | `1.0` |
| `arrival_pattern` | Traffic pattern | `"uniform"` |
| `peak_hours` | NHPP peak hours | `[12, 19]` |
| `burst_intensity` | Burst multiplier | `2.0` |
| `time_compression` | Time acceleration | `1.0` |
| `noise_type` | Data quality | `"none"` |
| `drift_type` | Model drift | `"none"` |
| `seed` | Deterministic output | `null` |
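To make the arrival parameters concrete, here is a rough sketch of how `rate_per_second`, `arrival_pattern`, `seed`, and `time_compression` could interact. It is not SIMTOM's implementation: the function names are hypothetical, only the `uniform` and `poisson` patterns are covered, and it assumes that one wall-clock second covers `time_compression` simulated seconds.

```python
import itertools
import random
from typing import Iterator, List, Optional


def inter_arrival_gaps(
    rate_per_second: float,
    arrival_pattern: str = "uniform",
    seed: Optional[int] = None,
) -> Iterator[float]:
    """Yield wall-clock gaps (seconds) between consecutive records."""
    rng = random.Random(seed)  # a fixed seed makes the sequence repeatable
    while True:
        if arrival_pattern == "uniform":
            yield 1.0 / rate_per_second  # fixed spacing
        elif arrival_pattern == "poisson":
            # Poisson arrivals have exponentially distributed gaps.
            yield rng.expovariate(rate_per_second)
        else:
            raise ValueError(f"pattern not covered here: {arrival_pattern}")


def simulated_offsets(
    gaps: Iterator[float], count: int, time_compression: float = 1.0
) -> List[float]:
    """Cumulative simulated-time offsets for the first `count` records.

    Assumption: simulated time advances `time_compression` times faster
    than wall-clock time.
    """
    offsets, elapsed = [], 0.0
    for gap in itertools.islice(gaps, count):
        elapsed += gap * time_compression
        offsets.append(elapsed)
    return offsets


uniform = simulated_offsets(inter_arrival_gaps(2.0, "uniform"), 3)
compressed = simulated_offsets(inter_arrival_gaps(1.0), 2, time_compression=24.0)
poisson_a = simulated_offsets(inter_arrival_gaps(2.0, "poisson", seed=42), 5)
poisson_b = simulated_offsets(inter_arrival_gaps(2.0, "poisson", seed=42), 5)
print(uniform, compressed, poisson_a == poisson_b)
```

Under these assumptions, a compression factor of 24 makes each one-second gap cover 24 simulated seconds, and reusing a seed reproduces the same Poisson sequence.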
- Plugin System: Auto-discovery of generators
- Memory Efficient: O(1) streaming regardless of dataset size
- Entity Consistency: LRU registries maintain referential integrity
- FastAPI: Modern async web framework
- Pydantic: Type-safe configuration validation
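The entity-consistency and bounded-memory points above can be illustrated with a small least-recently-used registry. This is a sketch of the general LRU technique built on `collections.OrderedDict`, not SIMTOM's actual registry. `LRURegistry`, `get_or_create`, and the capacity of 2 are hypothetical choices for the example.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class LRURegistry:
    """Bounded entity store: recently used entities stay, the oldest is evicted."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._items: "OrderedDict[Hashable, Any]" = OrderedDict()

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        """Return the stored entity for `key`, creating it on first use."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = factory()
        self._items[key] = value
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used
        return value

    def __contains__(self, key: Hashable) -> bool:
        return key in self._items

    def __len__(self) -> int:
        return len(self._items)


registry = LRURegistry(capacity=2)
first = registry.get_or_create("cust_000001", lambda: {"customer_id": "cust_000001"})
registry.get_or_create("cust_000002", lambda: {"customer_id": "cust_000002"})
# A repeat lookup returns the same profile, so later records stay consistent.
again = registry.get_or_create("cust_000001", lambda: {"customer_id": "other"})
# A third customer exceeds the capacity and evicts the least recently used one.
registry.get_or_create("cust_000003", lambda: {"customer_id": "cust_000003"})
print(again is first, "cust_000002" in registry, len(registry))
```

The capacity bound is what keeps memory flat while streaming: the registry never grows past its limit, at the cost of forgetting entities that have not appeared recently.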
- ML Model Training: Realistic arrival patterns for better model performance
- Load Testing: Simulate traffic spikes and patterns
- Feature Engineering: Rich, consistent data for pipeline development
- System Testing: Controlled drift and noise injection
- Research: Reproducible datasets with deterministic seeding
- Development Guide - Architecture and development commands
- Live API Docs - Interactive OpenAPI docs
SIMTOM is designed for community extension. To add a new generator:
- Inherit from `BaseGenerator`
- Implement `async def generate_record()`
- Add the `@register_generator("name")` decorator
- Place the module in `simtom/generators/`, where it is auto-discovered
MIT License - see LICENSE file.