🔥 Blaze API 🔥

Stop waiting for API responses. Start blazing through them.

The ultimate batch API client for your LLM workloads. It load-balances across endpoints, retries intelligently, and processes 10,000+ requests per second on a laptop.


Blaze API is the batch processor your LLM workloads deserve. Stop writing brittle Python scripts that crash at 100 req/sec. This tool acts like a fleet of pro API consumers, intelligently distributing requests across endpoints, handling failures gracefully, and maxing out your API capacity without breaking a sweat.

  • ⚡ Blazing Fast: 10K+ req/sec on 8 cores
  • 🎯 Smart Load Balancing: weighted distribution across endpoints
  • 🔄 Auto Retry: exponential backoff with jitter
  • 📊 Real-time Stats: progress, RPS, and latency tracking

How it slaps:

  • You: blaze -i requests.jsonl -o results.jsonl
  • Blaze: Load balances, retries failures, tracks progress, writes results.
  • You: Go grab a coffee while 100K requests complete. ☕
  • Result: Perfectly formatted JSONL with every response. Zero babysitting.

💥 Why Blaze Slaps Other Methods

Manually scripting API requests is a vibe-killer. Blaze makes other methods look ancient.

❌ The Old Way (Pain):

  1. Write Python script with asyncio.
  2. Hit GIL limits at 500 req/sec.
  3. Script crashes, lose progress.
  4. Add retry logic, still flaky.
  5. Manually restart, pray it works.

✅ The Blaze Way (Glory):

  1. blaze -i data.jsonl -o out.jsonl
  2. Watch the progress bar fly.
  3. Failures auto-retry with backoff.
  4. Results stream to disk instantly.
  5. Go grab a coffee. ☕

We're not just sending requests. We're building a high-throughput, fault-tolerant pipeline with weighted load balancing, connection pooling, and intelligent retry logic that actually respects your API provider's limits.


🚀 Get Started in 60 Seconds

  • 🦀 All platforms (Cargo): cargo install blaze-api
  • 🍎 macOS (Homebrew): brew install yigitkonur/tap/blaze
  • 🐧 Linux (binary): see releases
  • 🪟 Windows (binary): see releases

🦀 From Source (Recommended for Development)

# Clone and build
git clone https://github.com/yigitkonur/blaze-api.git
cd blaze-api
cargo build --release

# Binary is at ./target/release/blaze

📦 From crates.io

cargo install blaze-api

✨ Zero Config: After installation, blaze is ready to go. Just point it at your JSONL file!


🎮 Usage: Fire and Forget

The workflow is dead simple.

Basic Usage

# Process requests and save results
blaze --input requests.jsonl --output results.jsonl

# Short flags work too
blaze -i requests.jsonl -o results.jsonl

# High-throughput mode (10K req/sec)
blaze -i data.jsonl -o out.jsonl --rate 10000 --workers 200

With Custom Endpoints

# Use a config file for multiple endpoints
blaze -i requests.jsonl -o results.jsonl --config endpoints.json

# Or set via environment
export BLAZE_ENDPOINT_URL="https://api.openai.com/v1/completions"
export BLAZE_API_KEY="sk-..."
export BLAZE_MODEL="gpt-4"
blaze -i requests.jsonl -o results.jsonl

Input Format

Your requests.jsonl file should have one JSON object per line:

{"input": "What is the capital of France?"}
{"input": "Explain quantum computing in simple terms."}
{"input": "Write a haiku about Rust programming."}

Or with custom request bodies:

{"body": {"messages": [{"role": "user", "content": "Hello!"}], "model": "gpt-4"}}
{"body": {"messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hi!"}]}}

Output Format

Results are written as JSONL:

{"input": "What is the capital of France?", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 234, "attempts": 1}}
{"input": "Explain quantum computing...", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 189, "attempts": 1}}

Errors go to errors.jsonl:

{"input": "...", "error": "HTTP 429: Rate limit exceeded", "status_code": 429, "attempts": 3}

✨ Feature Breakdown: The Secret Sauce

  • ⚡ Async Everything (Tokio runtime): non-blocking I/O with a work-stealing scheduler; saturates your CPU cores efficiently.
  • 🎯 Weighted Load Balancing (smart distribution): routes traffic based on endpoint capacity; max out multiple API keys simultaneously.
  • 🔄 Exponential Backoff (with jitter): intelligent retries with randomized delays; respects rate limits and avoids the thundering herd.
  • 📊 Real-time Progress (live stats): RPS, success rate, latency, ETA; know exactly what's happening.
  • 🔌 Connection Pooling (HTTP/2 keep-alive): reuses connections across requests; eliminates TCP handshake overhead.
  • 💾 Streaming Output (immediate writes): results written as they complete; never lose progress on crashes.
  • 🏥 Health Tracking (per endpoint): automatic failover on errors; unhealthy endpoints get cooled off.
  • 🔧 Flexible Config (CLI + ENV + JSON): configure via args, env vars, or files; fits any workflow.
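
This README does not show Blaze's client internals, but if you are curious what HTTP/2 pooling with keep-alive looks like in Rust, here is an illustrative sketch using the reqwest crate. It is not Blaze's actual code, just the general shape of a pooled client:

use std::time::Duration;

use reqwest::Client;

// Illustrative only: a pooled client in the spirit of the Connection Pooling
// feature above. HTTP/2 is negotiated via ALPN when the endpoint supports it.
fn build_client() -> reqwest::Result<Client> {
    Client::builder()
        .pool_max_idle_per_host(100)                 // keep warm connections per host
        .pool_idle_timeout(Duration::from_secs(90))  // recycle idle connections eventually
        .timeout(Duration::from_secs(30))            // mirrors the CLI default --timeout
        .build()
}

fn main() {
    let client = build_client().expect("client should build");
    // The client can be cloned cheaply and shared across tasks; clones share the pool.
    let _ = client;
}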

⚙️ Configuration

CLI Flags

USAGE:
    blaze [OPTIONS] --input <FILE>

OPTIONS:
    -i, --input <FILE>        Path to JSONL input file [env: BLAZE_INPUT]
    -o, --output <FILE>       Path for successful responses [env: BLAZE_OUTPUT]
    -e, --errors <FILE>       Path for error responses [default: errors.jsonl]
    -r, --rate <N>            Max requests per second [default: 1000]
    -w, --workers <N>         Concurrent workers [default: 50]
    -t, --timeout <SECS>      Request timeout [default: 30]
    -a, --max-attempts <N>    Max retry attempts [default: 3]
    -c, --config <FILE>       Endpoint config file (JSON)
    -v, --verbose             Enable debug logging
        --json-logs           Output logs as JSON
        --no-progress         Disable progress bar
        --dry-run             Validate config without processing
    -h, --help                Print help
    -V, --version             Print version

Environment Variables

All options can be set via environment variables with the BLAZE_ prefix:

export BLAZE_INPUT="requests.jsonl"
export BLAZE_OUTPUT="results.jsonl"
export BLAZE_RATE="5000"
export BLAZE_WORKERS="100"
export BLAZE_ENDPOINT_URL="https://api.example.com/v1/completions"
export BLAZE_API_KEY="your-api-key"
export BLAZE_MODEL="gpt-4"

Configuration File

For multiple endpoints, create endpoints.json:

{
  "endpoints": [
    {
      "url": "https://api.openai.com/v1/completions",
      "weight": 2,
      "api_key": "sk-key-1",
      "model": "gpt-4",
      "max_concurrent": 100
    },
    {
      "url": "https://api.openai.com/v1/completions",
      "weight": 1,
      "api_key": "sk-key-2",
      "model": "gpt-4",
      "max_concurrent": 50
    }
  ],
  "request": {
    "timeout": "30s",
    "rate_limit": 5000,
    "workers": 100
  },
  "retry": {
    "max_attempts": 3,
    "initial_backoff": "100ms",
    "max_backoff": "10s",
    "multiplier": 2.0
  }
}

Then run:

blaze -i requests.jsonl -o results.jsonl --config endpoints.json
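
If you generate endpoints.json programmatically or want a quick pre-flight sanity check beyond --dry-run, here is a hedged sketch that mirrors the shape above with serde (requires serde with the derive feature plus serde_json; the struct names are illustrative, not the library's own config types):

use serde::Deserialize;

// Illustrative mirror of endpoints.json; NOT the library's own types.
#[derive(Deserialize)]
struct FileConfig {
    endpoints: Vec<Endpoint>,
}

#[derive(Deserialize)]
struct Endpoint {
    url: String,
    weight: u32,
    max_concurrent: u32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("endpoints.json")?;
    let cfg: FileConfig = serde_json::from_str(&raw)?;

    let total: u32 = cfg.endpoints.iter().map(|e| e.weight).sum();
    for e in &cfg.endpoints {
        // Expected share of traffic under weighted balancing, plus the concurrency cap.
        println!(
            "{}: ~{:.0}% of traffic, at most {} in flight",
            e.url,
            100.0 * e.weight as f64 / total as f64,
            e.max_concurrent
        );
    }
    Ok(())
}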

📈 Performance Tips

Maximize Throughput

# For maximum speed (adjust based on your API limits)
blaze -i data.jsonl -o out.jsonl \
  --rate 10000 \
  --workers 200 \
  --timeout 60

Balance Load Across Keys

{
  "endpoints": [
    {"url": "...", "api_key": "key-1", "weight": 3, "max_concurrent": 150},
    {"url": "...", "api_key": "key-2", "weight": 2, "max_concurrent": 100},
    {"url": "...", "api_key": "key-3", "weight": 1, "max_concurrent": 50}
  ]
}
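
With weights 3, 2, and 1, the first key should see roughly half the traffic and the third about a sixth. For intuition, here is a sketch of the classic weighted-random pick (illustrative; not necessarily the algorithm Blaze uses internally, and it assumes the rand crate's 0.8-style API):

use rand::Rng;

// Pick an index with probability proportional to its weight.
fn pick_weighted(weights: &[u32]) -> usize {
    let total: u32 = weights.iter().sum();
    let mut roll = rand::thread_rng().gen_range(0..total);
    for (i, &w) in weights.iter().enumerate() {
        if roll < w {
            return i;
        }
        roll -= w;
    }
    unreachable!("weights must be non-empty with a non-zero sum");
}

fn main() {
    // Matches the 3 / 2 / 1 key weights in the config above.
    let weights = [3u32, 2, 1];
    let mut hits = [0u32; 3];
    for _ in 0..6_000 {
        hits[pick_weighted(&weights)] += 1;
    }
    println!("{hits:?}"); // roughly [3000, 2000, 1000]
}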

Handle Rate Limits Gracefully

{
  "retry": {
    "max_attempts": 5,
    "initial_backoff": "500ms",
    "max_backoff": "30s",
    "multiplier": 2.0
  }
}
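
With those values the backoff ceiling doubles each attempt (500 ms, 1 s, 2 s, 4 s, 8 s, capped at 30 s), and jitter randomizes the actual wait below that ceiling so a thousand clients don't all retry at the same instant. A sketch of that calculation using "full jitter" (illustrative, again using the rand crate; the exact jitter strategy Blaze applies isn't documented here):

use std::time::Duration;

use rand::Rng;

// "Full jitter": delay = rand(0 ..= min(max, initial * multiplier^attempt))
fn backoff(attempt: u32, initial: Duration, max: Duration, multiplier: f64) -> Duration {
    let ceiling = (initial.as_millis() as f64 * multiplier.powi(attempt as i32))
        .min(max.as_millis() as f64);
    Duration::from_millis(rand::thread_rng().gen_range(0..=ceiling as u64))
}

fn main() {
    // Mirrors the retry block above: 500 ms initial, 30 s cap, multiplier 2.0.
    for attempt in 0..5 {
        let delay = backoff(attempt, Duration::from_millis(500), Duration::from_secs(30), 2.0);
        println!("attempt {attempt}: waiting {delay:?}");
    }
}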

🛠️ For Developers & Tinkerers

Building from Source

git clone https://github.com/yigitkonur/blaze-api.git
cd blaze-api

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run benchmarks
cargo bench

Using as a Library

use blaze_api::{Config, EndpointConfig, Processor};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config {
        // Single endpoint here; add more entries (with their own weights and keys)
        // to fan requests out across multiple endpoints.
        endpoints: vec![EndpointConfig {
            url: "https://api.example.com/v1/completions".to_string(),
            weight: 1,
            api_key: Some("your-key".to_string()),
            model: Some("gpt-4".to_string()),
            max_concurrent: 100,
        }],
        ..Default::default()
    };

    // Successes stream to results.jsonl, failures to errors.jsonl.
    let processor = Processor::new(config)?;
    let result = processor.process_file(
        "requests.jsonl".into(),
        Some("results.jsonl".into()),
        "errors.jsonl".into(),
        true,
    ).await?;

    result.print_summary();
    Ok(())
}
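
The example above assumes tokio (with the macros and rt-multi-thread features, or simply full) and anyhow alongside blaze-api in your Cargo.toml.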

Project Structure

src/
├── lib.rs        # Library entry point
├── main.rs       # CLI binary
├── config.rs     # Configuration management
├── client.rs     # HTTP client with retry logic
├── endpoint.rs   # Load balancer implementation
├── processor.rs  # Main processing orchestration
├── request.rs    # Request/response types
├── tracker.rs    # Statistics tracking
└── error.rs      # Error types

🔥 Common Issues & Quick Fixes

  • "Too many open files": increase the limit with ulimit -n 65535
  • Connection timeouts: increase --timeout or reduce --workers
  • Rate limit errors (429): lower --rate or add more API keys
  • High memory usage: reduce --workers for large requests
  • Progress bar not showing: don't pipe output, or use --no-progress --json-logs

Build Issues:

  • OpenSSL errors: install the OpenSSL dev package (apt install libssl-dev) or build with --features rustls
  • Rust version error: update Rust with rustup update stable (1.75+ required)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

# Fork the repo, then:
git clone https://github.com/YOUR_USERNAME/blaze-api.git
cd blaze-api
cargo test
# Make your changes
cargo fmt
cargo clippy
cargo test
# Submit PR

📄 License

MIT © Yiğit Konur


Built with 🔥 because waiting for API responses is a soul-crushing waste of time.

⬆ Back to Top
