🔥 Blaze API 🔥

Stop waiting for API responses. Start blazing through them.

The ultimate batch API client for your LLM workloads. It load-balances across endpoints, retries intelligently, and processes 10,000+ requests per second on a laptop.


Blaze API is the batch processor your LLM workloads deserve. Stop writing brittle Python scripts that crash at 100 req/sec. This tool acts like a fleet of pro API consumers, intelligently distributing requests across endpoints, handling failures gracefully, and maxing out your API capacity without breaking a sweat.

  • ⚡ Blazing Fast: 10K+ req/sec on 8 cores
  • 🎯 Smart Load Balancing: weighted distribution across endpoints
  • 🔄 Auto Retry: exponential backoff with jitter
  • 📊 Real-time Stats: progress, RPS, and latency tracking

How it slaps:

  • You: blaze -i requests.jsonl -o results.jsonl
  • Blaze: Load balances, retries failures, tracks progress, writes results.
  • You: Go grab a coffee while 100K requests complete. ☕
  • Result: Perfectly formatted JSONL with every response. Zero babysitting.

💥 Why Blaze Slaps Other Methods

Manually scripting API requests is a vibe-killer. Blaze makes other methods look ancient.

❌ The Old Way (Pain):

  1. Write Python script with asyncio.
  2. Hit GIL limits at 500 req/sec.
  3. Script crashes, lose progress.
  4. Add retry logic, still flaky.
  5. Manually restart, pray it works.

✅ The Blaze Way (Glory):

  1. blaze -i data.jsonl -o out.jsonl
  2. Watch the progress bar fly.
  3. Failures auto-retry with backoff.
  4. Results stream to disk instantly.
  5. Go grab a coffee. ☕

We're not just sending requests. We're building a high-throughput, fault-tolerant pipeline with weighted load balancing, connection pooling, and intelligent retry logic that actually respects your API provider's limits.


🚀 Get Started in 60 Seconds

  • 🦀 All platforms (Cargo): cargo install blaze-api
  • 🍎 macOS (Homebrew): brew install yigitkonur/tap/blaze
  • 🐧 Linux (binary): see releases
  • 🪟 Windows (binary): see releases

🦀 From Source (Recommended for Development)

# Clone and build
git clone https://github.com/yigitkonur/blaze-api.git
cd blaze-api
cargo build --release

# Binary is at ./target/release/blaze

📦 From crates.io

cargo install blaze-api

✨ Zero Config: After installation, blaze is ready to go. Just point it at your JSONL file!


🎮 Usage: Fire and Forget

The workflow is dead simple.

Basic Usage

# Process requests and save results
blaze --input requests.jsonl --output results.jsonl

# Short flags work too
blaze -i requests.jsonl -o results.jsonl

# High-throughput mode (10K req/sec)
blaze -i data.jsonl -o out.jsonl --rate 10000 --workers 200

With Custom Endpoints

# Use a config file for multiple endpoints
blaze -i requests.jsonl -o results.jsonl --config endpoints.json

# Or set via environment
export BLAZE_ENDPOINT_URL="https://api.openai.com/v1/completions"
export BLAZE_API_KEY="sk-..."
export BLAZE_MODEL="gpt-4"
blaze -i requests.jsonl -o results.jsonl

Input Format

Your requests.jsonl file should have one JSON object per line:

{"input": "What is the capital of France?"}
{"input": "Explain quantum computing in simple terms."}
{"input": "Write a haiku about Rust programming."}

Or with custom request bodies:

{"body": {"messages": [{"role": "user", "content": "Hello!"}], "model": "gpt-4"}}
{"body": {"messages": [{"role": "system", "content": "You are helpful."}, {"role": "user", "content": "Hi!"}]}}

Output Format

Results are written as JSONL:

{"input": "What is the capital of France?", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 234, "attempts": 1}}
{"input": "Explain quantum computing...", "response": {"choices": [...]}, "metadata": {"endpoint": "...", "latency_ms": 189, "attempts": 1}}

Errors go to errors.jsonl:

{"input": "...", "error": "HTTP 429: Rate limit exceeded", "status_code": 429, "attempts": 3}

✨ Feature Breakdown: The Secret Sauce

  • ⚡ Async Everything (Tokio runtime): non-blocking I/O with a work-stealing scheduler; saturates your CPU cores efficiently.
  • 🎯 Weighted Load Balancing (smart distribution): routes traffic based on endpoint capacity; max out multiple API keys simultaneously.
  • 🔄 Exponential Backoff (with jitter): intelligent retries with randomized delays; respects rate limits and avoids the thundering herd.
  • 📊 Real-time Progress (live stats): RPS, success rate, latency, ETA; know exactly what's happening.
  • 🔌 Connection Pooling (HTTP/2 keep-alive): reuses connections across requests; eliminates TCP handshake overhead.
  • 💾 Streaming Output (immediate writes): results written as they complete; never lose progress on crashes.
  • 🏥 Health Tracking (per endpoint): automatic failover on errors; unhealthy endpoints get cooled off.
  • 🔧 Flexible Config (CLI + ENV + JSON): configure via args, env vars, or files; fits any workflow.
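
This README does not show Blaze's client internals, but if you are curious what HTTP/2 pooling with keep-alive looks like in Rust, here is an illustrative sketch using the reqwest crate. It is not Blaze's actual code, just the general shape of a pooled client:

use std::time::Duration;

use reqwest::Client;

// Illustrative only: a pooled client in the spirit of the Connection Pooling
// feature above. HTTP/2 is negotiated via ALPN when the endpoint supports it.
fn build_client() -> reqwest::Result<Client> {
    Client::builder()
        .pool_max_idle_per_host(100)                 // keep warm connections per host
        .pool_idle_timeout(Duration::from_secs(90))  // recycle idle connections eventually
        .timeout(Duration::from_secs(30))            // mirrors the CLI default --timeout
        .build()
}

fn main() {
    let client = build_client().expect("client should build");
    // The client can be cloned cheaply and shared across tasks; clones share the pool.
    let _ = client;
}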

⚙️ Configuration

CLI Flags

USAGE:
    blaze [OPTIONS] --input <FILE>

OPTIONS:
    -i, --input <FILE>        Path to JSONL input file [env: BLAZE_INPUT]
    -o, --output <FILE>       Path for successful responses [env: BLAZE_OUTPUT]
    -e, --errors <FILE>       Path for error responses [default: errors.jsonl]
    -r, --rate <N>            Max requests per second [default: 1000]
    -w, --workers <N>         Concurrent workers [default: 50]
    -t, --timeout <SECS>      Request timeout [default: 30]
    -a, --max-attempts <N>    Max retry attempts [default: 3]
    -c, --config <FILE>       Endpoint config file (JSON)
    -v, --verbose             Enable debug logging
        --json-logs           Output logs as JSON
        --no-progress         Disable progress bar
        --dry-run             Validate config without processing
    -h, --help                Print help
    -V, --version             Print version

Environment Variables

All options can be set via environment variables with the BLAZE_ prefix:

export BLAZE_INPUT="requests.jsonl"
export BLAZE_OUTPUT="results.jsonl"
export BLAZE_RATE="5000"
export BLAZE_WORKERS="100"
export BLAZE_ENDPOINT_URL="https://api.example.com/v1/completions"
export BLAZE_API_KEY="your-api-key"
export BLAZE_MODEL="gpt-4"

Configuration File

For multiple endpoints, create endpoints.json:

{
  "endpoints": [
    {
      "url": "https://api.openai.com/v1/completions",
      "weight": 2,
      "api_key": "sk-key-1",
      "model": "gpt-4",
      "max_concurrent": 100
    },
    {
      "url": "https://api.openai.com/v1/completions",
      "weight": 1,
      "api_key": "sk-key-2",
      "model": "gpt-4",
      "max_concurrent": 50
    }
  ],
  "request": {
    "timeout": "30s",
    "rate_limit": 5000,
    "workers": 100
  },
  "retry": {
    "max_attempts": 3,
    "initial_backoff": "100ms",
    "max_backoff": "10s",
    "multiplier": 2.0
  }
}

Then run:

blaze -i requests.jsonl -o results.jsonl --config endpoints.json
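
If you generate endpoints.json programmatically or want a quick pre-flight sanity check beyond --dry-run, here is a hedged sketch that mirrors the shape above with serde (requires serde with the derive feature plus serde_json; the struct names are illustrative, not the library's own config types):

use serde::Deserialize;

// Illustrative mirror of endpoints.json; NOT the library's own types.
#[derive(Deserialize)]
struct FileConfig {
    endpoints: Vec<Endpoint>,
}

#[derive(Deserialize)]
struct Endpoint {
    url: String,
    weight: u32,
    max_concurrent: u32,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string("endpoints.json")?;
    let cfg: FileConfig = serde_json::from_str(&raw)?;

    let total: u32 = cfg.endpoints.iter().map(|e| e.weight).sum();
    for e in &cfg.endpoints {
        // Expected share of traffic under weighted balancing, plus the concurrency cap.
        println!(
            "{}: ~{:.0}% of traffic, at most {} in flight",
            e.url,
            100.0 * e.weight as f64 / total as f64,
            e.max_concurrent
        );
    }
    Ok(())
}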

📈 Performance Tips

Maximize Throughput

# For maximum speed (adjust based on your API limits)
blaze -i data.jsonl -o out.jsonl \
  --rate 10000 \
  --workers 200 \
  --timeout 60

Balance Load Across Keys

{
  "endpoints": [
    {"url": "...", "api_key": "key-1", "weight": 3, "max_concurrent": 150},
    {"url": "...", "api_key": "key-2", "weight": 2, "max_concurrent": 100},
    {"url": "...", "api_key": "key-3", "weight": 1, "max_concurrent": 50}
  ]
}
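
With weights 3, 2, and 1, the first key should see roughly half the traffic and the third about a sixth. For intuition, here is a sketch of the classic weighted-random pick (illustrative; not necessarily the algorithm Blaze uses internally, and it assumes the rand crate's 0.8-style API):

use rand::Rng;

// Pick an index with probability proportional to its weight.
fn pick_weighted(weights: &[u32]) -> usize {
    let total: u32 = weights.iter().sum();
    let mut roll = rand::thread_rng().gen_range(0..total);
    for (i, &w) in weights.iter().enumerate() {
        if roll < w {
            return i;
        }
        roll -= w;
    }
    unreachable!("weights must be non-empty with a non-zero sum");
}

fn main() {
    // Matches the 3 / 2 / 1 key weights in the config above.
    let weights = [3u32, 2, 1];
    let mut hits = [0u32; 3];
    for _ in 0..6_000 {
        hits[pick_weighted(&weights)] += 1;
    }
    println!("{hits:?}"); // roughly [3000, 2000, 1000]
}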

Handle Rate Limits Gracefully

{
  "retry": {
    "max_attempts": 5,
    "initial_backoff": "500ms",
    "max_backoff": "30s",
    "multiplier": 2.0
  }
}
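
With those values the backoff ceiling doubles each attempt (500 ms, 1 s, 2 s, 4 s, 8 s, capped at 30 s), and jitter randomizes the actual wait below that ceiling so a thousand clients don't all retry at the same instant. A sketch of that calculation using "full jitter" (illustrative, again using the rand crate; the exact jitter strategy Blaze applies isn't documented here):

use std::time::Duration;

use rand::Rng;

// "Full jitter": delay = rand(0 ..= min(max, initial * multiplier^attempt))
fn backoff(attempt: u32, initial: Duration, max: Duration, multiplier: f64) -> Duration {
    let ceiling = (initial.as_millis() as f64 * multiplier.powi(attempt as i32))
        .min(max.as_millis() as f64);
    Duration::from_millis(rand::thread_rng().gen_range(0..=ceiling as u64))
}

fn main() {
    // Mirrors the retry block above: 500 ms initial, 30 s cap, multiplier 2.0.
    for attempt in 0..5 {
        let delay = backoff(attempt, Duration::from_millis(500), Duration::from_secs(30), 2.0);
        println!("attempt {attempt}: waiting {delay:?}");
    }
}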

🛠️ For Developers & Tinkerers

Building from Source

git clone https://github.com/yigitkonur/blaze-api.git
cd blaze-api

# Debug build
cargo build

# Release build (optimized)
cargo build --release

# Run tests
cargo test

# Run benchmarks
cargo bench

Using as a Library

use blaze_api::{Config, EndpointConfig, Processor};

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let config = Config {
        // Single endpoint here; add more entries (with their own weights and keys)
        // to fan requests out across multiple endpoints.
        endpoints: vec![EndpointConfig {
            url: "https://api.example.com/v1/completions".to_string(),
            weight: 1,
            api_key: Some("your-key".to_string()),
            model: Some("gpt-4".to_string()),
            max_concurrent: 100,
        }],
        ..Default::default()
    };

    // Successes stream to results.jsonl, failures to errors.jsonl.
    let processor = Processor::new(config)?;
    let result = processor.process_file(
        "requests.jsonl".into(),
        Some("results.jsonl".into()),
        "errors.jsonl".into(),
        true,
    ).await?;

    result.print_summary();
    Ok(())
}
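
The example above assumes tokio (with the macros and rt-multi-thread features, or simply full) and anyhow alongside blaze-api in your Cargo.toml.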

Project Structure

src/
├── lib.rs        # Library entry point
├── main.rs       # CLI binary
├── config.rs     # Configuration management
├── client.rs     # HTTP client with retry logic
├── endpoint.rs   # Load balancer implementation
├── processor.rs  # Main processing orchestration
├── request.rs    # Request/response types
├── tracker.rs    # Statistics tracking
└── error.rs      # Error types

🔥 Common Issues & Quick Fixes

  • "Too many open files": increase the limit with ulimit -n 65535
  • Connection timeouts: increase --timeout or reduce --workers
  • Rate limit errors (429): lower --rate or add more API keys
  • High memory usage: reduce --workers for large requests
  • Progress bar not showing: don't pipe output, or use --no-progress --json-logs

Build Issues:

  • OpenSSL errors: install the OpenSSL dev package (apt install libssl-dev) or build with --features rustls
  • Rust version error: update Rust with rustup update stable (1.75+ required)

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

# Fork the repo, then:
git clone https://github.com/YOUR_USERNAME/blaze-api.git
cd blaze-api
cargo test
# Make your changes
cargo fmt
cargo clippy
cargo test
# Submit PR

📄 License

MIT © Yiğit Konur


Built with 🔥 because waiting for API responses is a soul-crushing waste of time.

⬆ Back to Top
