tumix implements TUMIX (Multi-Agent Test-Time Scaling with Tool-Use Mixture) in Go.
HTTP client tracing is off by default to minimize overhead. Set TUMIX_HTTP_TRACE=1 to enable OpenTelemetry spans for outgoing LLM HTTP calls; clients will keep using the shared pooled transport either way.
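A minimal sketch of how that gating can look, assuming an illustrative `newHTTPClient` helper and pool settings (not tumix's actual code): the shared pooled transport is always used, and `otelhttp.NewTransport` only wraps it when `TUMIX_HTTP_TRACE=1`.

```go
package llm

import (
	"net/http"
	"os"
	"time"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
)

// sharedTransport is one pooled transport reused by every LLM client, so
// connection reuse is unaffected by whether tracing is enabled.
var sharedTransport = &http.Transport{
	MaxIdleConns:        100,
	MaxIdleConnsPerHost: 10,
	IdleConnTimeout:     90 * time.Second,
}

// newHTTPClient returns a client over the shared transport; when
// TUMIX_HTTP_TRACE=1 it adds an OpenTelemetry span per outgoing request.
func newHTTPClient() *http.Client {
	var rt http.RoundTripper = sharedTransport
	if os.Getenv("TUMIX_HTTP_TRACE") == "1" {
		rt = otelhttp.NewTransport(sharedTransport)
	}
	return &http.Client{Transport: rt}
}
```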
- `-model` (default `gemini-2.5-flash`)
- `-max_rounds` (default 3; higher improves quality, raises cost)
- `-temperature` / `-top_p` / `-top_k` / `-max_tokens` / `-seed`
- `-json` (emit the final answer as JSON on stdout)
- `-session_dir` (persist sessions to disk; default in-memory); set the `TUMIX_SESSION_SQLITE` env var to use a sqlite-backed store instead of `-session_dir`
- `-batch_file` with `-concurrency` (one prompt per line)
- `-http_trace` (enable HTTP spans)
- `-otlp_endpoint` (export traces)
- `-bench_local` to run a synthetic local benchmark (no LLM calls)
- `-max_prompt_chars` to fail fast on oversized prompts
- `-max_prompt_tokens`: tokenizer-backed guard (`CountTokens`) with a heuristic fallback (see the sketch after this list); pricing override via `TUMIX_PRICING_FILE`
- `-metrics_addr` to serve `/healthz`, `/debug/vars`, `/metrics` (Prometheus text)
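A hedged sketch of how the two prompt guards can combine; the `checkPrompt` function, `TokenCounter` interface, and the chars/4 fallback are illustrative assumptions, not tumix's actual implementation. The character limit fails fast before any API call; the token limit prefers `CountTokens` and falls back to a rough estimate if counting fails.

```go
package guard

import (
	"context"
	"fmt"
)

// TokenCounter abstracts the model client's CountTokens call so the guard can
// be exercised without a real API client.
type TokenCounter interface {
	CountTokens(ctx context.Context, prompt string) (int, error)
}

// checkPrompt enforces -max_prompt_chars first, then -max_prompt_tokens.
// A non-positive limit disables the corresponding check.
func checkPrompt(ctx context.Context, tc TokenCounter, prompt string, maxChars, maxTokens int) error {
	if maxChars > 0 && len(prompt) > maxChars {
		return fmt.Errorf("prompt is %d chars, exceeds -max_prompt_chars=%d", len(prompt), maxChars)
	}
	if maxTokens <= 0 {
		return nil
	}
	n, err := tc.CountTokens(ctx, prompt)
	if err != nil {
		n = len(prompt) / 4 // heuristic fallback when the tokenizer call fails
	}
	if n > maxTokens {
		return fmt.Errorf("prompt is ~%d tokens, exceeds -max_prompt_tokens=%d", n, maxTokens)
	}
	return nil
}
```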
Env overrides: GOOGLE_API_KEY, TUMIX_MODEL, TUMIX_MAX_ROUNDS, TUMIX_TEMPERATURE, TUMIX_TOP_P, TUMIX_TOP_K, TUMIX_MAX_TOKENS, TUMIX_SESSION_DIR, TUMIX_HTTP_TRACE, TUMIX_CALL_WARN, TUMIX_CONCURRENCY.
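One common way such env overrides can coexist with flag defaults, shown here as a sketch (the `envOr`/`envOrInt` helpers and the exact wiring are assumptions): the env var, when set, replaces the built-in default, and an explicitly passed flag still wins because it is applied at parse time.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// envOr returns the env var's value when set, otherwise the default.
func envOr(key, def string) string {
	if v := os.Getenv(key); v != "" {
		return v
	}
	return def
}

// envOrInt is the integer variant; invalid values fall back to the default.
func envOrInt(key string, def int) int {
	if v := os.Getenv(key); v != "" {
		if n, err := strconv.Atoi(v); err == nil {
			return n
		}
	}
	return def
}

var (
	model     = flag.String("model", envOr("TUMIX_MODEL", "gemini-2.5-flash"), "model name")
	maxRounds = flag.Int("max_rounds", envOrInt("TUMIX_MAX_ROUNDS", 3), "max refinement rounds")
)

func main() {
	flag.Parse()
	fmt.Printf("model=%s max_rounds=%d\n", *model, *maxRounds)
}
```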
- Low cost:

  ```sh
  ./tumix -model gemini-2.5-flash -max_rounds 2 -temperature 0.2 "Explain X"
  ```

- High quality:

  ```sh
  ./tumix -model gemini-2.5-pro -max_rounds 3 -top_p 0.95 -max_tokens 512 "Explain Y"
  ```

- Batch (see the concurrency sketch after these examples):

  ```sh
  ./tumix -batch_file prompts.txt -concurrency 4 -json
  ```

- Persist & observe:

  ```sh
  TUMIX_SESSION_SQLITE=/tmp/tumix.db ./tumix -metrics_addr :9090 -http_trace
  ```

- CI smoke:

  ```sh
  ./tools/bin/gotestsum -f standard-verbose -- -race -count=1 -shuffle=on -cover ./...
  ```
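A sketch of the batch path under those flags, assuming one prompt per line and a bounded worker pool; `runBatch` and the injected `runPrompt` callback are stand-ins for the real per-prompt pipeline, not tumix's actual functions.

```go
package batch

import (
	"bufio"
	"context"
	"os"
	"sync"
)

// runBatch reads one prompt per line from path and keeps at most
// `concurrency` prompts in flight at a time.
func runBatch(ctx context.Context, path string, concurrency int, runPrompt func(context.Context, string) error) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	sem := make(chan struct{}, concurrency) // bounds in-flight prompts
	var wg sync.WaitGroup

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		prompt := sc.Text()
		if prompt == "" {
			continue
		}
		wg.Add(1)
		sem <- struct{}{}
		go func() {
			defer wg.Done()
			defer func() { <-sem }()
			_ = runPrompt(ctx, prompt) // per-prompt errors would be collected and reported in the real tool
		}()
	}
	wg.Wait()
	return sc.Err()
}
```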
`/metrics` emits counters: `tumix_requests`, `tumix_input_tokens`, `tumix_output_tokens`, `tumix_cost_usd`. `/debug/vars` exposes the same data via expvar; `/healthz` returns `ok`.
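A minimal sketch of how counters like these can back all three endpoints (the counter names match the ones above; the `Serve` function and handler wiring are illustrative): expvar integers feed `/debug/vars` via the standard handler, and a small handler renders the same values as Prometheus text.

```go
package metrics

import (
	"expvar"
	"fmt"
	"net/http"
)

var (
	requests     = expvar.NewInt("tumix_requests")
	inputTokens  = expvar.NewInt("tumix_input_tokens")
	outputTokens = expvar.NewInt("tumix_output_tokens")
	costUSD      = expvar.NewFloat("tumix_cost_usd")
)

// Serve exposes /healthz, /debug/vars (expvar's default handler), and a
// hand-rolled Prometheus text rendering of the same counters on addr.
func Serve(addr string) error {
	mux := http.NewServeMux()
	mux.Handle("/debug/vars", expvar.Handler())
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	})
	mux.HandleFunc("/metrics", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/plain; version=0.0.4")
		fmt.Fprintf(w, "tumix_requests %d\n", requests.Value())
		fmt.Fprintf(w, "tumix_input_tokens %d\n", inputTokens.Value())
		fmt.Fprintf(w, "tumix_output_tokens %d\n", outputTokens.Value())
		fmt.Fprintf(w, "tumix_cost_usd %g\n", costUSD.Value())
	})
	return http.ListenAndServe(addr, mux)
}
```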