Description
We are conducting a large-scale benchmark (10M vectors) to evaluate the migration from ChromaDB v0.4.x to v1.3.5. Our results show a critical performance regression in the new version.
Despite the "4.2x faster" claim in the v1.0 release notes, our real-world streaming ingestion (single process) sees an ~85% drop in TPS (Transactions Per Second) and a ~7x increase in total elapsed time.
Benchmark Comparison
Dataset: 10M Vectors, 1024 Dimension, Random Float32
Hardware: Linux, Single Process
Distance: Cosine
| Metric | v0.4.x (Legacy) | v1.3.5 (Current) | Impact |
|---|---|---|---|
| Batch Size | 5,000 | 1,500 | (Noted diff) |
| Progress | 3.5M / 10M | 3.5M / 10M | Same Checkpoint |
| Total Time | 1h 03m | 7h 21m | ~7x Slower |
| Avg TPS | 923 docs/s | 133 docs/s | ~85% Drop |
| Avg Batch Latency | 5.4s (for 5k) | 12.8s (for 1.5k) | High Overhead |
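For reproducibility, here is a minimal sketch of the kind of single-process streaming loop behind these numbers. The paths, collection name, and logging cadence are placeholders, not our exact harness; only the `chromadb` client calls (`PersistentClient`, `get_or_create_collection`, `add`) are the standard Python API.

```python
import time

import numpy as np
import chromadb

# Sketch of the ingestion loop (placeholder names/paths, random vectors).
N_VECTORS = 10_000_000
DIM = 1024
BATCH_SIZE = 1_500  # 5_000 under v0.4.x

client = chromadb.PersistentClient(path="./chroma-bench")
collection = client.get_or_create_collection(
    name="bench_10m",
    metadata={"hnsw:space": "cosine"},
)

start = time.perf_counter()
for offset in range(0, N_VECTORS, BATCH_SIZE):
    n = min(BATCH_SIZE, N_VECTORS - offset)
    vecs = np.random.rand(n, DIM).astype(np.float32)
    ids = [str(i) for i in range(offset, offset + n)]

    t0 = time.perf_counter()
    collection.add(ids=ids, embeddings=vecs.tolist())
    batch_latency = time.perf_counter() - t0

    done = offset + n
    if done // 500_000 != offset // 500_000:  # log roughly every 500k docs
        elapsed = time.perf_counter() - start
        print(f"{done:>10,} docs  batch latency {batch_latency:.1f}s  "
              f"avg TPS {done / elapsed:.0f} docs/s")
```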
Analysis & Questions
We observed that v1.3.5 suffers from high per-batch overhead and severe degradation as the collection size grows:
- Overhead: Processing 1.5k items (v1.3) takes 12.8s, whereas processing 5k items (v0.4) took only 5.4s. The new architecture seems to carry extremely high fixed costs per `add()` call (a rough way to quantify this is sketched below).
- Degradation: Average TPS dropped from 243 docs/s (at 500k vectors) to 133 docs/s (at 3.5M vectors). HNSW index construction seems much more expensive in the Rust implementation than in the old DuckDB/C++ one.
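To separate fixed per-call overhead from per-item cost, a crude check (reusing the `collection` handle from the sketch above; the probe sizes and id scheme are arbitrary) is to time `add()` at two batch sizes and solve for the intercept:

```python
import time

import numpy as np

def time_add(collection, n_items, dim=1024):
    """Time a single add() of n_items random vectors (one sample, no warm-up)."""
    vecs = np.random.rand(n_items, dim).astype(np.float32)
    ids = [f"probe-{time.time_ns()}-{i}" for i in range(n_items)]
    t0 = time.perf_counter()
    collection.add(ids=ids, embeddings=vecs.tolist())
    return time.perf_counter() - t0

# Two batch sizes give a crude linear model: latency ~= fixed + per_item * n.
small, large = 500, 1_500
t_small = time_add(collection, small)
t_large = time_add(collection, large)
per_item = (t_large - t_small) / (large - small)
fixed = t_small - per_item * small
print(f"estimated fixed overhead per add(): {fixed:.2f}s, "
      f"per-item cost: {per_item * 1000:.2f} ms")
```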
Questions:
- Is this massive regression expected for small/medium batch sizes (1.5k)?
- Is there a recommended `hnsw:construction_ef` or other config to tune for write-heavy scenarios in v1.3? (Our current collection settings are sketched below.)
- Does v1.3 enforce strict WAL/fsync that v0.4 didn't, causing IO bottlenecks?
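For reference, this is the kind of configuration we would experiment with. The `hnsw:*` metadata keys below are the v0.4.x-style collection metadata options, and the values are placeholders; we are not sure which of them the v1.3 Rust index still honors for write-heavy workloads.

```python
import chromadb

# Illustrative only: v0.4.x-style hnsw:* metadata keys with placeholder values.
client = chromadb.PersistentClient(path="./chroma-bench")
collection = client.get_or_create_collection(
    name="bench_10m_tuned",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 100,    # lower -> cheaper index construction
        "hnsw:M": 16,                   # fewer links -> faster inserts, lower recall
        "hnsw:batch_size": 10_000,      # brute-force buffer before indexing (if honored in v1.3)
        "hnsw:sync_threshold": 100_000, # how often the index syncs to disk (if honored in v1.3)
    },
)
```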
We really want to leverage the new features in v1.0+, but this write performance makes the migration unworkable at our scale. Any advice is appreciated!