Description
We are conducting a large-scale benchmark (10M vectors) to evaluate the migration from ChromaDB v0.4.x to v1.3.5. Our results show a critical performance regression in the new version.
Despite the "4.2x faster" claim in the v1.0 release notes, our real-world streaming ingestion (single process) sees an ~85% drop in TPS (Transactions Per Second) and a ~7x increase in total elapsed time.
Benchmark Comparison
Dataset: 10M Vectors, 1024 Dimension, Random Float32
Hardware: Linux, Single Process
Distance: Cosine
| Metric | v0.4.x (Legacy) | v1.3.5 (Current) | Impact |
|---|---|---|---|
| Batch Size | 5,000 | 1,500 | (Noted diff) |
| Progress | 3.5M / 10M | 3.5M / 10M | Same Checkpoint |
| Total Time | 1h 03m | 7h 21m | ~7x Slower |
| Avg TPS | 923 docs/s | 133 docs/s | ~85% Drop |
| Avg Batch Latency | 5.4s (for 5k) | 12.8s (for 1.5k) | High Overhead |
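For reproducibility, here is a minimal sketch of the kind of single-process streaming loop behind these numbers. The paths, collection name, and logging cadence are placeholders, not our exact harness; only the `chromadb` client calls (`PersistentClient`, `get_or_create_collection`, `add`) are the standard Python API.

```python
import time

import numpy as np
import chromadb

# Sketch of the ingestion loop (placeholder names/paths, random vectors).
N_VECTORS = 10_000_000
DIM = 1024
BATCH_SIZE = 1_500  # 5_000 under v0.4.x

client = chromadb.PersistentClient(path="./chroma-bench")
collection = client.get_or_create_collection(
    name="bench_10m",
    metadata={"hnsw:space": "cosine"},
)

start = time.perf_counter()
for offset in range(0, N_VECTORS, BATCH_SIZE):
    n = min(BATCH_SIZE, N_VECTORS - offset)
    vecs = np.random.rand(n, DIM).astype(np.float32)
    ids = [str(i) for i in range(offset, offset + n)]

    t0 = time.perf_counter()
    collection.add(ids=ids, embeddings=vecs.tolist())
    batch_latency = time.perf_counter() - t0

    done = offset + n
    if done // 500_000 != offset // 500_000:  # log roughly every 500k docs
        elapsed = time.perf_counter() - start
        print(f"{done:>10,} docs  batch latency {batch_latency:.1f}s  "
              f"avg TPS {done / elapsed:.0f} docs/s")
```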
Analysis & Questions
We observed that v1.3.5 suffers from high per-batch overhead and severe degradation as the collection size grows:
- Overhead: Processing 1.5k items (v1.3) takes 12.8s, whereas processing 5k items (v0.4) took only 5.4s. The new architecture seems to carry extremely high fixed costs per `add()` call (a rough way to quantify this is sketched below).
- Degradation: Average TPS dropped from 243 docs/s (at 500k vectors) to 133 docs/s (at 3.5M vectors). HNSW index construction seems much more expensive in the Rust implementation than in the old DuckDB/C++ one.
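To separate fixed per-call overhead from per-item cost, a crude check (reusing the `collection` handle from the sketch above; the probe sizes and id scheme are arbitrary) is to time `add()` at two batch sizes and solve for the intercept:

```python
import time

import numpy as np

def time_add(collection, n_items, dim=1024):
    """Time a single add() of n_items random vectors (one sample, no warm-up)."""
    vecs = np.random.rand(n_items, dim).astype(np.float32)
    ids = [f"probe-{time.time_ns()}-{i}" for i in range(n_items)]
    t0 = time.perf_counter()
    collection.add(ids=ids, embeddings=vecs.tolist())
    return time.perf_counter() - t0

# Two batch sizes give a crude linear model: latency ~= fixed + per_item * n.
small, large = 500, 1_500
t_small = time_add(collection, small)
t_large = time_add(collection, large)
per_item = (t_large - t_small) / (large - small)
fixed = t_small - per_item * small
print(f"estimated fixed overhead per add(): {fixed:.2f}s, "
      f"per-item cost: {per_item * 1000:.2f} ms")
```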
Questions:
- Is this massive regression expected for small/medium batch sizes (1.5k)?
- Is there a recommended `hnsw:construction_ef` or other config to tune for write-heavy scenarios in v1.3? (Our current collection settings are sketched below.)
- Does v1.3 enforce strict WAL/fsync that v0.4 didn't, causing IO bottlenecks?
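For reference, this is the kind of configuration we would experiment with. The `hnsw:*` metadata keys below are the v0.4.x-style collection metadata options, and the values are placeholders; we are not sure which of them the v1.3 Rust index still honors for write-heavy workloads.

```python
import chromadb

# Illustrative only: v0.4.x-style hnsw:* metadata keys with placeholder values.
client = chromadb.PersistentClient(path="./chroma-bench")
collection = client.get_or_create_collection(
    name="bench_10m_tuned",
    metadata={
        "hnsw:space": "cosine",
        "hnsw:construction_ef": 100,    # lower -> cheaper index construction
        "hnsw:M": 16,                   # fewer links -> faster inserts, lower recall
        "hnsw:batch_size": 10_000,      # brute-force buffer before indexing (if honored in v1.3)
        "hnsw:sync_threshold": 100_000, # how often the index syncs to disk (if honored in v1.3)
    },
)
```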
We really want to leverage the new features in v1.0+, but this write performance makes the migration unworkable at our scale. Any advice is appreciated!