Significant Write Performance Regression in v1.3.5: 9x Slower than v0.4 with 10M Vectors #5994

@leopaisen-zb

Description

We are conducting a large-scale benchmark (10M vectors) to evaluate the migration from ChromaDB v0.4.x to v1.3.5. Our results show a critical performance regression in the new version.

Despite the "4.2x faster" claim in the v1.0 release notes, our real-world streaming ingestion (single process) sees an ~85% drop in TPS (Transactions Per Second, measured here as documents added per second) and a ~7x increase in total elapsed time.

Benchmark Comparison

Dataset: 10M Vectors, 1024 Dimension, Random Float32
Hardware: Linux, Single Process
Distance: Cosine

| Metric | v0.4.x (Legacy) | v1.3.5 (Current) | Impact |
|---|---|---|---|
| Batch Size | 5,000 | 1,500 | (Noted diff) |
| Progress | 3.5M / 10M | 3.5M / 10M | Same Checkpoint |
| Total Time | 1h 03m | 7h 21m | ~7x Slower |
| Avg TPS | 923 docs/s | 133 docs/s | ~85% Drop |
| Avg Batch Latency | 5.4s (for 5k) | 12.8s (for 1.5k) | High Overhead |
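
The Impact column can be re-derived from the reported figures. A minimal Python sketch of that arithmetic, using only the numbers in the table (the variable names are illustrative):

```python
# Figures copied from the benchmark table; nothing here is newly measured.
v04 = {"batch": 5000, "minutes": 1 * 60 + 3, "tps": 923, "batch_latency_s": 5.4}
v13 = {"batch": 1500, "minutes": 7 * 60 + 21, "tps": 133, "batch_latency_s": 12.8}

# Wall-clock slowdown to reach the same 3.5M checkpoint.
slowdown = v13["minutes"] / v04["minutes"]

# Relative throughput loss.
tps_drop = 1 - v13["tps"] / v04["tps"]

# Per-item cost normalizes away the different batch sizes.
ms_per_item_v04 = 1000 * v04["batch_latency_s"] / v04["batch"]
ms_per_item_v13 = 1000 * v13["batch_latency_s"] / v13["batch"]

print(f"slowdown: {slowdown:.1f}x")
print(f"TPS drop: {tps_drop:.1%}")
print(f"ms/item:  {ms_per_item_v04:.2f} (v0.4.x) vs {ms_per_item_v13:.2f} (v1.3.5)")
```

Normalizing to milliseconds per item matters because the two runs used different batch sizes, so raw batch latency alone is not directly comparable.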

Analysis & Questions

We observed that v1.3.5 suffers from high per-batch overhead and severe degradation as the collection size grows:

  1. Overhead: Processing a 1.5k-item batch on v1.3.5 takes 12.8s (about 8.5 ms per item), whereas a 5k-item batch on v0.4.x took only 5.4s (about 1.1 ms per item). The new architecture seems to have extremely high fixed costs per add() call.
  2. Degradation: On v1.3.5, TPS dropped from 243 (at 500k vectors) to 133 (at 3.5M vectors). HNSW index construction seems much more expensive in the Rust implementation than in the old DuckDB/C++ one.
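
Both observations depend on timing each add() call separately and tracking how throughput changes as the collection grows. A minimal sketch of such a harness follows; it is not the exact script behind the numbers above. The names `measure_ingest`, `add_batch`, and `make_vectors` are hypothetical, and an in-memory dict stands in for the database so the example runs without Chroma installed.

```python
import random
import time
from typing import Callable, Dict, List, Sequence, Tuple


def measure_ingest(
    add_batch: Callable[[Sequence[str], Sequence[Sequence[float]]], None],
    make_vectors: Callable[[int], List[List[float]]],
    total: int,
    batch_size: int,
) -> List[Tuple[int, float, float]]:
    """Return one (items_written_so_far, batch_seconds, batch_tps) row per add call."""
    rows = []
    written = 0
    while written < total:
        n = min(batch_size, total - written)
        ids = [f"id-{written + i}" for i in range(n)]
        vectors = make_vectors(n)  # generated outside the timed region
        start = time.perf_counter()
        add_batch(ids, vectors)
        elapsed = time.perf_counter() - start
        written += n
        tps = n / elapsed if elapsed > 0 else float("inf")
        rows.append((written, elapsed, tps))
    return rows


# Stand-in store so the sketch is runnable without a database.
store: Dict[str, Sequence[float]] = {}


def fake_add(ids, vectors):
    store.update(zip(ids, vectors))


def random_vectors(n, dim=8):
    return [[random.random() for _ in range(dim)] for _ in range(n)]


rows = measure_ingest(fake_add, random_vectors, total=2500, batch_size=1000)
for written, seconds, tps in rows:
    print(f"{written:>6} items  {seconds:.6f}s  {tps:,.0f} items/s")
```

To run this against a real collection, `add_batch` could be a small wrapper around `collection.add(ids=ids, embeddings=vectors)`. Plotting the third column against the first would show whether throughput decays with collection size or is flat but dominated by per-call cost.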

Questions:

  1. Is this massive regression expected for small/medium batch sizes (1.5k)?
  2. Is there a recommended hnsw:construction_ef or other config to tune for write-heavy scenarios in v1.3?
  3. Does v1.3 enforce strict WAL/fsync that v0.4 didn't, causing IO bottlenecks?
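
For context on questions 2 and 3, these are the HNSW-related keys that v0.4.x accepts as collection metadata at creation time, as far as we know. This is a sketch only: the helper name is hypothetical, the values are illustrative rather than recommendations, and whether v1.3.5 honors the same keys is an assumption, which is part of what we are asking.

```python
def hnsw_write_heavy_metadata(construction_ef=100, m=16,
                              batch_size=10_000, sync_threshold=100_000):
    """Build legacy-style collection metadata for a write-heavy load.

    batch_size: items buffered before they are moved into the HNSW index.
    sync_threshold: items added between index flushes to disk.
    All values are placeholders, not tuned settings.
    """
    if sync_threshold < batch_size:
        raise ValueError("sync_threshold should not be smaller than batch_size")
    return {
        "hnsw:space": "cosine",
        "hnsw:construction_ef": construction_ef,
        "hnsw:M": m,
        "hnsw:batch_size": batch_size,
        "hnsw:sync_threshold": sync_threshold,
    }


metadata = hnsw_write_heavy_metadata()
print(metadata)

# Intended use (requires chromadb, deliberately not imported here):
# collection = client.create_collection(name="bench", metadata=metadata)
```

If disk syncs are the bottleneck, a larger sync threshold would be the first thing we would expect to help, at the cost of more work to replay after a crash.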

We really want to leverage the new features in v1.0+, but this write performance makes it impossible for our scale. Any advice is appreciated!
