Handspring

Open Source Multimodal Embeddings in Rust

👷‍♂️ Work in progress

Goals

Access to state-of-the-art multimodal embedding models
Only inference
High performance on CPU
Cheap & easy to scale
Flexible integration

Features

Status	Model	Feature	Embeddings
Development	OpenAI/Clip-ViT-B-32	clip_vit_b_32_image	Image
Next	OpenAI/Clip-ViT-B-32	clip_vit_b_32_text	Text
-	OpenAI/Clip-ViT-B-32	clip_vit_b_32	Image, Text
-	ImageBind

Strategies

Embed models config & safetensors files:
- No need to handle extra files
Reduce component size
- Each feature will only be responsible for generating embeddings for 1 modality for 1 model
- Extra feature for the full model (low priority)
- Split safetensors files into modalities (todo! improve code to automate this)

Performance tests

19-Mar-2024

Basic test to get an idea of the current performance CPU, same environment

Clip python, JIT

device="cpu"
model_name="ViT-B/32"
model, transform = clip.load(model_name, device=device, jit=True)

input_data_b = torch.ones((5, 3, 224, 224), dtype=torch.float32, device=device)
out = model.encode_image(input_data_b)

Results

Time elapsed in build_model() is: 1.6573503890000003
Time elapsed in forward() is: 1.101630814

Rust, Clip_vit_b_32_image

use hs_mm_embeddings::clip_vit_b_32_image;
fn main() {
    let start = Instant::now();
    let image_model = clip_vit_b_32_image::model::build_model().unwrap();

    let duration = start.elapsed();
    println!("Time elapsed in build_model() is: {:?}", duration);
    
    let input_img_batch = Tensor::ones((5, 3, 224, 224), DType::F32, &Device::Cpu).unwrap();

    let start = Instant::now();

    let output = image_model.forward(&input_img_batch).unwrap();
    
    let duration = start.elapsed();
    println!("Time elapsed in forward() is: {:?}", duration);

Results

Time elapsed in build_model() is: 483.20213ms
Time elapsed in forward() is: 1.495299234s

Testing different batch size

size 1:
- Python: 0.5376610519999998
- Rust: 282.812172ms
size 2:
- Python: 0.6142103929999996
- Rust: 941.727512ms

Observations

Time generating embeddings still too high when batch_size > 1
With batch_size = 1 is faster
Max difference between output tensors: [8.5831e-6] JIT=False & [7.6294e-6] JIT=True
Rust executable size (release): 183.7 MB

todo!: check unnecessary copies or redundant ops

Docs & Examples

todo!

Roadmap

First Clip Model
Output & Performance evaluation
Add ImageBind Model
Examples in Rust
Examples integration with other languages
Example Weaviate custom module
Expand models catalog

Integrations with other programming languages

todo!

Elixir: maybe using Rustler
JVM langs: maybe JNI
Python: PyO3

Community Server

todo!

Resources

Dev Resources

Rust basics
- Rust book
- log
- serde
- thiserror
- rust-embed
Rust ML
- HuggingFace Candle Framework

Name		Name	Last commit message	Last commit date
Latest commit History 28 Commits
benches		benches
examples		examples
hs-audio-core		hs-audio-core
hs-core		hs-core
hs-image-core		hs-image-core
hs-mm-embeddings		hs-mm-embeddings
hs-text-core		hs-text-core
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Handspring

Goals

Features

Strategies

Performance tests

19-Mar-2024

Docs & Examples

Roadmap

Integrations with other programming languages

Community Server

Resources

Dev Resources

Papers & Models

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

sromero0889/handspring

Folders and files

Latest commit

History

Repository files navigation

Handspring

Goals

Features

Strategies

Performance tests

19-Mar-2024

Docs & Examples

Roadmap

Integrations with other programming languages

Community Server

Resources

Dev Resources

Papers & Models

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages