HNSW.jl

Approximate Nearest Neighbor Searches using the "Hierarchical Navigable Small World" (HNSW) algorithm as described in https://arxiv.org/abs/1603.09320 .

Highlights

Written in Julia - no non-julian dependencies
Supports incremental index creation
Works with arbitrary distance functions
Is data-agnostic - can work with data of arbitrary types given a corresponding distance function

Creating an Index

An Index in this library is a struct of type HierarchicalNSW which can be constructed using:

hnsw = HierarchicalNSW(data; metric, M, efConstruction)

data: This is an AbstractVector of the data points to be used.
metric = Euclidean(): The metric to use for distance calculation. Any metric defined in Distances.jl should work as well as any type for which evaluate(::CustomMetric, x,y) is implemented.
M = 10: The maximum number of links per node on a level >1. Note that value highly influences recall depending on data.
M0 = 2M: The maximum number of links on the bottom layer (=1). Defaults to M0 = 2M.
efConstruction = 100: Maximum length of dynamic link lists during index creation. Low values may reduce recall but large values increase runtime of index creation.
ef = 10: Maximum length of dynamic link lists during search. May be changed afterwards using set_ef!(hnsw, value)
m_L = 1/log(M): Prefactor for random level generation.
max_elements = length(data): May be set to a larger value in case one wants to add elements to the structure after initial creation.

Once the HierarchicalNSW struct is initialized the search graph can be built by calling

add_to_graph!(hnsw [, indices])

which iteratively inserts all points from data into the graph. Optionally one may provide indices a subset of all the indices in data to partially to construct the graph.

Searching

Given an initialized HierarchicalNSW one can search for approximate nearest neighbors using

idxs, dists = knn_search(hnsw, query, k)

where query may either be a single point of type eltype(data) or a vector of such points.

A simple example:

using HNSW

dim = 10
num_elements = 10000
data = [rand(dim) for i=1:num_elements]

#Intialize HNSW struct
hnsw = HierarchicalNSW(data; efConstruction=100, M=16, ef=50)

#Add all data points into the graph
#Optionally pass a subset of the indices in data to partially construct the graph
add_to_graph!(hnsw)

# optionally with a progress notification:
# step = (num_elements) ÷ 100
# add_to_graph!(hnsw) do i
#   if iszero(i % step)
#     @info "Processed: $(i ÷ step)%"
#   end
# end

queries = [rand(dim) for i=1:1000]

k = 10
# Find k (approximate) nearest neighbors for each of the queries
idxs, dists = knn_search(hnsw, queries, k)

Multi-Threading

A multi-threaded version of this algorithm is available. To use it, checkout the branch multi-threaded and start the indexing with:

 add_to_graph!(hnsw; multithreading=true)

For multi-threaded searches add multithreading=true as a keyword argument to knn_search.

Name		Name	Last commit message	Last commit date
Latest commit History 126 Commits
.github/workflows		.github/workflows
benchmark		benchmark
docs		docs
src		src
test		test
LICENSE		LICENSE
Project.toml		Project.toml
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

HNSW.jl

Highlights

Creating an Index

Searching

A simple example:

Multi-Threading

About

Uh oh!

Releases 5

Packages

Uh oh!

Contributors 6

Uh oh!

Languages

License

JuliaNeighbors/HNSW.jl

Folders and files

Latest commit

History

Repository files navigation

HNSW.jl

Highlights

Creating an Index

Searching

A simple example:

Multi-Threading

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 5

Packages 0

Uh oh!

Contributors 6

Uh oh!

Languages

Packages