Commit 294f600

jeasonnow and jacoblee93 authored
langchain[minor]: add EnsembleRetriever (#5556)
* langchain[patch]: add support to merge retrievers
* Format
* Parallelize, lint, format, small fixes
* Add entrypoint
* Fix import
* Add docs and fix build artifacts

---------

Co-authored-by: jeasonnow <guyf@seeyon.com>
Co-authored-by: jacoblee93 <jacoblee93@gmail.com>
1 parent d35d12d commit 294f600

17 files changed, +349 -2 lines changed

‎docs/core_docs/docs/concepts.mdx‎

Lines changed: 3 additions & 2 deletions
@@ -672,8 +672,9 @@ LangChain provides several advanced retrieval types. A full list is below, along
 | [Multi Vector](/docs/how_to/multi_vector/) | Vectorstore + Document Store | Sometimes during indexing | If you are able to extract information from documents that you think is more relevant to index than the text itself. | This involves creating multiple vectors for each document. Each vector could be created in a myriad of ways - examples include summaries of the text and hypothetical questions. |
 | [Self Query](/docs/how_to/self_query/) | Vectorstore | Yes | If users are asking questions that are better answered by fetching documents based on metadata rather than similarity with the text. | This uses an LLM to transform user input into two things: (1) a string to look up semantically, (2) a metadata filer to go along with it. This is useful because oftentimes questions are about the METADATA of documents (not the content itself). |
 | [Contextual Compression](/docs/how_to/contextual_compression/) | Any | Sometimes | If you are finding that your retrieved documents contain too much irrelevant information and are distracting the LLM. | This puts a post-processing step on top of another retriever and extracts only the most relevant information from retrieved documents. This can be done with embeddings or an LLM. |
-| [Time-Weighted Vectorstore](/docs/how_to/time_weighted_vectorstore/) | Vectorstore | No | If you have timestamps associated with your documents, and you want to retrieve the most recent ones | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents) |
-| [Multi-Query Retriever](/docs/how_to/multiple_queries/) | Any | Yes | If users are asking questions that are complex and require multiple pieces of distinct information to respond | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them. |
+| [Time-Weighted Vectorstore](/docs/how_to/time_weighted_vectorstore/) | Vectorstore | No | If you have timestamps associated with your documents, and you want to retrieve the most recent ones. | This fetches documents based on a combination of semantic similarity (as in normal vector retrieval) and recency (looking at timestamps of indexed documents) |
+| [Multi-Query Retriever](/docs/how_to/multiple_queries/) | Any | Yes | If users are asking questions that are complex and require multiple pieces of distinct information to respond. | This uses an LLM to generate multiple queries from the original one. This is useful when the original query needs pieces of information about multiple topics to be properly answered. By generating multiple queries, we can then fetch documents for each of them. |
+| [Ensemble](/docs/how_to/ensemble_retriever) | Any | No | If you have multiple retrieval methods and want to try combining them. | This fetches documents from multiple retrievers and then combines them. |
 
 ### Text splitting
 

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+# How to combine results from multiple retrievers
+
+:::info Prerequisites
+
+This guide assumes familiarity with the following concepts:
+
+- [Documents](/docs/concepts#document)
+- [Retrievers](/docs/concepts#retrievers)
+
+:::
+
+The [EnsembleRetriever](https://api.js.langchain.com/classes/langchain_retrievers_ensemble.EnsembleRetriever.html) supports ensembling of results from multiple retrievers. It is initialized with a list of [BaseRetriever](https://api.js.langchain.com/classes/langchain_core_retrievers.BaseRetriever.html) objects. EnsembleRetrievers rerank the results of the constituent retrievers based on the [Reciprocal Rank Fusion](https://plg.uwaterloo.ca/~gvcormac/cormacksigir09-rrf.pdf) algorithm.
+
+By leveraging the strengths of different algorithms, the `EnsembleRetriever` can achieve better performance than any single algorithm.
+
+One useful pattern is to combine a keyword matching retriever with a dense retriever (like embedding similarity), because their strengths are complementary. This can be considered a form of "hybrid search". The sparse retriever is good at finding relevant documents based on keywords, while the dense retriever is good at finding relevant documents based on semantic similarity.
+
+Below we demonstrate ensembling of a [simple custom retriever](/docs/how_to/custom_retriever/) that simply returns documents that directly contain the input query with a retriever derived from a [demo, in-memory, vector store](https://api.js.langchain.com/classes/langchain_vectorstores_memory.MemoryVectorStore.html).
+
+import CodeBlock from "@theme/CodeBlock";
+import Example from "@examples/retrievers/ensemble_retriever.ts";
+
+<CodeBlock language="typescript">{Example}</CodeBlock>
+
+## Next steps
+
+You've now learned how to combine results from multiple retrievers.
+Next, check out some other retrieval how-to guides, such as how to [improve results using multiple embeddings per document](/docs/how_to/multi_vector)
+or how to [create your own custom retriever](/docs/how_to/custom_retriever).
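
The new guide says results are reranked with Reciprocal Rank Fusion but does not show the scoring itself. As a rough illustration only (this is not the code added in this commit), the sketch below fuses ranked lists of document IDs by summing `weight / (c + rank)` for each list a document appears in. The function name `reciprocalRankFusion`, the `weights` parameter, and the sample document IDs are all hypothetical; `c = 60` is the constant used in the linked RRF paper. Each input list is assumed to contain no duplicate IDs.

```typescript
// Minimal Reciprocal Rank Fusion sketch (hypothetical helper, not the library code).
// rankedLists: one ranked list of document IDs per retriever, best match first.
// weights: assumed per-retriever multipliers; defaults to equal weights.
// c: smoothing constant; 60 is the value used in the RRF paper.
function reciprocalRankFusion(
  rankedLists: string[][],
  weights?: number[],
  c = 60
): string[] {
  const w = weights ?? rankedLists.map(() => 1 / rankedLists.length);
  if (w.length !== rankedLists.length) {
    throw new Error("weights must have one entry per ranked list");
  }

  const scores = new Map<string, number>();
  rankedLists.forEach((list, i) => {
    list.forEach((docId, index) => {
      const rank = index + 1; // ranks are 1-based
      const previous = scores.get(docId) ?? 0;
      scores.set(docId, previous + w[i] / (c + rank));
    });
  });

  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// Sample rankings, e.g. from a keyword retriever and a vector store retriever.
const keywordResults = ["doc_a", "doc_b", "doc_c"];
const vectorResults = ["doc_b", "doc_d", "doc_a"];

const fused = reciprocalRankFusion([keywordResults, vectorResults]);
console.log(fused); // fused order: doc_b, doc_a, doc_d, doc_c
```

With equal weights, `doc_b` comes first because it ranks near the top of both lists, while documents found by only one retriever fall to the end. Passing `weights` such as `[1, 0]` makes the output follow the first list alone, which is one way to see how a per-retriever weight could bias the fusion.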

‎docs/core_docs/docs/how_to/index.mdx‎

Lines changed: 1 addition & 0 deletions
@@ -133,6 +133,7 @@ Retrievers are responsible for taking a query and returning relevant documents.
 - [How to: generate multiple queries to retrieve data for](/docs/how_to/multiple_queries)
 - [How to: use contextual compression to compress the data retrieved](/docs/how_to/contextual_compression)
 - [How to: write a custom retriever class](/docs/how_to/custom_retriever)
+- [How to: combine the results from multiple retrievers](/docs/how_to/ensemble_retriever)
 - [How to: generate multiple embeddings per document](/docs/how_to/multi_vector)
 - [How to: retrieve the whole document for a chunk](/docs/how_to/parent_document_retriever)
 - [How to: generate metadata filters](/docs/how_to/self_query)

‎environment_tests/test-exports-bun/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ export * from "langchain/callbacks";
 export * from "langchain/output_parsers";
 export * from "langchain/retrievers/contextual_compression";
 export * from "langchain/retrievers/document_compressors";
+export * from "langchain/retrievers/ensemble";
 export * from "langchain/retrievers/multi_query";
 export * from "langchain/retrievers/multi_vector";
 export * from "langchain/retrievers/parent_document";

‎environment_tests/test-exports-cf/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ export * from "langchain/callbacks";
 export * from "langchain/output_parsers";
 export * from "langchain/retrievers/contextual_compression";
 export * from "langchain/retrievers/document_compressors";
+export * from "langchain/retrievers/ensemble";
 export * from "langchain/retrievers/multi_query";
 export * from "langchain/retrievers/multi_vector";
 export * from "langchain/retrievers/parent_document";

‎environment_tests/test-exports-cjs/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ const callbacks = require("langchain/callbacks");
 const output_parsers = require("langchain/output_parsers");
 const retrievers_contextual_compression = require("langchain/retrievers/contextual_compression");
 const retrievers_document_compressors = require("langchain/retrievers/document_compressors");
+const retrievers_ensemble = require("langchain/retrievers/ensemble");
 const retrievers_multi_query = require("langchain/retrievers/multi_query");
 const retrievers_multi_vector = require("langchain/retrievers/multi_vector");
 const retrievers_parent_document = require("langchain/retrievers/parent_document");

‎environment_tests/test-exports-esbuild/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ import * as callbacks from "langchain/callbacks";
 import * as output_parsers from "langchain/output_parsers";
 import * as retrievers_contextual_compression from "langchain/retrievers/contextual_compression";
 import * as retrievers_document_compressors from "langchain/retrievers/document_compressors";
+import * as retrievers_ensemble from "langchain/retrievers/ensemble";
 import * as retrievers_multi_query from "langchain/retrievers/multi_query";
 import * as retrievers_multi_vector from "langchain/retrievers/multi_vector";
 import * as retrievers_parent_document from "langchain/retrievers/parent_document";

‎environment_tests/test-exports-esm/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ import * as callbacks from "langchain/callbacks";
 import * as output_parsers from "langchain/output_parsers";
 import * as retrievers_contextual_compression from "langchain/retrievers/contextual_compression";
 import * as retrievers_document_compressors from "langchain/retrievers/document_compressors";
+import * as retrievers_ensemble from "langchain/retrievers/ensemble";
 import * as retrievers_multi_query from "langchain/retrievers/multi_query";
 import * as retrievers_multi_vector from "langchain/retrievers/multi_vector";
 import * as retrievers_parent_document from "langchain/retrievers/parent_document";

‎environment_tests/test-exports-vercel/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ export * from "langchain/callbacks";
 export * from "langchain/output_parsers";
 export * from "langchain/retrievers/contextual_compression";
 export * from "langchain/retrievers/document_compressors";
+export * from "langchain/retrievers/ensemble";
 export * from "langchain/retrievers/multi_query";
 export * from "langchain/retrievers/multi_vector";
 export * from "langchain/retrievers/parent_document";

‎environment_tests/test-exports-vite/src/entrypoints.js‎

Lines changed: 1 addition & 0 deletions
@@ -37,6 +37,7 @@ export * from "langchain/callbacks";
 export * from "langchain/output_parsers";
 export * from "langchain/retrievers/contextual_compression";
 export * from "langchain/retrievers/document_compressors";
+export * from "langchain/retrievers/ensemble";
 export * from "langchain/retrievers/multi_query";
 export * from "langchain/retrievers/multi_vector";
 export * from "langchain/retrievers/parent_document";
