langchain-postgres

The langchain-postgres package provides implementations of core LangChain abstractions using Postgres.

The package is released under the MIT license.

Feel free to use the abstractions as provided, or modify and extend them as appropriate for your own application.

Requirements

The package currently supports only the psycopg3 driver.
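
The usage examples below pass a CONNECTION_STRING to PGVector without defining it. Here is a minimal sketch of building one, assuming the store accepts a SQLAlchemy-style URL that selects psycopg3 through the `postgresql+psycopg` dialect. The helper name, credentials, host, port, and database name are placeholders, not values from this repository.

```python
from urllib.parse import quote_plus


def build_connection_string(user, password, host, port, database):
    """Build a SQLAlchemy-style URL that selects the psycopg3 driver (assumed format)."""
    # quote_plus escapes characters such as '@' or '/' inside the credentials
    return (
        f"postgresql+psycopg://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values; replace them with your own connection details
CONNECTION_STRING = build_connection_string(
    "langchain", "p@ss/word", "localhost", 5432, "langchain"
)
print(CONNECTION_STRING)
```

Escaping the credentials keeps a password that contains `@` or `/` from being misread as part of the host or path.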

Installation

pip install -U langchain-postgres

Change Log

0.0.6:

Removed langgraph as a dependency because it was causing dependency conflicts.
The base checkpointer interface changed in langgraph, so the existing implementation would have broken regardless.

Usage

HNSW index

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
)

Embedding length is required for an HNSW index.
Allowed values for embedding_index_ops are described in the pgvector HNSW documentation.

You can also set the ef_construction and m parameters for an HNSW index. Refer to the pgvector HNSW Index Options to better understand these parameters.

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
    ef_construction=200,
    m=48,
)
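
For orientation, m and ef_construction are the build options of a pgvector `CREATE INDEX ... USING hnsw` statement. The sketch below only formats such a statement as a string. The helper name, the target column (`embedding` on `langchain_pg_embedding`), and the idea that the package issues SQL of exactly this shape are assumptions for illustration, not taken from its source.

```python
def hnsw_index_ddl(table, column, ops, m=16, ef_construction=64):
    """Format a pgvector HNSW CREATE INDEX statement (illustrative only).

    Identifiers are interpolated without escaping, so never pass
    untrusted input to this helper.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {ops}) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )


# Table and column names are assumed for illustration
ddl = hnsw_index_ddl(
    "langchain_pg_embedding",
    "embedding",
    "vector_cosine_ops",
    m=48,
    ef_construction=200,
)
print(ddl)
```

The defaults in the helper mirror pgvector's documented defaults (m = 16, ef_construction = 64). Larger values generally improve recall at the cost of index build time and memory.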

IVFFlat index

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.ivfflat,
    embedding_index_ops="vector_cosine_ops",
)

Embedding length is required for an IVFFlat index.
Allowed values for embedding_index_ops are described in the pgvector IVFFlat documentation.

Binary Quantization

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="bit_hamming_ops",
    binary_quantization=True,
    binary_limit=200,
)

Binary quantization works only with an HNSW index using bit_hamming_ops.
binary_limit raises the limit of the inner binary search. A higher value increases recall at the cost of speed.

Refer to the pgvector Binary Quantization documentation for details.
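
To make the recall and speed trade-off concrete, here is a pure-Python sketch of the general two-stage technique: a coarse shortlist by Hamming distance over one-bit-per-dimension codes, followed by exact re-ranking with cosine distance. It illustrates the idea only and is not the package's implementation or the SQL it runs. All function names and the sample vectors are made up for this example.

```python
import math


def binary_quantize(vec):
    """Map each component to one bit: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vec]


def hamming(a, b):
    """Count the positions where two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def search(query, vectors, k, binary_limit):
    """Two-stage search: Hamming shortlist, then exact cosine re-ranking."""
    q_bits = binary_quantize(query)
    # Stage 1: keep the binary_limit closest candidates by Hamming distance
    shortlist = sorted(
        range(len(vectors)),
        key=lambda i: hamming(q_bits, binary_quantize(vectors[i])),
    )[:binary_limit]
    # Stage 2: re-rank the shortlist using the full-precision vectors
    shortlist.sort(key=lambda i: cosine_distance(query, vectors[i]))
    return shortlist[:k]


query = [1.0, 0.1, 0.1]
vectors = [
    [1.0, -0.01, -0.01],  # index 0: closest by cosine, but 2 bits away
    [0.2, 0.9, 0.9],      # index 1: identical bit pattern to the query
    [0.5, 0.5, -0.5],     # index 2: 1 bit away
    [-1.0, -1.0, -1.0],   # index 3: opposite direction
]

print(search(query, vectors, k=1, binary_limit=1))  # [1]
print(search(query, vectors, k=1, binary_limit=3))  # [0]
```

With a shortlist of one, the true nearest neighbour (index 0) never reaches the re-ranking stage. Widening the shortlist to three recovers it, at the cost of more exact distance computations.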

Partitioning

from langchain_postgres import PGVector

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    enable_partitioning=True,
)

Setting enable_partitioning=True partitions the langchain_pg_embedding table by collection_id. This is useful when a large number of embeddings is spread across different collections.

Refer to the pgvector Partitioning documentation for details.

Iterative Scan

from langchain_postgres import PGVector, EmbeddingIndexType, IterativeScan

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
    iterative_scan=IterativeScan.relaxed_order,
)

iterative_scan can be set to IterativeScan.relaxed_order or IterativeScan.strict_order, or disabled with IterativeScan.off.
It requires an HNSW or IVFFlat index.

Refer to the pgvector Iterative Scan documentation for details.

Iterative Scan Options for HNSW index

from langchain_postgres import PGVector, EmbeddingIndexType, IterativeScan

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
    iterative_scan=IterativeScan.relaxed_order,
    max_scan_tuples=40000,
    scan_mem_multiplier=2,
)

max_scan_tuples sets the maximum number of tuples to visit, which controls when the scan ends while iterative_scan is enabled.
scan_mem_multiplier specifies the maximum amount of memory to use for the scan, as a multiple of work_mem.

Refer to the pgvector Iterative Scan Options documentation for details.
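
The role of max_scan_tuples is easier to see in a toy model. The sketch below simulates a filtered approximate search: candidates arrive in index order in fixed-size batches, a filter discards some of them, and scanning continues until k rows pass or the tuple budget is spent. It is a simplified illustration under stated assumptions, not pgvector's algorithm. The function name and batch_size are hypothetical, and scan_mem_multiplier is not modelled.

```python
def iterative_filtered_scan(ordered_ids, passes_filter, k, batch_size, max_scan_tuples):
    """Scan candidates batch by batch until k rows pass the filter
    or max_scan_tuples candidates have been visited.

    ordered_ids is assumed to be sorted by distance to the query already.
    Returns the matching ids and the number of candidates visited.
    """
    results = []
    scanned = 0
    budget = min(max_scan_tuples, len(ordered_ids))
    while len(results) < k and scanned < budget:
        end = min(scanned + batch_size, budget)
        batch = ordered_ids[scanned:end]
        scanned = end
        for row_id in batch:
            if len(results) < k and passes_filter(row_id):
                results.append(row_id)
    return results, scanned


ids = list(range(100))             # stand-in for rows in distance order
only_tens = lambda i: i % 10 == 0  # a selective filter: 1 row in 10 passes

# Budget of one batch behaves like a scan without iteration
print(iterative_filtered_scan(ids, only_tens, k=3, batch_size=5, max_scan_tuples=5))
# ([0], 5)

# A larger budget lets the scan continue until 3 rows pass the filter
print(iterative_filtered_scan(ids, only_tens, k=3, batch_size=5, max_scan_tuples=40))
# ([0, 10, 20], 25)
```

A selective filter is the case where iterative scanning matters most: a single batch returns fewer than k rows, and the tuple budget bounds how much extra work the search may do to fill the gap.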

Full Text Search

Full text search is enabled by passing the full_text_search parameter to a search call.

from langchain_postgres import PGVector

vectorstore = PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
)

vectorstore.similarity_search(
    "hello world",
    full_text_search=["foo", "bar & baz"]
)

This adds the following condition to the WHERE clause:

AND document_vector @@ to_tsquery('foo | bar & baz')
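
The condition above suggests that the list entries are joined with the tsquery OR operator `|`, while each entry may carry its own operators such as `&`. A minimal sketch of that mapping follows; `build_tsquery` is a hypothetical helper written for this illustration, and the package's internal logic may differ.

```python
def build_tsquery(terms):
    """Join full_text_search entries with the tsquery OR operator '|'."""
    return " | ".join(terms)


query = build_tsquery(["foo", "bar & baz"])
print(query)  # foo | bar & baz
```

In a Postgres tsquery, `&` binds more tightly than `|`, so this expression matches documents that contain foo, or that contain both bar and baz.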

It can be used with retrievers like this:

from langchain_postgres import PGVector

vectorstore = PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
)

retriever = vectorstore.as_retriever(
    search_kwargs={
        "full_text_search": ["foo", "bar & baz"]
    }
)

Refer to the Postgres Full Text Search documentation for more information.

ChatMessageHistory

The chat message history abstraction persists chat message history in a Postgres table.

PostgresChatMessageHistory is parameterized using a table_name and a session_id.

The table_name is the name of the table in the database where the chat messages will be stored.

The session_id is a unique identifier for the chat session. It can be assigned by the caller using uuid.uuid4().

import uuid

from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
from langchain_postgres import PostgresChatMessageHistory
import psycopg

# Establish a synchronous connection to the database
# (or use psycopg.AsyncConnection for async)
conn_info = ... # Fill in with your connection info
sync_connection = psycopg.connect(conn_info)

# Create the table schema (only needs to be done once)
table_name = "chat_history"
PostgresChatMessageHistory.create_tables(sync_connection, table_name)

session_id = str(uuid.uuid4())

# Initialize the chat history manager
chat_history = PostgresChatMessageHistory(
    table_name,
    session_id,
    sync_connection=sync_connection
)

# Add messages to the chat history
chat_history.add_messages([
    SystemMessage(content="Meow"),
    AIMessage(content="woof"),
    HumanMessage(content="bark"),
])

print(chat_history.messages)

Vectorstore

See example for the PGVector vectorstore here

Stefan-Coetzee/langchain-postgres

Folders and files

Latest commit

History

Repository files navigation

langchain-postgres

Requirements

Installation

Change Log

Usage

HNSW index

IVFFlat index

Binary Quantization

Partitioning

Iterative Scan

Iterative Scan Options for HNSW index

Full Text Search

ChatMessageHistory

Vectorstore

About

Resources

License

Security policy

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages