Stefan-Coetzee/langchain-postgres

 
 

langchain-postgres

The langchain-postgres package contains implementations of core LangChain abstractions using Postgres.

The package is released under the MIT license.

Feel free to use the abstractions as provided, or modify and extend them as appropriate for your own application.

Requirements

The package currently supports only the psycopg3 driver.

Installation

pip install -U langchain-postgres

Change Log

0.0.6:

  • Remove langgraph as a dependency as it was causing dependency conflicts.
  • Base interface for checkpointer changed in langgraph, so existing implementation would've broken regardless.

Usage

HNSW index

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
)
  • Embedding length is required for an HNSW index.
  • Allowed values for embedding_index_ops are described in the pgvector HNSW documentation.

You can set the ef_construction and m parameters for an HNSW index. Refer to the pgvector HNSW Index Options to better understand these parameters.

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
    ef_construction=200,
    m=48,
)
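
To make the two parameters more concrete, the sketch below builds the kind of pgvector CREATE INDEX statement that m and ef_construction correspond to. It is an illustration only: the helper name hnsw_index_sql and the column name "embedding" are assumptions, and the DDL this package actually emits is not shown in this README. The defaults (m = 16, ef_construction = 64) are pgvector's documented defaults.

```python
def hnsw_index_sql(table, column, ops, m=16, ef_construction=64):
    """Build a pgvector HNSW CREATE INDEX statement (hypothetical helper)."""
    # pgvector rejects indexes where ef_construction is smaller than 2 * m.
    if ef_construction < 2 * m:
        raise ValueError("ef_construction must be at least 2 * m")
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {ops}) "
        f"WITH (m = {m}, ef_construction = {ef_construction})"
    )


# Same values as the PGVector example above; the column name is assumed.
sql = hnsw_index_sql(
    "langchain_pg_embedding", "embedding", "vector_cosine_ops",
    m=48, ef_construction=200,
)
print(sql)
```

Larger m and ef_construction values generally improve recall at the cost of build time and index size.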

IVFFlat index

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.ivfflat,
    embedding_index_ops="vector_cosine_ops",
)
  • Embedding length is required for an IVFFlat index.
  • Allowed values for embedding_index_ops are described in the pgvector IVFFlat documentation.

Binary Quantization

from langchain_postgres import PGVector, EmbeddingIndexType

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="bit_hamming_ops",
    binary_quantization=True,
    binary_limit=200,
)
  • Works only with an HNSW index using bit_hamming_ops.
  • binary_limit sets the row limit of the inner binary search. A higher value increases recall at the cost of speed.

Refer to the pgvector Binary Quantization documentation for details.
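
The trade-off behind binary_limit can be sketched in plain Python, following the two-stage pattern described in the pgvector documentation: a cheap Hamming-distance search over quantized vectors, then an exact re-ranking of the surviving candidates. The function names here are hypothetical and the code does not reproduce this package's implementation.

```python
import math


def binary_quantize(vec):
    """One bit per component: 1 if the value is positive, otherwise 0."""
    return tuple(1 if x > 0 else 0 for x in vec)


def hamming(a, b):
    """Number of positions at which two bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query, vectors, k, binary_limit):
    qbits = binary_quantize(query)
    # Inner search: keep the binary_limit closest vectors by Hamming distance.
    candidates = sorted(
        vectors, key=lambda v: hamming(binary_quantize(v), qbits)
    )[:binary_limit]
    # Outer search: re-rank the candidates by exact cosine distance.
    return sorted(candidates, key=lambda v: cosine_distance(v, query))[:k]


a, b, c = (1.0, -0.1), (0.1, 1.0), (-1.0, -1.0)
query = (1.0, 0.1)
vectors = [a, b, c]

print(search(query, vectors, k=1, binary_limit=1))  # [(0.1, 1.0)]
print(search(query, vectors, k=1, binary_limit=2))  # [(1.0, -0.1)]
```

With binary_limit=1 the true nearest neighbour (1.0, -0.1) is discarded by the coarse binary stage; raising the limit to 2 lets it reach the re-ranking stage, which is the recall-versus-speed trade-off described above.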

Partitioning

from langchain_postgres import PGVector

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    enable_partitioning=True,
)
  • Creates partitions of the langchain_pg_embedding table by collection_id. Useful when a large number of embeddings is spread across different collections.

Refer to the pgvector Partitioning documentation for details.
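
As a rough illustration of list partitioning by collection_id, the sketch below generates standard Postgres DDL for one partition per collection. The partition naming scheme and the helper partition_sql are assumptions made for this example; the statements this package actually runs are not documented here.

```python
import uuid


def partition_sql(collection_id, parent="langchain_pg_embedding"):
    """DDL for one list partition holding a single collection (hypothetical)."""
    # Assumed naming scheme: parent table name plus the UUID's hex digits.
    partition = f"{parent}_{collection_id.hex}"
    return (
        f"CREATE TABLE {partition} PARTITION OF {parent} "
        f"FOR VALUES IN ('{collection_id}')"
    )


collection_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
ddl = partition_sql(collection_id)
print(ddl)
```

This assumes the parent table was declared with PARTITION BY LIST (collection_id), so that queries filtered on one collection only touch that collection's partition.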

Iterative Scan

from langchain_postgres import PGVector, EmbeddingIndexType, IterativeScan

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
    iterative_scan=IterativeScan.relaxed_order
)
  • iterative_scan can be set to IterativeScan.relaxed_order or IterativeScan.strict_order, or disabled with IterativeScan.off.
  • Requires an HNSW or IVFFlat index.

Refer to the pgvector Iterative Scan documentation for details.

Iterative Scan Options for HNSW index

from langchain_postgres import PGVector, EmbeddingIndexType, IterativeScan

PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
    embedding_length=1536,
    embedding_index=EmbeddingIndexType.hnsw,
    embedding_index_ops="vector_cosine_ops",
    iterative_scan=IterativeScan.relaxed_order,
    max_scan_tuples=40000,
    scan_mem_multiplier=2
)
  • max_scan_tuples controls when the scan ends while iterative_scan is enabled.
  • scan_mem_multiplier specifies the maximum amount of memory to use for the scan.

Refer to the pgvector Iterative Scan Options documentation for details.
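
For orientation, these options correspond to pgvector session settings named hnsw.iterative_scan, hnsw.max_scan_tuples and hnsw.scan_mem_multiplier. The sketch below assembles the matching SET statements. The helper hnsw_scan_settings is hypothetical, and whether this package issues SET, SET LOCAL, or something else is an assumption not confirmed by this README.

```python
HNSW_SCAN_MODES = {"off", "relaxed_order", "strict_order"}


def hnsw_scan_settings(mode, max_scan_tuples=None, scan_mem_multiplier=None):
    """Return the pgvector SET statements for an HNSW iterative scan."""
    if mode not in HNSW_SCAN_MODES:
        raise ValueError(f"unknown iterative scan mode: {mode}")
    statements = [f"SET hnsw.iterative_scan = {mode}"]
    # Optional limits are only emitted when the caller provides them.
    if max_scan_tuples is not None:
        statements.append(f"SET hnsw.max_scan_tuples = {max_scan_tuples}")
    if scan_mem_multiplier is not None:
        statements.append(f"SET hnsw.scan_mem_multiplier = {scan_mem_multiplier}")
    return statements


settings = hnsw_scan_settings(
    "relaxed_order", max_scan_tuples=40000, scan_mem_multiplier=2
)
for statement in settings:
    print(statement)
```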

Full Text Search

Full text search is enabled by specifying the full_text_search parameter.

from langchain_postgres import PGVector

vectorstore = PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
)

vectorstore.similarity_search(
    "hello world",
    full_text_search=["foo", "bar & baz"]
)

This adds the following statement to the WHERE clause:

AND document_vector @@ to_tsquery('foo | bar & baz')
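
The mapping from the list to the query string can be sketched as follows: judging by the example above, the entries are joined with the tsquery OR operator (|). The helper names are hypothetical, and the string interpolation is for illustration only; real code should pass the query as a bound parameter.

```python
def build_tsquery(terms):
    """Join the full_text_search entries with the tsquery OR operator."""
    return " | ".join(terms)


def full_text_clause(terms):
    # Illustration only: production code should bind the query as a
    # parameter rather than interpolating it into the SQL text.
    return f"AND document_vector @@ to_tsquery('{build_tsquery(terms)}')"


clause = full_text_clause(["foo", "bar & baz"])
print(clause)  # AND document_vector @@ to_tsquery('foo | bar & baz')
```

Because & binds more tightly than | in a tsquery, this expression matches documents containing foo, or containing both bar and baz.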

It can also be used with retrievers:

from langchain_postgres import PGVector

vectorstore = PGVector(
    collection_name="test_collection",
    embeddings=FakeEmbedding(),
    connection=CONNECTION_STRING,
)

retriever = vectorstore.as_retriever(
    search_kwargs={
        "full_text_search": ["foo", "bar & baz"]
    }
)

Refer to Postgres Full Text Search for more information.

ChatMessageHistory

The chat message history abstraction helps to persist chat message history in a Postgres table.

PostgresChatMessageHistory is parameterized using a table_name and a session_id.

The table_name is the name of the table in the database where the chat messages will be stored.

The session_id is a unique identifier for the chat session. It can be assigned by the caller using uuid.uuid4().

import uuid

from langchain_core.messages import SystemMessage, AIMessage, HumanMessage
from langchain_postgres import PostgresChatMessageHistory
import psycopg

# Establish a synchronous connection to the database
# (or use psycopg.AsyncConnection for async)
conn_info = ... # Fill in with your connection info
sync_connection = psycopg.connect(conn_info)

# Create the table schema (only needs to be done once)
table_name = "chat_history"
PostgresChatMessageHistory.create_tables(sync_connection, table_name)

session_id = str(uuid.uuid4())

# Initialize the chat history manager
chat_history = PostgresChatMessageHistory(
    table_name,
    session_id,
    sync_connection=sync_connection
)

# Add messages to the chat history
chat_history.add_messages([
    SystemMessage(content="Meow"),
    AIMessage(content="woof"),
    HumanMessage(content="bark"),
])

print(chat_history.messages)

Vectorstore

See example for the PGVector vectorstore here

About

LangChain abstractions backed by Postgres Backend with enhanced features
