Checked other resources
- This is a bug, not a usage question.
- I added a clear and descriptive title that summarizes this issue.
- I used the GitHub search to find a similar question and didn't find it.
- I am sure that this is a bug in LangChain rather than my code.
- The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
- This is not related to the langchain-community package.
- I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.
Package (Required)
- langchain
- langchain-openai
- langchain-anthropic
- langchain-classic
- langchain-core
- langchain-cli
- langchain-model-profiles
- langchain-tests
- langchain-text-splitters
- langchain-chroma
- langchain-deepseek
- langchain-exa
- langchain-fireworks
- langchain-groq
- langchain-huggingface
- langchain-mistralai
- langchain-nomic
- langchain-ollama
- langchain-perplexity
- langchain-prompty
- langchain-qdrant
- langchain-xai
- Other / not sure / general
Example Code (Python)
from langchain.agents import create_agent
from langchain.agents.middleware import SummarizationMiddleware
from langchain.chat_models import init_chat_model

# Placeholder models; any chat model reproduces the issue.
summary_llm = init_chat_model("openai:gpt-4o-mini")
model = init_chat_model("openai:gpt-4o-mini")

summary_middleware = SummarizationMiddleware(
    model=summary_llm,
    max_tokens_before_summary=512,
    messages_to_keep=0,
)
agent = create_agent(
    name="test_summary_agent",
    model=model,
    middleware=[
        summary_middleware,
    ],
)

# Input token length should exceed max_tokens_before_summary (e.g. > 1024
# tokens) so that summary_middleware actually triggers.
model_input = "your test input "
result = agent.invoke(input={"messages": [{"role": "user", "content": model_input}]})
Error Message and Stack Trace (if applicable)
Error code: 400 - {'error': {'message': "This model's maximum context length is 262144 tokens. However, your request has 272099 input tokens. Please reduce the length of the input messages. None", 'type': 'BadRequestError', 'param': None, 'code': 400}}
-------------------
In fact, the summary input that SummarizationMiddleware builds looks like this (it contains unrelated model usage metadata):
<role>
Context Extraction Assistant
</role>
<primary_objective>
Your sole objective in this task is to extract the highest quality/most relevant context from the conversation history below.
</primary_objective>
<objective_information>
You're nearing the total number of input tokens you can accept, so you must extract the highest quality/most relevant pieces of information from your conversation history.
This context will then overwrite the conversation history presented below. Because of this, ensure the context you extract is only the most important information to your overall goal.
</objective_information>
<instructions>
The conversation history below will be replaced with the context you extract in this step. Because of this, you must do your very best to extract and record all of the most important context from the conversation history.
You want to ensure that you don't repeat any actions you've already completed, so the context you extract from the conversation history should be focused on the most important information to your overall goal.
</instructions>
The user will message you with the full message history you'll be extracting context from, to then replace. Carefully read over it all, and think deeply about what information is most important to your overall goal that should be saved:
With all of this in mind, please carefully read over the entire conversation history, and extract the most important and relevant context to replace it so that you can free up space in the conversation history.
Respond ONLY with the extracted context. Do not include any additional information, or text before or after the extracted context.
<messages>
Messages to summarize:
[HumanMessage(content="Search the web for technical documentation, failure cases, and handling guides related to 'deadlock', focusing on disassembly-level analysis methods and debugging techniques; extract typical failure symptoms, diagnosis steps, solutions, and other information.", additional_kwargs={}, response_metadata={}, id='f73acfae-f4e4-4709-8f7c-b9f8d7647ff6'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 5512, 'total_tokens': 5539, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen3-235B-A22B', 'system_fingerprint': None, 'id': 'chatcmpl-88b48347-4b11-48a9-8042-fc7753b6c65f', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b5eb5-8bd7-7611-8076-f60398467090-0', tool_calls=[{'name': 'search_pages', 'args': {'keyword': 'deadlock', 'in_space': False}, 'id': 'chatcmpl-tool-99d91ac2a6b2785e', 'type': 'tool_call'}], usage_metadata={'input_tokens': 5512, 'output_tokens': 27, 'total_tokens': 5539, 'input_token_details': {}, 'output_token_details': {}}),
Description
When summary_middleware triggers (input message token count exceeds the threshold), it sends the summary model an input built from DEFAULT_SUMMARY_PROMPT (shipped with SummarizationMiddleware in langchain.agents.middleware), like this:
<role>
Context Extraction Assistant
</role>
<primary_objective>
Your sole objective in this task is to extract the highest quality/most relevant context from the conversation history below.
</primary_objective>
<objective_information>
You're nearing the total number of input tokens you can accept, so you must extract the highest quality/most relevant pieces of information from your conversation history.
This context will then overwrite the conversation history presented below. Because of this, ensure the context you extract is only the most important information to your overall goal.
</objective_information>
The user will message you with the full message history you'll be extracting context from, to then replace. Carefully read over it all, and think deeply about what information is most important to your overall goal that should be saved:
With all of this in mind, please carefully read over the entire conversation history, and extract the most important and relevant context to replace it so that you can free up space in the conversation history.
Respond ONLY with the extracted context. Do not include any additional information, or text before or after the extracted context.
WARN:
The "{messages}" placeholder in this prompt is filled with the raw repr of every message, including all of its metadata, like:
HumanMessage(content="XXXXXXXX", additional_kwargs={}, response_metadata={}, id='f73acfae-f4e4-4709-8f7c-b9f8d7647ff6'), AIMessage(content='', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 27, 'prompt_tokens': 5512, 'total_tokens': 5539, 'completion_tokens_details': None, 'prompt_tokens_details': None}, 'model_provider': 'openai', 'model_name': 'Qwen3-235B-A22B', 'system_fingerprint': None, 'id': 'chatcmpl-88b48347-4b11-48a9-8042-fc7753b6c65f', 'finish_reason': 'tool_calls', 'logprobs': None}, id='lc_run--019b5eb5-8bd7-7611-8076-f60398467090-0', tool_calls=[{'name': 'search_pages', 'args': {'keyword': 'XXXXX', 'in_space': False}, 'id': 'chatcmpl-tool-99d91ac2a6b2785e', 'type': 'tool_call'}], usage_metadata={'input_tokens': 5512, 'output_tokens': 27, 'total_tokens': 5539, 'input_token_details': {}, 'output_token_details': {}}),
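To illustrate the mechanism, here is a minimal sketch of my reading of the behavior (not the actual library source): formatting the raw message list into the prompt embeds repr(message) for every entry, metadata and all:

from langchain_core.messages import AIMessage, HumanMessage

messages = [
    HumanMessage(content="search the web for 'deadlock' docs"),
    AIMessage(
        content="",
        response_metadata={"token_usage": {"prompt_tokens": 5512, "completion_tokens": 27}},
        usage_metadata={"input_tokens": 5512, "output_tokens": 27, "total_tokens": 5539},
    ),
]

# "{messages}" is a plain placeholder; str.format() on the raw list embeds
# repr(message) for every entry, so IDs, token usage, and tool-call metadata
# are all counted against the summary model's context window.
prompt_tail = "<messages>\nMessages to summarize:\n{messages}\n</messages>"
print(prompt_tail.format(messages=messages))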
ERROR:
When the message token count is already close to the model's maximum context limit, the extra, irrelevant metadata (tool-call details, usage statistics, message IDs) pushes the total over the limit, producing a context-length error and failing the whole agent run.
I expected the middleware to produce a correct summary of the agent's conversation history; instead, the summarization request itself exceeds the maximum context length and throws an error.
So I think this is a bug that needs to be fixed.
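Until this is fixed, a possible workaround (a sketch only; it assumes SummarizationMiddleware accepts a token_counter callable, so check the signature in your installed version) is to count tokens over the full message reprs, so summarization triggers with enough headroom for the metadata that actually gets sent:

from langchain.agents.middleware import SummarizationMiddleware

def repr_token_counter(messages):
    # Count tokens over the full reprs (content + metadata), mirroring what
    # the "{messages}" interpolation actually sends to the summary model.
    # ~4 characters per token is a rough heuristic.
    return sum(len(repr(m)) for m in messages) // 4

summary_middleware = SummarizationMiddleware(
    model=summary_llm,  # same placeholder model as in the example above
    max_tokens_before_summary=512,
    messages_to_keep=0,
    token_counter=repr_token_counter,  # assumption: parameter name may differ
)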
System Info
System Information
OS: Linux
OS Version: #1 SMP Wed Jun 10 09:04:49 EDT 2020
Python Version: 3.11.3 (main, Apr 19 2023, 23:54:32) [GCC 11.2.0]
Package Information
langchain_core: 1.2.4
langchain: 1.2.0
langsmith: 0.4.43
langchain_anthropic: 1.3.0
langchain_mcp_adapters: 0.1.13
langchain_openai: 1.0.3
langgraph_api: 0.5.23
langgraph_cli: 0.4.7
langgraph_runtime_inmem: 0.18.1
langgraph_sdk: 0.2.9
Optional packages not installed
langserve
Other Dependencies
anthropic: 0.75.0
blockbuster: 1.5.25
click: 8.3.1
cloudpickle: 3.1.2
cryptography: 44.0.3
grpcio: 1.76.0
grpcio-tools: 1.75.1
httpx: 0.28.1
jsonpatch: 1.33
jsonschema-rs: 0.29.1
langgraph: 1.0.3
langgraph-checkpoint: 3.0.1
mcp: 1.21.2
openai: 2.8.1
opentelemetry-api: 1.38.0
opentelemetry-exporter-otlp-proto-http: 1.38.0
opentelemetry-sdk: 1.38.0
orjson: 3.11.4
packaging: 25.0
protobuf: 6.33.1
pydantic: 2.11.10
pyjwt: 2.10.1
pytest: 7.3.1
python-dotenv: 1.2.1
pyyaml: 6.0.3
requests: 2.32.5
requests-toolbelt: 1.0.0
rich: 14.2.0
sse-starlette: 2.1.3
starlette: 0.49.3
structlog: 25.5.0
tenacity: 8.2.2
tiktoken: 0.12.0
truststore: 0.10.4
typing-extensions: 4.15.0
uuid-utils: 0.12.0
uvicorn: 0.38.0
watchfiles: 1.1.1
zstandard: 0.25.0