Overview
LangChain’s streaming system lets you surface live feedback from agent runs to your application. What’s possible with LangChain streaming:
- Stream agent progress: get state updates after each agent step.
- Stream LLM tokens: stream language model tokens as they’re generated.
- Stream custom updates: emit user-defined signals (e.g., "Fetched 10/100 records").
- Stream multiple modes: choose from `updates` (agent progress), `messages` (LLM tokens + metadata), or `custom` (arbitrary user data).
Supported stream modes
Pass one or more of the following stream modes as a list to the `stream` or `astream` methods:
| Mode | Description |
|---|---|
| `updates` | Streams state updates after each agent step. If multiple updates are made in the same step (e.g., multiple nodes are run), those updates are streamed separately. |
| `messages` | Streams tuples of `(token, metadata)` from any graph node where an LLM is invoked. |
| `custom` | Streams custom data from inside your graph nodes using the stream writer. |
Agent progress
To stream agent progress, use the `stream` or `astream` methods with `stream_mode="updates"`. This emits an event after every agent step.
For example, if you have an agent that calls a tool once, you should see the following updates:
- LLM node: `AIMessage` with tool call requests
- Tool node: `ToolMessage` with execution result
- LLM node: Final AI response
Streaming agent progress
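A minimal sketch of streaming in `"updates"` mode, assuming an OpenAI model and a hypothetical `get_weather` tool:

```python
from langchain.agents import create_agent


def get_weather(city: str) -> str:
    """Get the weather for a city."""
    # Hypothetical tool for illustration.
    return f"It's always sunny in {city}!"


agent = create_agent(
    model="openai:gpt-4o",  # any chat model your setup supports
    tools=[get_weather],
)

# stream_mode="updates" emits one event per agent step.
for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode="updates",
):
    # Each chunk maps the node that ran (e.g. "model", "tools") to its state update.
    print(chunk)
```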
LLM tokens
To stream tokens as they are produced by the LLM, use `stream_mode="messages"`. The example below streams the agent’s tool calls and final response token by token.
Streaming LLM tokens
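A sketch reusing the agent from the previous example; in `"messages"` mode each streamed item is a `(token, metadata)` tuple:

```python
for token, metadata in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode="messages",
):
    # token is a message chunk; metadata identifies the node it came from.
    print(f"{metadata.get('langgraph_node')}: {token.content!r}")
```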
Custom updates
To stream updates from tools as they are executed, you can use `get_stream_writer`.
Streaming custom updates
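A sketch of emitting custom updates from inside the hypothetical `get_weather` tool:

```python
from langchain.agents import create_agent
from langgraph.config import get_stream_writer


def get_weather(city: str) -> str:
    """Get the weather for a city."""
    writer = get_stream_writer()
    # Emit arbitrary, user-defined progress signals mid-execution.
    writer(f"Looking up data for city: {city}")
    writer(f"Acquired data for city: {city}")
    return f"It's always sunny in {city}!"


agent = create_agent(model="openai:gpt-4o", tools=[get_weather])

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode="custom",
):
    print(chunk)
```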
If you add `get_stream_writer` inside your tool, you won’t be able to invoke the tool outside of a LangGraph execution context.
Stream multiple modes
You can specify multiple streaming modes by passing stream mode as a list: `stream_mode=["updates", "custom"]`.
The streamed outputs will be tuples of `(mode, chunk)`, where `mode` is the name of the stream mode and `chunk` is the data streamed by that mode.
Streaming multiple modes
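A sketch combining modes, reusing the agent from the previous example:

```python
for mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode=["updates", "custom"],
):
    # mode is "updates" or "custom"; chunk is that mode's payload.
    print(f"{mode}: {chunk}")
```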
Common patterns
Below are examples showing common use cases for streaming.
Streaming tool calls
You may want to stream both:
- Partial JSON as tool calls are generated
- The completed, parsed tool calls that are executed
`stream_mode="messages"` will stream incremental message chunks generated by all LLM calls in the agent. To access the completed messages with parsed tool calls:
- If those messages are tracked in the state (as in the model node of `create_agent`), use `stream_mode=["messages", "updates"]` to access completed messages through state updates (demonstrated below).
- If those messages are not tracked in the state, use custom updates or aggregate the chunks during the streaming loop (next section).
Refer to the section below on streaming from sub-agents if your agent includes multiple LLMs.
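A sketch of this two-mode pattern, reusing the agent defined earlier; `tool_call_chunks` and `tool_calls` are the standard message fields for partial and parsed tool calls:

```python
for mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode=["messages", "updates"],
):
    if mode == "messages":
        token, metadata = chunk
        # Partial JSON arrives incrementally on the chunk's tool_call_chunks.
        if getattr(token, "tool_call_chunks", None):
            print("partial:", token.tool_call_chunks)
    elif mode == "updates":
        for node, update in chunk.items():
            for message in (update or {}).get("messages", []):
                # Completed, parsed tool calls appear on finished AI messages
                # once they are written to the state.
                if getattr(message, "tool_calls", None):
                    print("completed:", message.tool_calls)
```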
Accessing completed messages
In some cases, completed messages are not reflected in state updates. If you have access to the agent internals, you can use custom updates to access these messages during streaming. Otherwise, you can aggregate message chunks in the streaming loop (see below).
Consider the example below, where we incorporate a stream writer into a simplified guardrail middleware. This middleware uses tool calling to generate a structured “safe / unsafe” evaluation (one could also use structured outputs for this).
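A rough sketch, assuming an `AgentMiddleware` subclass with an `after_model` hook, a hypothetical `SafetyAssessment` tool schema, and the `get_weather` tool from earlier:

```python
from langchain.agents import create_agent
from langchain.agents.middleware import AgentMiddleware
from langchain.chat_models import init_chat_model
from langgraph.config import get_stream_writer
from pydantic import BaseModel


class SafetyAssessment(BaseModel):
    """Evaluate whether a response is safe to show to the user."""

    safe: bool
    reason: str


# Force the evaluator to respond via the SafetyAssessment tool.
evaluator = init_chat_model("openai:gpt-4o-mini").bind_tools(
    [SafetyAssessment], tool_choice="any"
)


class GuardrailMiddleware(AgentMiddleware):
    def after_model(self, state, runtime):
        writer = get_stream_writer()
        last_message = state["messages"][-1]
        result = evaluator.invoke(
            "Assess the safety of this response:\n\n" + str(last_message.content)
        )
        # Stream the completed, parsed evaluation as a custom update.
        writer({"guardrail": result.tool_calls[0]["args"]})


agent = create_agent(
    model="openai:gpt-4o",
    tools=[get_weather],
    middleware=[GuardrailMiddleware()],
)

for mode, chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode=["messages", "custom"],
):
    print(f"{mode}: {chunk}")
```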
Streaming with human-in-the-loop
To handle human-in-the-loop interrupts, we build on the above example:
- We configure the agent with human-in-the-loop middleware and a checkpointer
- We collect interrupts generated during the `"updates"` stream mode
- We respond to those interrupts with a command
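A sketch of all three steps, reusing the `get_weather` tool from earlier; the `HumanInTheLoopMiddleware` options and the resume payload shape are assumptions that may differ across versions:

```python
from langchain.agents import create_agent
from langchain.agents.middleware import HumanInTheLoopMiddleware
from langgraph.checkpoint.memory import InMemorySaver
from langgraph.types import Command

agent = create_agent(
    model="openai:gpt-4o",
    tools=[get_weather],
    middleware=[
        # Pause before executing the get_weather tool.
        HumanInTheLoopMiddleware(interrupt_on={"get_weather": True}),
    ],
    checkpointer=InMemorySaver(),  # required to pause and resume across invocations
)

config = {"configurable": {"thread_id": "1"}}
interrupts = []

for chunk in agent.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    config,
    stream_mode="updates",
):
    # Interrupts surface as a special "__interrupt__" update.
    if "__interrupt__" in chunk:
        interrupts.extend(chunk["__interrupt__"])
    else:
        print(chunk)

# Respond to the interrupt (here: approve the tool call) and resume the run.
if interrupts:
    resume_payload = {"decisions": [{"type": "approve"}]}  # shape assumed
    for chunk in agent.stream(Command(resume=resume_payload), config, stream_mode="updates"):
        print(chunk)
```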
Streaming from sub-agents
When there are multiple LLMs at any point in an agent, it’s often necessary to disambiguate the source of messages as they are generated. To do this, you can initialize any model with `tags`. These tags are then available in metadata when streaming in `"messages"` mode.
Below, we update the streaming tool calls example:
- We replace our tool with a `call_weather_agent` tool that invokes an agent internally
- We add a string tag to this LLM and the outer “supervisor” LLM
- We specify `subgraphs=True` when creating the stream
- Our stream processing is identical to before, but we add logic to keep track of which LLM is active
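A sketch of the tagged setup; the tag names and the `call_weather_agent` tool are illustrative, and it reuses the `get_weather` tool from earlier:

```python
from langchain.agents import create_agent
from langchain.chat_models import init_chat_model

# Tag each LLM so its tokens can be identified in the stream.
weather_model = init_chat_model("openai:gpt-4o-mini", tags=["weather"])
weather_agent = create_agent(model=weather_model, tools=[get_weather])


def call_weather_agent(query: str) -> str:
    """Delegate a weather question to the weather sub-agent."""
    result = weather_agent.invoke(
        {"messages": [{"role": "user", "content": query}]}
    )
    return result["messages"][-1].content


supervisor_model = init_chat_model("openai:gpt-4o", tags=["supervisor"])
supervisor = create_agent(model=supervisor_model, tools=[call_weather_agent])

active = None
# subgraphs=True also surfaces events from the nested weather agent,
# so each streamed item is (namespace, (token, metadata)).
for _namespace, (token, metadata) in supervisor.stream(
    {"messages": [{"role": "user", "content": "What's the weather in SF?"}]},
    stream_mode="messages",
    subgraphs=True,
):
    tags = metadata.get("tags", [])
    for name in ("supervisor", "weather"):
        if name in tags and active != name:
            active = name  # a different LLM started producing tokens
            print(f"\n--- {name} ---")
    if getattr(token, "content", None):
        print(token.content, end="")
```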
Disable streaming
In some applications you might need to disable streaming of individual tokens for a given model. This is useful when:
- Working with multi-agent systems to control which agents stream their output
- Mixing models that support streaming with those that do not
- Deploying to LangSmith and wanting to prevent certain model outputs from being streamed to the client
To disable streaming, set `streaming=False` when initializing the model.
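A minimal sketch, assuming `init_chat_model` (model-specific constructors accept the same parameter):

```python
from langchain.chat_models import init_chat_model

# Token-by-token streaming is disabled for this model,
# even when the surrounding agent is streamed in "messages" mode.
model = init_chat_model("openai:gpt-4o", streaming=False)
```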
Not all chat model integrations support the `streaming` parameter. If your model doesn’t support it, use `disable_streaming=True` instead. This parameter is available on all chat models via the base class.
Related
- Streaming with chat models — Stream tokens directly from a chat model without using an agent or graph
- Streaming with human-in-the-loop — Stream agent progress while handling interrupts for human review
- LangGraph streaming — Advanced streaming options including `values` and `debug` modes, and subgraph streaming