Commit c22ad54

lgrammel and samdenty committed
feat (core): add chunking functions support to smoothStream (#5548)
Co-authored-by: Sam Denty <sam@samdenty.com>
Co-authored-by: Sam Denty <samddenty@gmail.com>
1 parent a4f3007 commit c22ad54

9 files changed: 404 additions, 18 deletions

‎.changeset/beige-ligers-kneel.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+'ai': patch
+---
+
+feat(smooth-stream): chunking callbacks
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+---
+title: Smooth streaming Japanese text
+description: Learn how to smooth stream Japanese text
+---
+
+# Smooth streaming Japanese text
+
+You can smooth stream Japanese text by using the `smoothStream` function with the following regex, which splits on either words or Japanese characters:
+
+```tsx filename="page.tsx"
+import { smoothStream } from 'ai';
+import { useChat } from '@ai-sdk/react';
+
+const { data } = useChat({
+  experimental_transform: smoothStream({
+    chunking: /[\u3040-\u309F\u30A0-\u30FF]|\S+\s+/,
+  }),
+});
+```
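
The effect of this regex on kana and kanji text can be checked without the SDK. The sketch below assumes, in line with the regex examples in the `smoothStream` reference, that each emitted chunk is the buffered text up to and including the first regex match. `nextChunk` and `splitAll` are hypothetical helpers, not part of `ai`.

```ts
// Hypothetical helpers for illustration; they are not part of the `ai` package.
// Assumption: a chunk is everything up to and including the first regex match.
const japaneseChunking = /[\u3040-\u309F\u30A0-\u30FF]|\S+\s+/;

function nextChunk(buffer: string, pattern: RegExp): string | null {
  const match = pattern.exec(buffer);
  if (match === null) return null; // nothing to emit yet, keep buffering
  return buffer.slice(0, match.index) + match[0];
}

function splitAll(text: string, pattern: RegExp): string[] {
  const chunks: string[] = [];
  let buffer = text;
  let chunk = nextChunk(buffer, pattern);
  while (chunk !== null && chunk.length > 0) {
    chunks.push(chunk);
    buffer = buffer.slice(chunk.length);
    chunk = nextChunk(buffer, pattern);
  }
  if (buffer.length > 0) chunks.push(buffer); // flush the remainder
  return chunks;
}

console.log(splitAll('こんにちは', japaneseChunking)); // ['こ', 'ん', 'に', 'ち', 'は']
console.log(splitAll('日本語です', japaneseChunking)); // ['日本語で', 'す']
```

Under that assumption, kanji fall outside the hiragana and katakana ranges in the pattern, so a run of kanji is held back and emitted together with the next kana character.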
Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+---
+title: Smooth streaming Chinese text
+description: Learn how to smooth stream Chinese text
+---
+
+# Smooth streaming Chinese text
+
+You can smooth stream Chinese text by using the `smoothStream` function with the following regex, which splits on either words or Chinese characters:
+
+```tsx filename="page.tsx"
+import { smoothStream } from 'ai';
+import { useChat } from '@ai-sdk/react';
+
+const { data } = useChat({
+  experimental_transform: smoothStream({
+    chunking: /[\u4E00-\u9FFF]|\S+\s+/,
+  }),
+});
+```
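
For mixed Chinese and Latin text, the two alternatives in the pattern interact: single ideographs match the character class, while Latin words need trailing whitespace to match `\S+\s+`. The following is a minimal sketch, assuming a chunk is the buffered text up to and including the first match. `chunkText` is a hypothetical helper, not an `ai` export.

```ts
// Hypothetical helper for illustration; not part of the `ai` package.
// Assumption: a chunk is the buffered text up to and including the first match.
const chineseChunking = /[\u4E00-\u9FFF]|\S+\s+/;

function chunkText(text: string, pattern: RegExp): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > 0) {
    const match = pattern.exec(rest);
    if (match === null || match[0].length === 0) break; // no further boundary
    const chunk = rest.slice(0, match.index) + match[0];
    chunks.push(chunk);
    rest = rest.slice(chunk.length);
  }
  if (rest.length > 0) chunks.push(rest); // flush text without a boundary
  return chunks;
}

console.log(chunkText('你好，world 再见', chineseChunking)); // ['你', '好', '，world ', '再', '见']
console.log(chunkText('你好。再见', chineseChunking)); // ['你', '好', '。再', '见']
```

Full-width punctuation such as `，` and `。` lies outside `\u4E00-\u9FFF`, so in this sketch it is attached to the neighboring word or character rather than emitted on its own.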

‎content/docs/07-reference/01-ai-sdk-core/80-smooth-stream.mdx

Lines changed: 55 additions & 2 deletions
@@ -42,14 +42,67 @@ const result = streamText({
     },
     {
       name: 'chunking',
-      type: '"word" | "line" | RegExp',
+      type: '"word" | "line" | RegExp | (buffer: string) => string | undefined | null',
       isOptional: true,
       description:
-        'Controls how the text is chunked for streaming. Use "word" to stream word by word (default), "line" to stream line by line, or provide a custom RegExp pattern for custom chunking.',
+        'Controls how the text is chunked for streaming. Use "word" to stream word by word (default), "line" to stream line by line, or provide a custom callback or RegExp pattern for custom chunking.',
     },
   ]}
 />
 
+#### Word chunking caveats with non-Latin languages
+
+<Note>
+  Word-based chunking **does not work well** with languages that do not delimit words with spaces.
+
+  For the following languages, we recommend using a custom regex:
+
+  - Chinese - `/[\u4E00-\u9FFF]|\S+\s+/`
+  - Japanese - `/[\u3040-\u309F\u30A0-\u30FF]|\S+\s+/`
+
+  For the following languages, you can pass your own language-aware chunking function:
+
+  - Vietnamese
+  - Thai
+  - Javanese (Aksara Jawa)
+
+</Note>
+
+#### Regex-based chunking
+
+To use regex-based chunking, pass a `RegExp` to the `chunking` option.
+
+```ts
+// To split on underscores:
+smoothStream({
+  chunking: /_+/,
+});
+
+// An alternative pattern that also splits after underscores:
+smoothStream({
+  chunking: /[^_]*_/,
+});
+```
+
+#### Custom callback chunking
+
+To use a custom callback for chunking, pass a function to the `chunking` option.
+
+```ts
+smoothStream({
+  chunking: text => {
+    const findString = 'some string';
+    const index = text.indexOf(findString);
+
+    if (index === -1) {
+      return null;
+    }
+
+    return text.slice(0, index) + findString;
+  },
+});
+```
+
 ### Returns
 
 Returns a `TransformStream` that:
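
The regex and callback forms of `chunking` documented in this hunk can be compared side by side without the SDK. This sketch assumes that a `RegExp` yields the buffer up to and including its first match, and that a callback's return value is used as the chunk directly. `ChunkingFn`, `toDetector`, and `drain` are hypothetical names, not `ai` exports.

```ts
// Sketch only: `ChunkingFn`, `toDetector`, and `drain` are hypothetical names.
type ChunkingFn = (buffer: string) => string | undefined | null;

// Assumption: a RegExp yields the text up to and including its first match,
// while a callback's return value is used as the chunk directly.
function toDetector(chunking: RegExp | ChunkingFn): ChunkingFn {
  if (typeof chunking === 'function') return chunking;
  const pattern: RegExp = chunking;
  return buffer => {
    const match = pattern.exec(buffer);
    return match === null ? null : buffer.slice(0, match.index) + match[0];
  };
}

// Repeatedly takes chunks off the front of the text until no boundary is left.
function drain(text: string, detect: ChunkingFn): string[] {
  const chunks: string[] = [];
  let buffer = text;
  while (buffer.length > 0) {
    const chunk = detect(buffer);
    if (chunk == null || chunk.length === 0) break; // no boundary found
    chunks.push(chunk);
    buffer = buffer.slice(chunk.length);
  }
  if (buffer.length > 0) chunks.push(buffer); // remainder without a boundary
  return chunks;
}

// Same callback as in the reference example above.
const untilMarker: ChunkingFn = text => {
  const marker = 'some string';
  const index = text.indexOf(marker);
  return index === -1 ? null : text.slice(0, index) + marker;
};

console.log(drain('a_b_c', toDetector(/_+/))); // ['a_', 'b_', 'c']
console.log(drain('a_b_c', toDetector(/[^_]*_/))); // ['a_', 'b_', 'c']
console.log(drain('a__b', toDetector(/_+/))); // ['a__', 'b']
console.log(drain('a__b', toDetector(/[^_]*_/))); // ['a_', '_', 'b']
console.log(drain('find some string here', untilMarker)); // ['find some string', ' here']
```

Under these assumptions the two underscore patterns agree when underscores appear singly, but differ on runs of underscores: `/_+/` keeps the run in one chunk, while `/[^_]*_/` emits one chunk per underscore.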
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+import { simulateReadableStream, smoothStream, streamText } from 'ai';
+import { MockLanguageModelV1 } from 'ai/test';
+
+async function main() {
+  const result = streamText({
+    model: new MockLanguageModelV1({
+      doStream: async () => ({
+        stream: simulateReadableStream({
+          chunks: [
+            { type: 'text-delta', textDelta: '你好你好你好你好你好' },
+            { type: 'text-delta', textDelta: '你好你好你好你好你好' },
+            { type: 'text-delta', textDelta: '你好你好你好你好你好' },
+            { type: 'text-delta', textDelta: '你好你好你好你好你好' },
+            { type: 'text-delta', textDelta: '你好你好你好你好你好' },
+            {
+              type: 'finish',
+              finishReason: 'stop',
+              logprobs: undefined,
+              usage: { completionTokens: 10, promptTokens: 3 },
+            },
+          ],
+          chunkDelayInMs: 400,
+        }),
+        rawCall: { rawPrompt: null, rawSettings: {} },
+      }),
+    }),
+
+    prompt: 'Say hello in Chinese!',
+    experimental_transform: smoothStream({
+      chunking: /[\u4E00-\u9FFF]|\S+\s+/,
+    }),
+  });
+
+  for await (const textPart of result.textStream) {
+    process.stdout.write(textPart);
+  }
+}
+
+main().catch(console.error);
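
To see roughly what a reader of `result.textStream` would receive from an example like this, the buffering can be simulated without the SDK. This is a minimal sketch under assumptions the diff does not confirm: incoming text deltas are appended to a buffer, each emitted chunk is the buffer up to and including the first regex match, and any remainder is flushed at the end. `simulateSmoothing` is a hypothetical helper, and delays are omitted.

```ts
// Hypothetical simulation for illustration; not the `ai` implementation.
// Assumptions: deltas are appended to a buffer, a chunk is the buffered text
// up to and including the first regex match, and leftovers are flushed last.
function simulateSmoothing(deltas: string[], pattern: RegExp): string[] {
  const emitted: string[] = [];
  let buffer = '';

  for (const delta of deltas) {
    buffer += delta;
    let match = pattern.exec(buffer);
    while (match !== null && match[0].length > 0) {
      const chunk = buffer.slice(0, match.index) + match[0];
      emitted.push(chunk);
      buffer = buffer.slice(chunk.length);
      match = pattern.exec(buffer);
    }
  }

  if (buffer.length > 0) emitted.push(buffer); // flush at end of stream
  return emitted;
}

const deltas = ['你好你好', '你好 hel', 'lo wor', 'ld'];
const smoothed = simulateSmoothing(deltas, /[\u4E00-\u9FFF]|\S+\s+/);
console.log(smoothed);
// ['你', '好', '你', '好', '你', '好', ' hello ', 'world']
```

Note that the Latin word split across two deltas (`hel` and `lo`) is emitted only once its trailing whitespace has arrived, and the final `world` is emitted by the end-of-stream flush.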
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
+import { simulateReadableStream, smoothStream, streamText } from 'ai';
+import { MockLanguageModelV1 } from 'ai/test';
+
+async function main() {
+  const result = streamText({
+    model: new MockLanguageModelV1({
+      doStream: async () => ({
+        stream: simulateReadableStream({
+          chunks: [
+            { type: 'text-delta', textDelta: 'こんにちは' },
+            { type: 'text-delta', textDelta: 'こんにちは' },
+            { type: 'text-delta', textDelta: 'こんにちは' },
+            { type: 'text-delta', textDelta: 'こんにちは' },
+            { type: 'text-delta', textDelta: 'こんにちは' },
+            {
+              type: 'finish',
+              finishReason: 'stop',
+              logprobs: undefined,
+              usage: { completionTokens: 10, promptTokens: 3 },
+            },
+          ],
+          chunkDelayInMs: 400,
+        }),
+        rawCall: { rawPrompt: null, rawSettings: {} },
+      }),
+    }),
+
+    prompt: 'Say hello in Japanese!',
+    experimental_transform: smoothStream({
+      chunking: /[\u3040-\u309F\u30A0-\u30FF]|\S+\s+/,
+    }),
+  });
+
+  for await (const textPart of result.textStream) {
+    process.stdout.write(textPart);
+  }
+}
+
+main().catch(console.error);

‎packages/ai/core/generate-text/index.ts

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ export type {
   GeneratedFile,
 } from './generated-file';
 export * as Output from './output';
-export { smoothStream } from './smooth-stream';
+export { smoothStream, type ChunkDetector } from './smooth-stream';
 export type { StepResult } from './step-result';
 export { streamText } from './stream-text';
 export type {
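
With `ChunkDetector` exported, a language-aware chunking callback can be typed explicitly. The sketch below is one possible approach for languages such as Thai, using the standard `Intl.Segmenter` API. It declares `ChunkDetector` locally and assumes the type matches the callback signature documented for `chunking`, which this diff does not show. `segmenterChunking` is a hypothetical helper, and segmentation quality depends on the runtime's ICU data.

```ts
// Declared locally so the sketch is self-contained. Assumption: this mirrors
// the callback signature documented for the `chunking` option.
type ChunkDetector = (buffer: string) => string | undefined | null;

// Hypothetical factory: emits one word segment at a time via Intl.Segmenter.
function segmenterChunking(locale: string): ChunkDetector {
  const segmenter = new Intl.Segmenter(locale, { granularity: 'word' });
  return buffer => {
    const segments = Array.from(segmenter.segment(buffer));
    // Hold back the last segment: later text may still extend it.
    if (segments.length < 2) return null;
    // The first segment starts at index 0, so it is a prefix of the buffer.
    return segments[0].segment;
  };
}

const thaiChunking = segmenterChunking('th');
const thaiBuffer = 'สวัสดีครับ ยินดีต้อนรับ';
const firstThaiChunk = thaiChunking(thaiBuffer);
console.log(firstThaiChunk);
```

The detector returns `null` while the buffer holds fewer than two segments, on the assumption that the caller keeps buffering and flushes the remainder when the stream ends.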
