[Go] Streaming with tools causes messages loss #3851

@hugoaguirre

Description

When a streaming Generate() call includes tools, the final response from the model contains only the last response (the one generated after the tool call). The reasoning and any earlier messages generated by the model are dropped.

Code to reproduce the issue

var streamedOutput string

final, err := genkit.Generate(ctx, g,
	ai.WithPrompt("what is a gablorken of value 2 over 3?"),
	ai.WithTools(gablorkenTool),
	ai.WithStreaming(func(ctx context.Context, chunk *ai.ModelResponseChunk) error {
		for _, content := range chunk.Content {
			streamedOutput += content.Text
		}
		return nil
	}))
if err != nil {
	t.Fatal(err)
}

// Verify final output matches streamed content
finalOutput := final.Text()
if streamedOutput != finalOutput {
	t.Errorf("Streaming output doesn't match final output\nStreamed: %s\nFinal: %s",
		streamedOutput, finalOutput)
}

// Output
Streamed: I can help you calculate the gablorken! Based on your question, you want to calculate a gablorken with value 2 over 3. Let me use the gablorken calculation tool for you. The gablorken of value 2 over 3 is 8.

Final: The gablorken of value 2 over 3 is 8.

In the streamed response, the received chunks include both the messages produced prior to the tool call execution and the tool response from the final generation.

In this case, only the model messages are missing, but the same applies to reasoning messages and intermediate tool response messages. In other words, the model returns only the response from the last generate call.

Root cause

The root cause of this issue can be found here:

  1. Initially, the resp variable contains the original response message from the model, prior to the tool execution (here).
  2. Then, the flow continues until it reaches the point where Genkit checks whether there are tool requests to be handled (here).
  3. If no tools are needed, the generate() call returns with the original response (here).
  4. But if there are tool requests to handle, a new generate() request is triggered with a new request message (here).
  5. The return value contains only the response messages from that last generate call, omitting the original resp messages (thoughts or model messages).
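The steps above can be modeled in a few lines of Go. This is only a minimal sketch under stated assumptions, not Genkit's implementation: `Message`, `Response`, `modelFn`, `generateLastOnly`, and `generateAccumulated` are hypothetical names invented for illustration, and the tool loop is written iteratively rather than as the nested generate() call described above. The first function reproduces the reported behavior (only the last turn is returned); the second shows one possible direction for a fix, which is to carry the earlier turns into the returned response.

```go
package main

import "strings"

// Message is a hypothetical, simplified stand-in for a model or tool message.
type Message struct {
	Role        string // "model" or "tool"
	Text        string
	ToolRequest bool // true when a model message asks for a tool call
}

// Response is a hypothetical stand-in for the final response of a generate call.
type Response struct {
	Messages []Message
}

// Text concatenates the text of all model messages in the response.
func (r Response) Text() string {
	var sb strings.Builder
	for _, m := range r.Messages {
		if m.Role == "model" {
			sb.WriteString(m.Text)
		}
	}
	return sb.String()
}

// modelFn simulates a single model turn given the conversation so far.
type modelFn func(history []Message) Message

// maxTurns bounds the tool loop so a misbehaving model cannot loop forever.
const maxTurns = 8

// generateLastOnly mirrors the reported behavior: once a tool request has
// been handled, only the messages from the last turn are returned.
func generateLastOnly(model modelFn, runTool func() Message) Response {
	var history []Message
	for i := 0; i < maxTurns; i++ {
		msg := model(history)
		if !msg.ToolRequest {
			return Response{Messages: []Message{msg}}
		}
		history = append(history, msg, runTool())
	}
	return Response{}
}

// generateAccumulated keeps every intermediate message, so the final
// response also contains the text produced before the tool call.
func generateAccumulated(model modelFn, runTool func() Message) Response {
	var history []Message
	for i := 0; i < maxTurns; i++ {
		msg := model(history)
		history = append(history, msg)
		if !msg.ToolRequest {
			return Response{Messages: history}
		}
		history = append(history, runTool())
	}
	return Response{Messages: history}
}
```

With a simulated model that first requests a tool and then answers, `generateLastOnly(...).Text()` yields only the final answer, while `generateAccumulated(...).Text()` also includes the text emitted before the tool call, which is what the streamed chunks contain in the report above.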

Metadata

Labels: bug, go
