fix: persist swarm worker reports and improve summary extraction by iampengqian · Pull Request #84 · HKUDS/Vibe-Trading

iampengqian · 2026-05-09T03:36:30Z

Swarm workers frequently produced no usable output because:

The Execution Rules prompt asked agents to write summaries "directly in your response," but the worker only captured last_assistant_content (often a mid-process message like "let me fix the script").
On timeout/iteration limit, all in-memory messages were discarded.
Event logging recorded only tool names and elapsed time, not I/O.

Changes to worker.py:

Force agents to write final reports via write_file report.md (Phase 3), decoupling report persistence from the in-memory message lifecycle.
Add _best_summary() to extract the longest assistant message (>100 chars) instead of relying on the last fragment.
Add _persist_messages() to save full conversation to messages.json on timeout/iteration-limit for post-mortem debugging.
Include tool arguments in tool_call events and result content (up to 5000 chars) in tool_result events for better observability.
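
The two helpers named above could look roughly like the sketch below. It is not the PR's actual code: the message shape (dicts with `role` and `content`), the fallback to the last assistant message, and the `run_dir` parameter are all assumptions made for illustration.

```python
import json
from pathlib import Path

MIN_SUMMARY_CHARS = 100  # threshold taken from the PR description (>100 chars)


def best_summary(messages: list[dict]) -> str:
    """Return the longest assistant message above the length threshold.

    Assumption: when nothing qualifies, fall back to the last non-empty
    assistant message (the PR text does not specify this case).
    """
    texts = [
        m["content"].strip()
        for m in messages
        if m.get("role") == "assistant" and isinstance(m.get("content"), str)
    ]
    texts = [t for t in texts if t]
    candidates = [t for t in texts if len(t) > MIN_SUMMARY_CHARS]
    if candidates:
        return max(candidates, key=len)
    return texts[-1] if texts else ""


def persist_messages(messages: list[dict], run_dir: Path) -> Path:
    """Write the full conversation to messages.json for post-mortem debugging."""
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "messages.json"
    # default=str keeps the dump from failing on non-JSON-native values.
    payload = json.dumps(messages, ensure_ascii=False, indent=2, default=str)
    path.write_text(payload, encoding="utf-8")
    return path
```

Picking the longest message is a heuristic: it favors a full write-up over a short mid-process remark, but it can also select a long intermediate message, which is one reason the file-based report is the more durable path.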

Changes to equity_research_team.yaml:

Increase timeout_seconds from 600 to 900 for all four agents, as the aggregator previously timed out at 615s.

Test Plan

Existing tests pass (pytest --ignore=agent/tests/e2e_backtest --tb=short -q)
New tests added (if applicable)
Tested manually (describe below)

Checklist

No changes to protected areas (src/agent/, src/session/, src/providers/) without prior discussion
No hardcoded values (API keys, file paths, magic numbers)
Code follows CONTRIBUTING.md guidelines
Documentation updated (if user-facing change)

warren618

Thanks for tackling swarm report persistence. Persisting post-mortem messages and moving the durable worker report into an artifact are both useful directions.

I think this needs changes before merge because the current implementation can regress the final swarm report shown by the API/UI, and the expanded event logging may persist or stream large/sensitive tool payloads.

Blocking: the prompt now requires the worker to write the full report to report.md and then return only a brief 2-3 sentence response. But the runtime still uses response.content as WorkerResult.summary, and SwarmRuntime later uses that summary as run.final_report.

That means the final report exposed by the dashboard/API can become only the short closing response, while the real report is hidden in artifacts. Please read report.md after the final write_file and use that content as the worker summary/final report, falling back to response text only if the file is missing.
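
The requested handoff can be sketched as follows. This is a minimal illustration, assuming the report lives at `report.md` directly under the worker's artifact directory; the function name and parameters are hypothetical and not taken from the repository.

```python
from pathlib import Path


def resolve_summary(artifact_dir: Path, response_text: str) -> str:
    """Prefer the contents of report.md; fall back to the response text."""
    report = artifact_dir / "report.md"
    try:
        content = report.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        # Missing, unreadable, or undecodable file: treat as "no report".
        content = ""
    # An empty report is treated the same as a missing one.
    return content or response_text
```

Treating an empty file like a missing one avoids replacing a usable closing response with a blank summary.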

The expanded event logging now persists and streams raw tool arguments plus up to 5000 chars of tool result content. That can include write_file content, file/document snippets, command output, or other sensitive/large payloads in events.jsonl and SSE.

Please keep this to a short preview, similar to the main AgentLoop, and explicitly redact or omit large/sensitive fields such as content, env, headers, and full file contents.
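
One way to produce such previews is sketched below. The redacted key set comes from the fields named in this comment; the 200-character limit, the helper names, and the `[redacted]` marker are assumptions for illustration and do not mirror the main AgentLoop's implementation.

```python
from typing import Any

REDACTED_KEYS = {"content", "env", "headers"}  # fields named in the review
MAX_PREVIEW_CHARS = 200  # hypothetical limit


def preview(value: Any, limit: int = MAX_PREVIEW_CHARS) -> str:
    """Return a short, bounded string preview of an arbitrary value."""
    text = value if isinstance(value, str) else repr(value)
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [{len(text) - limit} more chars]"


def sanitize_tool_args(args: dict[str, Any]) -> dict[str, str]:
    """Redact sensitive keys and truncate every other argument value."""
    safe: dict[str, str] = {}
    for key, value in args.items():
        if key.lower() in REDACTED_KEYS:
            safe[key] = "[redacted]"
        else:
            safe[key] = preview(value)
    return safe
```

For example, `sanitize_tool_args({"path": "report.md", "content": "..."})` keeps the path but replaces the file body with a marker, so events stay useful for debugging without carrying the payload.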

The PR changes behavior but does not add a regression test for the report handoff. A focused test where a worker writes report.md and then returns a short response would pin the intended summary/final_report behavior.
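
The shape of such a test might be as follows. `FakeWorkerResult` and `run_fake_worker` are hypothetical stand-ins written for this sketch; a real test would exercise the project's worker and `WorkerResult` instead.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path

CLOSING_RESPONSE = "Report written to report.md."


@dataclass
class FakeWorkerResult:  # hypothetical stand-in for WorkerResult
    summary: str


def run_fake_worker(artifact_dir: Path, write_report: bool) -> FakeWorkerResult:
    """Simulate a worker that may write report.md, then replies briefly."""
    report = artifact_dir / "report.md"
    if write_report:
        report.write_text("# Full report\n\nDetailed findings.", encoding="utf-8")
    # Intended behavior: the report file wins over the short closing response.
    summary = report.read_text(encoding="utf-8") if report.is_file() else CLOSING_RESPONSE
    return FakeWorkerResult(summary=summary)


def test_report_md_becomes_summary():
    with tempfile.TemporaryDirectory() as tmp:
        result = run_fake_worker(Path(tmp), write_report=True)
        assert result.summary.startswith("# Full report")


def test_falls_back_to_response_without_report():
    with tempfile.TemporaryDirectory() as tmp:
        result = run_fake_worker(Path(tmp), write_report=False)
        assert result.summary == CLOSING_RESPONSE
```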

Swarm workers frequently produced no usable output because:

- The Execution Rules prompt asked agents to write summaries "directly in your response," but the worker only captured `last_assistant_content` (often a mid-process message like "let me fix the script").
- On timeout/iteration limit, all in-memory messages were discarded.
- Event logging recorded only tool names and elapsed time, not I/O.

Changes to worker.py:

- Force agents to write final reports via `write_file report.md` (Phase 3), decoupling report persistence from the in-memory message lifecycle.
- Add `_best_summary()` to extract the longest assistant message (>100 chars) instead of relying on the last fragment.
- Add `_persist_messages()` to save full conversation to messages.json on timeout/iteration-limit for post-mortem debugging.
- Include tool arguments in tool_call events and result content (up to 5000 chars) in tool_result events for better observability.

Changes to equity_research_team.yaml:

- Increase timeout_seconds from 600 to 900 for all four agents, as the aggregator previously timed out at 615s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add `agent_root / ".swarm" / "runs"` to `_default_run_roots()` in path_utils.py. Without this, all `write_file` / `edit_file` / `read_file` calls from swarm workers fail with "run_dir is outside allowed run roots".
- Increase `timeout_seconds` from 600 to 1800 for all 29 swarm presets. The aggregator agent in commodity_research_team timed out at 615s, and complex analysis tasks routinely exceed 600s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
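
The run-root check behind that error can be illustrated with a small sketch. The function names, the `runs` root, and the containment logic here are assumptions for illustration; only the `.swarm/runs` entry comes from the commit message, and the real `_default_run_roots()` in path_utils.py may differ.

```python
from pathlib import Path


def default_run_roots(agent_root: Path) -> list[Path]:
    """Return the directories that file tools are allowed to operate in."""
    return [
        agent_root / "runs",  # hypothetical pre-existing root
        agent_root / ".swarm" / "runs",  # root added by this commit
    ]


def is_allowed_run_dir(run_dir: Path, roots: list[Path]) -> bool:
    """True if run_dir resolves to a location inside one of the roots."""
    resolved = run_dir.resolve()
    # Path.is_relative_to requires Python 3.9+.
    return any(resolved.is_relative_to(root.resolve()) for root in roots)
```

Resolving both sides before comparing means a path such as `.swarm/runs/../../secrets` is rejected even though it starts with an allowed prefix.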

Address PR review feedback (HKUDS#84):

1. Read report.md back into WorkerResult.summary:
   - Add `_resolve_summary()` that checks for report.md in artifact dir
   - Applied at all 5 exit points (timeout, token_limit, llm_failure, normal_completion, iteration_limit)
   - Ensures API/UI shows the full report, not just the brief closing text
2. Sanitize event logging:
   - Truncate tool arguments to 200 chars per value
   - Rename "result" to "result_preview" and reduce from 5000 to 500 chars
   - Avoids persisting/streaming large or sensitive tool payloads
3. Add regression tests for report → summary handoff:
   - Test report.md content is returned when present
   - Test fallback behavior when report.md missing or empty
   - Test error handling for unreadable paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

warren618

Maintainer follow-up pushed to the PR branch. Verified locally: swarm report regression tests, preset packaging/inspect tests, py_compile, and diff check pass. Thanks @iampengqian for the fix.


warren618 requested changes May 9, 2026

iampengqian and others added 3 commits May 11, 2026 19:15

iampengqian force-pushed the fix/swarm-worker-report-persistence branch from 931162d to a2dba07 Compare May 11, 2026 11:15

fix(swarm): redact worker event previews

8d26f56

warren618 approved these changes May 12, 2026

warren618 merged commit e5d3610 into HKUDS:main May 12, 2026

warren618 merged 4 commits into HKUDS:main from iampengqian:fix/swarm-worker-report-persistence