fix: persist swarm worker reports and improve summary extraction #84
warren618
left a comment
Thanks for tackling swarm report persistence. Persisting post-mortem messages and moving the durable worker report into an artifact are both useful directions.
I think this needs changes before merge because the current implementation can regress the final swarm report shown by the API/UI, and the expanded event logging may persist or stream large/sensitive tool payloads.
Blocking: the prompt now requires the worker to write the full report to report.md and then return only a brief 2-3 sentence response. But the runtime still uses response.content as WorkerResult.summary, and SwarmRuntime later uses that summary as run.final_report.
That means the final report exposed by the dashboard/API can become only the short closing response, while the real report is hidden in artifacts. Please read report.md after the final write_file and use that content as the worker summary/final report, falling back to response text only if the file is missing.
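The requested handoff could look roughly like the sketch below. It is only an illustration of the fallback order described above: the function name `resolve_summary` and its parameters are hypothetical, and it assumes report.md sits directly in the worker's artifact directory.

```python
from pathlib import Path


def resolve_summary(artifact_dir: Path, response_text: str) -> str:
    """Prefer the contents of report.md; fall back to the closing response.

    Hypothetical helper: the name, signature, and artifact layout are
    assumptions made for illustration.
    """
    report_path = artifact_dir / "report.md"
    try:
        report = report_path.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        # Missing or unreadable report: use the response text instead.
        return response_text
    # An empty report is treated the same as a missing one.
    return report or response_text
```

Calling this once after the final write_file, and using the result for both the worker summary and the final report, would keep the dashboard/API output in sync with the artifact.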
The expanded event logging now persists and streams raw tool arguments plus up to 5000 characters of tool result content. That can put write_file content, file/document snippets, command output, or other sensitive/large payloads into events.jsonl and the SSE stream.
Please keep this to a short preview, similar to the main AgentLoop, and explicitly redact or omit large/sensitive fields such as content, env, headers, and full file contents.
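One way to get a short, redacted preview is sketched below. The helper names, the character limits, and the exact redaction marker are assumptions for illustration; only the field names `content`, `env`, and `headers` come from the comment above, and this is not taken from the main AgentLoop.

```python
from typing import Any

# Field names called out in the review; extend as needed.
SENSITIVE_KEYS = {"content", "env", "headers"}
MAX_ARG_CHARS = 200     # illustrative per-argument preview budget
MAX_RESULT_CHARS = 500  # illustrative tool-result preview budget


def preview(text: str, limit: int) -> str:
    """Return text unchanged if short, else a truncated preview with a marker."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"... [{len(text) - limit} more chars]"


def sanitize_args(args: dict[str, Any]) -> dict[str, str]:
    """Redact sensitive fields entirely and truncate every other value."""
    safe: dict[str, str] = {}
    for key, value in args.items():
        if key.lower() in SENSITIVE_KEYS:
            # Record only the size, never the payload itself.
            safe[key] = f"[redacted, {len(str(value))} chars]"
        else:
            safe[key] = preview(str(value), MAX_ARG_CHARS)
    return safe
```

A tool_result event would then carry `preview(result_text, MAX_RESULT_CHARS)` rather than the raw result.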
The PR changes behavior but does not add a regression test for the report handoff. A focused test where a worker writes report.md and then returns a short response would pin the intended summary/final_report behavior.
Swarm workers frequently produced no usable output because:
- The Execution Rules prompt asked agents to write summaries "directly in your response," but the worker only captured `last_assistant_content` (often a mid-process message like "let me fix the script").
- On timeout/iteration limit, all in-memory messages were discarded.
- Event logging recorded only tool names and elapsed time, not I/O.

Changes to worker.py:
- Force agents to write final reports via `write_file report.md` (Phase 3), decoupling report persistence from the in-memory message lifecycle.
- Add `_best_summary()` to extract the longest assistant message (>100 chars) instead of relying on the last fragment.
- Add `_persist_messages()` to save full conversation to messages.json on timeout/iteration-limit for post-mortem debugging.
- Include tool arguments in tool_call events and result content (up to 5000 chars) in tool_result events for better observability.

Changes to equity_research_team.yaml:
- Increase timeout_seconds from 600 to 900 for all four agents, as the aggregator previously timed out at 615s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
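The two helpers named in this commit could be sketched as follows. This is a guess at their shape, not the code in worker.py: it assumes messages are dicts with `role` and `content` keys, and the fallback to the last assistant message when nothing exceeds the threshold is an added assumption.

```python
import json
from pathlib import Path

MIN_SUMMARY_CHARS = 100  # threshold quoted in the commit message


def best_summary(messages: list[dict]) -> str:
    """Pick the longest assistant message above the threshold.

    Falls back to the last assistant message (assumed behavior), or "".
    """
    texts = [
        m["content"]
        for m in messages
        if m.get("role") == "assistant" and isinstance(m.get("content"), str)
    ]
    candidates = [t for t in texts if len(t) > MIN_SUMMARY_CHARS]
    if candidates:
        return max(candidates, key=len)
    return texts[-1] if texts else ""


def persist_messages(messages: list[dict], artifact_dir: Path) -> Path:
    """Write the full conversation to messages.json for post-mortem debugging."""
    artifact_dir.mkdir(parents=True, exist_ok=True)
    out = artifact_dir / "messages.json"
    # default=str keeps the dump from failing on non-JSON values.
    out.write_text(
        json.dumps(messages, ensure_ascii=False, indent=2, default=str),
        encoding="utf-8",
    )
    return out
```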
- Add `agent_root / ".swarm" / "runs"` to `_default_run_roots()` in path_utils.py. Without this, all `write_file` / `edit_file` / `read_file` calls from swarm workers fail with "run_dir is outside allowed run roots".
- Increase `timeout_seconds` from 600 to 1800 for all 29 swarm presets. The aggregator agent in commodity_research_team timed out at 615s, and complex analysis tasks routinely exceed 600s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
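To illustrate why the missing root caused every file tool call to fail, here is a minimal sketch of an allowed-roots check. It is not the code in path_utils.py: the function names are hypothetical, the `runs` entry is a placeholder for whatever roots already existed, and only the `.swarm/runs` root and the error message come from the commit message.

```python
from pathlib import Path


def default_run_roots(agent_root: Path) -> list[Path]:
    """Directories that file tools may operate in (illustrative list)."""
    return [
        agent_root / "runs",             # placeholder for pre-existing roots
        agent_root / ".swarm" / "runs",  # root added by this commit
    ]


def ensure_in_run_roots(run_dir: Path, roots: list[Path]) -> Path:
    """Return the resolved run_dir, or raise if it escapes every allowed root."""
    resolved = run_dir.resolve()
    for root in roots:
        # Path.is_relative_to requires Python 3.9+.
        if resolved.is_relative_to(root.resolve()):
            return resolved
    raise ValueError("run_dir is outside allowed run roots")
```

Resolving both sides before comparing means a path such as `.swarm/runs/../../x` is rejected instead of slipping through on a prefix match.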
Address PR review feedback (HKUDS#84):

1. Read report.md back into WorkerResult.summary:
   - Add `_resolve_summary()` that checks for report.md in artifact dir
   - Applied at all 5 exit points (timeout, token_limit, llm_failure, normal_completion, iteration_limit)
   - Ensures API/UI shows the full report, not just the brief closing text

2. Sanitize event logging:
   - Truncate tool arguments to 200 chars per value
   - Rename "result" to "result_preview" and reduce from 5000 to 500 chars
   - Avoids persisting/streaming large or sensitive tool payloads

3. Add regression tests for report → summary handoff:
   - Test report.md content is returned when present
   - Test fallback behavior when report.md missing or empty
   - Test error handling for unreadable paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
931162d to a2dba07
warren618
left a comment
Maintainer follow-up pushed to the PR branch. Verified locally: swarm report regression tests, preset packaging/inspect tests, py_compile, and diff check pass. Thanks @iampengqian for the fix.
Test Plan
pytest --ignore=agent/tests/e2e_backtest --tb=short -q