fix: persist swarm worker reports and improve summary extraction #84

Merged
warren618 merged 4 commits into HKUDS:main from iampengqian:fix/swarm-worker-report-persistence
May 12, 2026

Conversation

@iampengqian

Contributor

Swarm workers frequently produced no usable output because:

  • The Execution Rules prompt asked agents to write summaries "directly in your response," but the worker only captured last_assistant_content (often a mid-process message like "let me fix the script").
  • On timeout/iteration limit, all in-memory messages were discarded.
  • Event logging recorded only tool names and elapsed time, not I/O.

Changes to worker.py:

  • Force agents to write final reports via write_file report.md (Phase 3), decoupling report persistence from the in-memory message lifecycle.
  • Add _best_summary() to extract the longest assistant message (>100 chars) instead of relying on the last fragment.
  • Add _persist_messages() to save full conversation to messages.json on timeout/iteration-limit for post-mortem debugging.
  • Include tool arguments in tool_call events and result content (up to 5000 chars) in tool_result events for better observability.
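
The "longest assistant message" heuristic described above can be sketched as follows. This is a minimal illustration, not the code from worker.py: the function name `best_summary`, the message shape (dicts with `role` and `content` keys), and the `fallback` parameter are assumptions; only the 100-character threshold comes from the description.

```python
from typing import Any

# Threshold from the PR description: only messages longer than 100 chars qualify.
MIN_SUMMARY_CHARS = 100


def best_summary(messages: list[dict[str, Any]], fallback: str = "") -> str:
    """Return the longest qualifying assistant message, or `fallback` if none exists.

    Assumes each message is a dict with "role" and "content" keys (hypothetical shape).
    """
    candidates = [
        m["content"]
        for m in messages
        if m.get("role") == "assistant"
        and isinstance(m.get("content"), str)
        and len(m["content"]) > MIN_SUMMARY_CHARS
    ]
    # max(..., key=len) picks the longest candidate; ties resolve to the earliest one.
    return max(candidates, key=len) if candidates else fallback
```

Picking by length rather than position means a trailing fragment such as "let me fix the script" no longer displaces an earlier, complete write-up.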

Changes to equity_research_team.yaml:

  • Increase timeout_seconds from 600 to 900 for all four agents, as the aggregator previously timed out at 615s.

Test Plan

  • Existing tests pass (pytest --ignore=agent/tests/e2e_backtest --tb=short -q)
  • New tests added (if applicable)
  • Tested manually (describe below)

Checklist

  • No changes to protected areas (src/agent/, src/session/, src/providers/) without prior discussion
  • No hardcoded values (API keys, file paths, magic numbers)
  • Code follows CONTRIBUTING.md guidelines
  • Documentation updated (if user-facing change)

@warren618 left a comment

Collaborator

Thanks for tackling swarm report persistence. Persisting post-mortem messages and moving the durable worker report into an artifact are both useful directions.

I think this needs changes before merge because the current implementation can regress the final swarm report shown by the API/UI, and the expanded event logging may persist or stream large/sensitive tool payloads.

Blocking: the prompt now requires the worker to write the full report to report.md and then return only a brief 2-3 sentence response. But the runtime still uses response.content as WorkerResult.summary, and SwarmRuntime later uses that summary as run.final_report.

That means the final report exposed by the dashboard/API can become only the short closing response, while the real report is hidden in artifacts. Please read report.md after the final write_file and use that content as the worker summary/final report, falling back to response text only if the file is missing.
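
A minimal sketch of the requested read-back, assuming the report lives at `report.md` inside a per-worker artifact directory. The function name `resolve_summary` and its parameters are illustrative and not taken from the diff.

```python
from pathlib import Path


def resolve_summary(artifact_dir: Path, response_text: str) -> str:
    """Prefer the contents of report.md; fall back to the model's response text.

    `artifact_dir` is assumed to be the directory the worker writes report.md into.
    """
    report_path = Path(artifact_dir) / "report.md"
    try:
        report = report_path.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        # Missing, unreadable, or undecodable file: use the response text instead.
        return response_text
    # An empty report is treated the same as a missing one.
    return report or response_text
```

Calling one helper like this at every worker exit path keeps the summary source consistent regardless of how the worker stopped.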

The expanded event logging now persists and streams raw tool arguments plus up to 5000 chars of tool result content. That can include write_file content, file/document snippets, command output, or other sensitive/large payloads in events.jsonl and SSE.

Please keep this to a short preview, similar to the main AgentLoop, and explicitly redact or omit large/sensitive fields such as content, env, headers, and full file contents.
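
One way to keep event payloads short and redacted is sketched below. The helper names, the event dict shape, and the truncation marker are assumptions for illustration; the redacted keys are the ones named in this comment, and the 200 and 500 character limits are the values later reported in the follow-up commit on this PR.

```python
import json
from typing import Any

REDACTED_KEYS = {"content", "env", "headers"}  # fields named in the review comment
ARG_PREVIEW_CHARS = 200      # per-value limit for tool arguments
RESULT_PREVIEW_CHARS = 500   # limit for tool result previews


def preview(value: Any, limit: int) -> str:
    """Render a value as text and cut it to at most `limit` chars plus a marker."""
    text = value if isinstance(value, str) else json.dumps(value, default=str)
    return text if len(text) <= limit else text[:limit] + "...[truncated]"


def sanitize_tool_args(args: dict[str, Any]) -> dict[str, str]:
    """Redact sensitive keys outright and truncate every other argument value."""
    return {
        key: "[redacted]" if key.lower() in REDACTED_KEYS else preview(val, ARG_PREVIEW_CHARS)
        for key, val in args.items()
    }


def tool_result_event(tool: str, result: str) -> dict[str, str]:
    """Build a tool_result event that carries only a short preview of the output."""
    return {
        "type": "tool_result",
        "tool": tool,
        "result_preview": preview(result, RESULT_PREVIEW_CHARS),
    }
```

Redacting by key name is a blunt filter: it also hides short, harmless `content` values, which seems an acceptable trade-off for a log that is both persisted and streamed.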

The PR changes behavior but does not add a regression test for the report handoff. A focused test where a worker writes report.md and then returns a short response would pin the intended summary/final_report behavior.
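
A focused test along these lines might look like the sketch below. `WorkerResult` and `finish_worker` here are simplified stand-ins written for this example, not the project's real classes; an actual test would exercise the real worker with a stubbed model response.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class WorkerResult:
    """Hypothetical stand-in for the project's worker result type."""
    summary: str


def finish_worker(artifact_dir: Path, response_text: str) -> WorkerResult:
    """Stand-in for the worker exit path: prefer report.md over the short reply."""
    report = artifact_dir / "report.md"
    text = report.read_text(encoding="utf-8").strip() if report.is_file() else ""
    return WorkerResult(summary=text or response_text)


def test_report_md_becomes_summary() -> None:
    # The worker wrote a full report, then returned only a brief closing response.
    with tempfile.TemporaryDirectory() as tmp:
        artifact_dir = Path(tmp)
        (artifact_dir / "report.md").write_text(
            "# Full report\nDetailed findings.", encoding="utf-8"
        )
        result = finish_worker(artifact_dir, "Report written to report.md.")
        assert result.summary == "# Full report\nDetailed findings."


def test_falls_back_when_report_missing() -> None:
    # No report.md on disk: the response text is the only summary available.
    with tempfile.TemporaryDirectory() as tmp:
        result = finish_worker(Path(tmp), "Short closing response.")
        assert result.summary == "Short closing response."
```

Both functions follow pytest's `test_` naming convention but use only the standard library, so they can also be called directly.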

iampengqian and others added 3 commits May 11, 2026 19:15
Swarm workers frequently produced no usable output because:
- The Execution Rules prompt asked agents to write summaries "directly in
  your response," but the worker only captured `last_assistant_content`
  (often a mid-process message like "let me fix the script").
- On timeout/iteration limit, all in-memory messages were discarded.
- Event logging recorded only tool names and elapsed time, not I/O.

Changes to worker.py:
- Force agents to write final reports via `write_file report.md` (Phase 3),
  decoupling report persistence from the in-memory message lifecycle.
- Add `_best_summary()` to extract the longest assistant message (>100
  chars) instead of relying on the last fragment.
- Add `_persist_messages()` to save full conversation to messages.json on
  timeout/iteration-limit for post-mortem debugging.
- Include tool arguments in tool_call events and result content (up to 5000
  chars) in tool_result events for better observability.

Changes to equity_research_team.yaml:
- Increase timeout_seconds from 600 to 900 for all four agents, as the
  aggregator previously timed out at 615s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
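
The message persistence described in this commit could be sketched as follows. This is an illustration under assumptions: the function name, the `run_dir` parameter, and the write-then-rename step are not taken from the diff; only the `messages.json` filename comes from the commit message.

```python
import json
from pathlib import Path
from typing import Any


def persist_messages(run_dir: Path, messages: list[dict[str, Any]]) -> Path:
    """Write the full conversation to messages.json for post-mortem debugging."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    target = run_dir / "messages.json"
    tmp = target.with_name(target.name + ".tmp")
    # default=str keeps non-JSON-serializable values (e.g. datetimes) from raising.
    tmp.write_text(
        json.dumps(messages, ensure_ascii=False, indent=2, default=str),
        encoding="utf-8",
    )
    # Rename into place so a crash mid-write does not leave a truncated messages.json.
    tmp.replace(target)
    return target
```

Calling this from the timeout and iteration-limit branches before returning means the conversation survives even when no usable summary was produced.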
- Add `agent_root / ".swarm" / "runs"` to `_default_run_roots()` in
  path_utils.py. Without this, all `write_file` / `edit_file` / `read_file`
  calls from swarm workers fail with "run_dir is outside allowed run roots".
- Increase `timeout_seconds` from 600 to 1800 for all 29 swarm presets.
  The aggregator agent in commodity_research_team timed out at 615s, and
  complex analysis tasks routinely exceed 600s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
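
The allowed-roots check behind the "run_dir is outside allowed run roots" error can be pictured roughly as below. This is a sketch, not the contents of path_utils.py: `ensure_allowed_run_dir` is a hypothetical name, and `agent_root / "runs"` is an assumed placeholder for whatever roots already existed; only the `.swarm/runs` root and the error text come from the commit message.

```python
from pathlib import Path


def default_run_roots(agent_root: Path) -> list[Path]:
    """Return directories that file tools may operate under."""
    return [
        agent_root / "runs",             # assumed placeholder for pre-existing roots
        agent_root / ".swarm" / "runs",  # root added by this commit for swarm workers
    ]


def ensure_allowed_run_dir(run_dir: Path, roots: list[Path]) -> Path:
    """Return the resolved run_dir if it lies under an allowed root, else raise."""
    resolved = Path(run_dir).resolve()
    for root in roots:
        try:
            # relative_to raises ValueError when `resolved` is not under `root`.
            resolved.relative_to(Path(root).resolve())
            return resolved
        except ValueError:
            continue
    raise ValueError("run_dir is outside allowed run roots")
```

Resolving both sides before comparing keeps `..` segments and symlinks from slipping a path past the check.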
Address PR review feedback (HKUDS#84):

1. Read report.md back into WorkerResult.summary:
   - Add `_resolve_summary()` that checks for report.md in artifact dir
   - Applied at all 5 exit points (timeout, token_limit, llm_failure,
     normal_completion, iteration_limit)
   - Ensures API/UI shows the full report, not just the brief closing text

2. Sanitize event logging:
   - Truncate tool arguments to 200 chars per value
   - Rename "result" to "result_preview" and reduce from 5000 to 500 chars
   - Avoids persisting/streaming large or sensitive tool payloads

3. Add regression tests for report → summary handoff:
   - Test report.md content is returned when present
   - Test fallback behavior when report.md missing or empty
   - Test error handling for unreadable paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@iampengqian force-pushed the fix/swarm-worker-report-persistence branch from 931162d to a2dba07 on May 11, 2026 11:15

@warren618 left a comment

Collaborator

Maintainer follow-up pushed to the PR branch. Verified locally: swarm report regression tests, preset packaging/inspect tests, py_compile, and diff check pass. Thanks @iampengqian for the fix.

@warren618 merged commit e5d3610 into HKUDS:main on May 12, 2026
ykykj pushed a commit to ykykj/Vibe-Trading that referenced this pull request May 23, 2026