Skip to content

fix(mcp_server): pre-warm registry in main thread to avoid tools/call deadlock#85

Merged
warren618 merged 1 commit into
HKUDS:mainfrom
Teerapat-Vatpitak:fix/mcp-server-tools-call-deadlock
May 9, 2026
Merged

fix(mcp_server): pre-warm registry in main thread to avoid tools/call deadlock#85
warren618 merged 1 commit into
HKUDS:mainfrom
Teerapat-Vatpitak:fix/mcp-server-tools-call-deadlock

Conversation

@Teerapat-Vatpitak

Copy link
Copy Markdown
Contributor

Summary

  • Fixes a deadlock where every tools/call against vibe-trading-mcp hangs indefinitely while initialize and tools/list succeed.
  • Pre-warms the tool registry in the main thread before mcp.run() so the first tools/call doesn't trigger heavy lazy imports inside FastMCP's asyncio worker thread.
  • Affects all MCP clients (Claude Code, Cursor, etc.) — not specific to any one client.

Why

agent/mcp_server.py:68_get_registry() lazy-imports src.tools.build_registry and constructs the registry on first call. All 14 call sites in this file are FastMCP tool handlers, so the first tools/call triggers the lazy init inside FastMCP's asyncio worker thread. The import chain — specifically loading src.tools.shell.* — deadlocks in that thread context.

initialize and tools/list succeed because they don't touch the registry, which makes the bug confusing to diagnose: the server appears healthy until the first tool call, then hangs forever with no error.

Fix

One line in main(), right after _registry = None:

_include_shell_tools = True if args.transport == "stdio" else _env_shell_tools_enabled()
_registry = None
_get_registry()  # pre-warm: avoids deadlock when first tools/call lazy-inits inside FastMCP worker thread

_include_shell_tools is set on the line above, so the pre-warm builds the registry with the correct value for the chosen transport. _get_registry() is idempotent (if _registry is None guard at line 70), so the 14 existing call sites still hit the cached instance — unaffected.

Differential test (deterministic before/after)

Verified on Python 3.13, Windows 11, fastmcp 3.2.4 with a raw stdio MCP client:

Run Patch state initialize tools/call analyze_options
1 applied 8.63s ok 0.01s ok
2 reverted 3.23s ok 15s timeout — HANG
3 re-applied 6.94s ok 0.01s ok

Toggling the single line on/off flips behavior every time — root cause is conclusive.

Side effects

  • No functional change to any tool's behavior or output.
  • Errors during registry build now surface at startup (fail-fast) instead of at first tool call.
  • Slight startup-time shift: cost previously paid on first tools/call is now paid during initialize. Net total unchanged.
  • For SSE transport: HTTP server begins listening ~5–8s later. Flagging in case it matters for any healthcheck-sensitive deployment.

Test plan

  • Differential test (table above) — deterministic before/after on raw stdio
  • End-to-end via Claude Code MCP client — tools/call get_market_data and tools/call analyze_options both return in <2s
  • Both stdio and SSE entry paths pick up the fix (pre-warm runs before either mcp.run(...) branch)

Checklist

  • No changes to protected areas (agent/src/agent/, agent/src/session/, agent/src/providers/) — only agent/mcp_server.py.
  • No hardcoded values.
  • Code follows CONTRIBUTING.md (Conventional Commit title, no comment-preserved code).
  • Documentation updated — N/A (no user-facing behavior change beyond bug fix).
… deadlock

_get_registry() lazy-imports src.tools.build_registry on first call. Every
caller is a FastMCP tool handler, so the first tools/call triggers the lazy
init inside FastMCP's asyncio worker thread, where the import chain (notably
src.tools.shell.*) deadlocks. initialize and tools/list succeed because
they don't touch the registry.

Pre-warming _get_registry() in main() — after _include_shell_tools is set
and before mcp.run() — performs the import on the main thread and lets
worker threads only ever read the cached singleton. _get_registry() is
idempotent (if _registry is None guard), so the 14 existing call sites are
unaffected.

Verified deterministically: removing the line reintroduces the hang within
15s; adding it back fixes tools/call within 0.01s. End-to-end through
Claude Code's MCP client returns in <2s.
@warren618 warren618 merged commit 6809324 into HKUDS:main May 9, 2026
ykykj pushed a commit that referenced this pull request May 10, 2026
)

Spawns vibe-trading-mcp via subprocess and walks through the JSON-RPC
happy path: initialize + tools/list + tools/call analyze_options. The
critical assertion is that tools/call does not hang — without the
registry pre-warm in mcp_server.main() (PR #85), the lazy registry build
runs inside FastMCP's worker thread and deadlocks on the first call.

The test also verifies (a) the tool catalogue has at least 20 entries
including analyze_options/get_market_data/list_skills, and (b) the
analyze_options result has a positive price and a delta in [0, 1] —
covering tool-registration regressions and Black-Scholes formula breaks
in addition to the deadlock.

analyze_options is chosen over network-dependent tools (get_market_data,
web_search) so the test stays deterministic and runs offline in CI.

Verified locally with differential validation:
- with pre-warm: PASS in 3.91s
- pre-warm commented out: FAIL with 15s timeout on tools/call
- pre-warm restored: PASS in 3.81s
@Teerapat-Vatpitak Teerapat-Vatpitak deleted the fix/mcp-server-tools-call-deadlock branch May 13, 2026 04:00
12zuev added a commit to 12zuev/Vibe-Trading that referenced this pull request May 25, 2026
…DS#85)

PR HKUDS#145 squash-merged to cryptobur-landing main. New MCP tool returns
the full architecture + live state in one call so future Claude/Codex
sessions don't have to re-discover the codebase from scratch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

2 participants