fix(mcp_server): pre-warm registry in main thread to avoid tools/call deadlock#85
Merged
warren618 merged 1 commit intoMay 9, 2026
Conversation
… deadlock _get_registry() lazy-imports src.tools.build_registry on first call. Every caller is a FastMCP tool handler, so the first tools/call triggers the lazy init inside FastMCP's asyncio worker thread, where the import chain (notably src.tools.shell.*) deadlocks. initialize and tools/list succeed because they don't touch the registry. Pre-warming _get_registry() in main() — after _include_shell_tools is set and before mcp.run() — performs the import on the main thread and lets worker threads only ever read the cached singleton. _get_registry() is idempotent (if _registry is None guard), so the 14 existing call sites are unaffected. Verified deterministically: removing the line reintroduces the hang within 15s; adding it back fixes tools/call within 0.01s. End-to-end through Claude Code's MCP client returns in <2s.
8 tasks
ykykj
pushed a commit
that referenced
this pull request
May 10, 2026
) Spawns vibe-trading-mcp via subprocess and walks through the JSON-RPC happy path: initialize + tools/list + tools/call analyze_options. The critical assertion is that tools/call does not hang — without the registry pre-warm in mcp_server.main() (PR #85), the lazy registry build runs inside FastMCP's worker thread and deadlocks on the first call. The test also verifies (a) the tool catalogue has at least 20 entries including analyze_options/get_market_data/list_skills, and (b) the analyze_options result has a positive price and a delta in [0, 1] — covering tool-registration regressions and Black-Scholes formula breaks in addition to the deadlock. analyze_options is chosen over network-dependent tools (get_market_data, web_search) so the test stays deterministic and runs offline in CI. Verified locally with differential validation: - with pre-warm: PASS in 3.91s - pre-warm commented out: FAIL with 15s timeout on tools/call - pre-warm restored: PASS in 3.81s
6 tasks
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
tools/callagainstvibe-trading-mcphangs indefinitely whileinitializeandtools/listsucceed.mcp.run()so the firsttools/calldoesn't trigger heavy lazy imports inside FastMCP's asyncio worker thread.Why
agent/mcp_server.py:68—_get_registry()lazy-importssrc.tools.build_registryand constructs the registry on first call. All 14 call sites in this file are FastMCP tool handlers, so the firsttools/calltriggers the lazy init inside FastMCP's asyncio worker thread. The import chain — specifically loadingsrc.tools.shell.*— deadlocks in that thread context.initializeandtools/listsucceed because they don't touch the registry, which makes the bug confusing to diagnose: the server appears healthy until the first tool call, then hangs forever with no error.Fix
One line in
main(), right after_registry = None:_include_shell_toolsis set on the line above, so the pre-warm builds the registry with the correct value for the chosen transport._get_registry()is idempotent (if _registry is Noneguard at line 70), so the 14 existing call sites still hit the cached instance — unaffected.Differential test (deterministic before/after)
Verified on Python 3.13, Windows 11, fastmcp 3.2.4 with a raw stdio MCP client:
Toggling the single line on/off flips behavior every time — root cause is conclusive.
Side effects
tools/callis now paid duringinitialize. Net total unchanged.Test plan
tools/call get_market_dataandtools/call analyze_optionsboth return in <2smcp.run(...)branch)Checklist
agent/src/agent/,agent/src/session/,agent/src/providers/) — onlyagent/mcp_server.py.CONTRIBUTING.md(Conventional Commit title, no comment-preserved code).