Skip to content

kisaragi-mochi/stackchan-mcp

Repository files navigation

English | 日本語

stackchan-mcp

An MCP (Model Context Protocol) bridge for the M5Stack official StackChan (2025 Kickstarter shipping kit), letting any LLM client drive the device.

Born out of the stack-chan project community (originated by Shinya Ishikawa in 2021). This repository targets the M5Stack official StackChan kit that grew out of that lineage.

┌─────────────┐     stdio MCP      ┌──────────────┐    WebSocket MCP    ┌──────────────┐
│ MCP client  │ ─────────────────▶ │   gateway    │ ──────────────────▶ │ ESP32 (CoreS3│
│ (e.g.Claude)│ ◀───────────────── │  (Python)    │ ◀────────────────── │  +StackChan) │
└─────────────┘                    │              │                     └──────────────┘
                                   │  /capture    │ ◀── HTTP POST (JPEG) ──┘
                                   └──────────────┘

From any MCP client (Claude Code / Claude Desktop / others) you can call StackChan operations such as head movement, camera capture, touch sensor reads, and avatar expression switches.

Repository layout

This repository is a monorepo.

Directory Contents
firmware/ Full git subtree of 78/xiaozhi-esp32. The custom StackChan board lives at firmware/main/boards/stackchan/.
gateway/ Python MCP gateway. stdio MCP server (LLM side) + WebSocket MCP client (ESP32 side) + HTTP capture server.
docs/ architecture.md: full component diagram, tool name mapping, photo flow, auth, phase roadmap. firmware-sync.md: upstream xiaozhi-esp32 sync playbook. remote-access.md: Tailscale Funnel setup for non-LAN use.
examples/ Optional, unmaintained examples. cloudflare-relay/: Cloudflare Workers WebSocket relay for reaching the gateway from outside the local LAN.

Target hardware

M5Stack official StackChan kit (Kickstarter 2025 shipping version). The firmware in this repository is meant to replace the kit's factory firmware.

Part Spec
Body M5Stack CoreS3 (ESP32-S3, 16MB Flash, 8MB PSRAM)
Neck servos SCS0009 ×2 (yaw + pitch, serial bus, TX=GPIO6, RX=GPIO7)
Camera GC0308 (DVP, 320×240)
Touch FT6336 / Si12T
Display ILI9342 (SPI, 320×240)

A self-built stack-chan (following the original stack-chan project design) may also work as long as the pin assignments and I2C addresses match. Reports and PRs welcome.

Tools (callable by MCP clients via the gateway)

Tool Description Status
get_status Gateway connection state
get_device_info ESP32 device state (battery / volume / WiFi / etc.)
take_photo(question?) Capture a frame, save as JPEG, return the path
set_volume(volume) Speaker volume (0-100)
set_brightness(brightness) Screen brightness (0-100)
move_head(yaw, pitch, speed?) Move the neck (servos). pitch is constrained to 5..85 — the M5Stack-recommended operating range. For the wider firmware hard clamp (0..88), use the firmware-side set_head_angles device tool instead.
get_touch_state Touch sensor state (press / release / stroke / etc.)
get_touch_sensor_enabled Read whether head-touch detection is enabled. The NVS-backed setting persists across reboot.
set_touch_sensor_enabled(enabled) Enable or disable head-touch detection. Disabling stops both the firmware local motion response and MCP stackchan/event emission, and persists across reboot.
set_avatar(face) Switch avatar expression (idle / happy / thinking / sad / surprised / embarrassed), or off to hide the avatar and disable blink so the underlying WiFi config / OTA / settings screens are visible. Any other face brings the avatar back and restores blink.
set_blink(state) Blink on/off
set_mouth(state) Mouth open/close (one-shot, held until next call)
set_mouth_sequence(steps) Queue and play a list of {shape, duration_ms} steps locally for TTS lip-sync — no per-step WebSocket RTT jitter
check_vm_en Check servo power supply (VM EN HIGH) state
set_led(index, r, g, b) Set one of the 12 base RGB LEDs (index 0..11, channels 0..255)
set_all_leds(r, g, b) Set all 12 base RGB LEDs to the same color
set_leds(colors) Batch-set the first N LEDs from a [[r,g,b], ...] array in a single I2C burst (use this for animations / multi-color patterns); trailing LEDs keep their previous color
clear_leds Turn all 12 base RGB LEDs off
say(text, voice?, speaker_id?, reference_audio?) Speak text on the device speaker via gateway-side TTS. A supported expression emoji in the text can switch the avatar face in the same call. Default engine: VOICEVOX (runs as a separate HTTP service — see TTS setup). Requires the [tts] extra.
listen(duration_ms?, engine?, language?, model?, motion?, look_up_pitch?) Capture a short utterance from the device microphone and transcribe it via gateway-side STT. Default engine: faster-whisper (local, MIT) — see STT setup. Optional motion feedback can show the thinking face or tilt the head up during capture. Requires the [stt-faster-whisper] (or [stt-openai]) extra and a firmware update with the inbound listen wire type.
stackchan_follow_pose_stream(action, url, ...) Subscribe to an arbitrary external WebSocket pose-stream and drive the head to follow incoming yaw / pitch frames 1:1 within the SCS0009 working range. action switches between start / stop / status. Includes per-axis flip, pitch-center offset, downsample rate cap, angular-velocity clamp, and reconnect with exponential backoff; the initial pose is seeded from the device so the angular-velocity clamp is anchored at the real servo position from the first frame. The upstream server's protocol (zero-offset commands, source dispatching, transport) is intentionally outside this gateway's scope.

See gateway/README.md for full schemas.

Quick start

1. Flash the firmware (CoreS3)

There are two paths. Option A is recommended for first-time users — no toolchain setup needed. Option B is for contributors who want to build from source.

Option A: Flash a pre-built binary (recommended for end users)

Download the latest firmware bundle from the Releases page — pick the most recent firmware-v* release and grab merged-binary.bin (and optionally xiaozhi.bin). Then flash with esptool.py:

# Replace --port with your platform's serial device:
#   macOS:   /dev/cu.usbmodem* (e.g. /dev/cu.usbmodem1101)
#   Linux:   /dev/ttyUSB0 or /dev/ttyACM0
#   Windows: COM3 (or whichever it shows up as in Device Manager)

# Clean install (resets NVS — Wi-Fi settings will need to be re-entered):
esptool.py --chip esp32s3 --port /dev/cu.usbmodem1101 -b 460800 \
  write_flash 0x0 merged-binary.bin

# Or, app-only update (preserves NVS — keeps your Wi-Fi setup):
esptool.py --chip esp32s3 --port /dev/cu.usbmodem1101 -b 460800 \
  write_flash 0x20000 xiaozhi.bin

No ESP-IDF or Docker setup needed.

Option B: Build from source with Docker (for contributors)

This repository uses git submodules under firmware/components/. If you cloned without --recursive, initialize them first:

git submodule update --init --recursive

Then build:

cd firmware
docker run --rm --cpus=4 --ulimit nofile=65536:65536 \
  -v $PWD:/project -w /project espressif/idf:v5.5.2 \
  python ./scripts/release.py stackchan
# → releases/v2.2.6_stackchan.zip

# Flash (after USB-connecting the CoreS3).
# Replace --port with your platform's serial device — see the Option A
# note above for the macOS/Linux/Windows mapping.
esptool.py --chip esp32s3 --port /dev/cu.usbmodem1101 -b 460800 \
  write_flash 0x0 build/merged-binary.bin

The --cpus=4 flag caps Docker container parallelism so the concurrent LVGL / xiaozhi-fonts/emoji_*.c compile steps stay within the memory budget on macOS Docker hosts (OrbStack / Docker Desktop). Without it, ninja autodetects job count from /proc/cpuinfo and the resulting parallel gcc pressure can exhaust container memory mid-LVGL with Cannot allocate memory — even on hosts with ample physical RAM (tracked as #112). The --ulimit nofile=65536:65536 flag separately avoids a Too many open files failure during the same LVGL compile step under the default file-descriptor limit. Linux hosts with higher defaults are unaffected, but passing both flags unconditionally is safe and matches CI.

After flashing, WiFi configuration happens on first boot — connect from a smartphone to the setup UI (the xiaozhi-esp32 standard flow).

On a local network, the gateway advertises _stackchan-mcp._tcp.local. by default. Fresh firmware can use that mDNS/DNS-SD record to find the WebSocket endpoint automatically when no primary URL has been saved yet. mDNS discovery requires NVS websocket.url to be empty.

Configuring the WebSocket gateway URL and auth token

Primary URL resolution order:

  1. NVS websocket.url
  2. mDNS _stackchan-mcp._tcp.local. when CONFIG_STACKCHAN_MDNS_DISCOVERY is enabled and the primary NVS URL is empty
  3. CONFIG_DEFAULT_WEBSOCKET_URL
  4. Empty/fail with a boot log error

Existing websocket.fallback_url and CONFIG_DEFAULT_WEBSOCKET_FALLBACK_URL candidates are still tried after the primary candidate path above. CONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y remains an explicit exception for stale-NVS recovery: a non-empty Kconfig URL wins and mDNS discovery is skipped.

The gateway advertises mDNS by default; run stackchan-mcp --no-mdns to disable advertisement. To compile out firmware discovery, set CONFIG_STACKCHAN_MDNS_DISCOVERY=n. Discovery requires UDP multicast on the local LAN, and some routers or VLANs block it. When multiple gateways are visible, the firmware tries each usable IPv4 address from every supported gateway service discovered in one browse, and logs the accepted instance count and candidate address list. mDNS only discovers the URL; websocket.token / CONFIG_DEFAULT_WEBSOCKET_TOKEN still control authentication.

For LAN auto-discovery with an automatic out-of-LAN relay fallback, keep NVS websocket.url empty and store the relay URL in websocket.fallback_url. With that profile, mDNS candidates are tried first, and the relay is tried later if the local discovery path does not complete the WebSocket server hello flow.

Automatic recovery after a gateway host IPv4 change requires both sides of the paired mDNS fix: Gateway vA.B.C+ refreshes the advertised service when the host address changes, and Firmware vX.Y.Z+ tries all supported mDNS service instances from a browse. Earlier firmware may keep trying a stale cached instance until reboot.

The firmware reads these NVS keys for the gateway connection:

  • websocket.url — the gateway WebSocket URL (e.g. ws://<gateway-host>:8765/)
  • websocket.fallback_url — optional second gateway URL to try when websocket.url cannot be reached or does not complete the server hello flow
  • websocket.token — the bearer token sent as Authorization: Bearer <token>, matched against STACKCHAN_TOKEN / BEARER_TOKEN on the gateway side (leave both empty to skip authentication entirely)

There are four practical ways to provide them, plus one temporary source-level escape hatch:

  1. Build-time defaults via Kconfig (recommended for developers): run idf.py menuconfigComponent configXiaozhi Assistant, and set:

    • Default WebSocket gateway URL (fallback when NVS is empty)CONFIG_DEFAULT_WEBSOCKET_URL (e.g. ws://<gateway-host>:8765/)
    • Fallback WebSocket gateway URLCONFIG_DEFAULT_WEBSOCKET_FALLBACK_URL
    • Default WebSocket auth token (fallback when NVS is empty)CONFIG_DEFAULT_WEBSOCKET_TOKEN (leave empty if your gateway accepts unauthenticated connections)

    By default these only apply when the corresponding NVS key is empty. For first-time flashes onto a fresh device this is exactly what you want. If both a primary and fallback URL are configured, the firmware tries them in deterministic order and keeps the first candidate that completes the WebSocket server hello flow.

  2. Use the on-device WiFi config UI (recommended for fresh users): while the device is in WiFi configuration mode, open the captive portal at http://192.168.4.1, switch to the Advanced tab, and fill in:

    • WebSocket Gateway URL (e.g. ws://<gateway-host>:8765/) — the primary gateway candidate.
    • Fallback Gateway URL (e.g. wss://<relay-host>/) — optional second candidate, tried only after the primary candidate fails the server-hello flow.
    • Gateway Token — optional bearer token, sent as Authorization: Bearer <token> to both candidates when set. The current value is never displayed (the WiFi config AP is unauthenticated, so the GET endpoint reports only whether a token is configured). Leave the field blank to keep the existing token, type a new value to update it, or hit ❌ to fall back to the build-time CONFIG_DEFAULT_WEBSOCKET_TOKEN. On stock builds where no Kconfig default is set this disables auth; on builds that ship a default token, ❌ reverts to that default rather than truly clearing authentication. To switch the device to an unauthenticated gateway on a build that ships a default token, rebuild the firmware with the Kconfig default empty (or set a non-empty token on the gateway side that matches the build default).

    Submit to persist the values to the websocket NVS namespace (websocket.url / websocket.fallback_url / websocket.token); they are read on the next boot. This is the intended path for end users running a pre-built firmware. Clearing a URL field with the ❌ button and submitting again falls back to the matching CONFIG_DEFAULT_WEBSOCKET_* Kconfig value (or to "no fallback" when no Kconfig default is set).

  3. Use the runtime MCP tools over an established connection: gateway_config_get reports websocket.url, websocket.fallback_url, whether a token is set, force_mode, discovery_enabled, and the connected candidate URL without exposing the token value. gateway_config_set accepts optional url, fallback_url, and token strings. Passing an empty string clears that NVS key; for example url="" restores the mDNS discovery path on the next reconnect. Changes are persisted immediately but do not trigger a disconnect, reconnect, or reboot.

  4. Write websocket.url / websocket.fallback_url / websocket.token directly to NVS (advanced): for example with a custom NVS-write tool over serial. Same persistence semantics as the WiFi config UI and runtime MCP tools; primarily useful for batch provisioning.

  5. Temporary source hardcode (not recommended): editing websocket_protocol.cc can unblock local experiments, but keep it out of commits.

Common gateway URL setups:

Mode Primary URL Fallback URL
LAN auto-discovery with relay fallback empty (mDNS) wss://<relay-host>/
Fixed local only ws://<gateway-host>:8765/ empty
Relay only wss://<relay-host>/ empty
Fixed local with relay fallback ws://<gateway-host>:8765/ wss://<relay-host>/

Existing devices with stale NVS — CONFIG_FORCE_DEFAULT_WEBSOCKET_URL

If you are flashing onto a device that previously ran upstream xiaozhi-esp32 firmware, NVS will already contain websocket.url=wss://api.tenclass.net/... written by the upstream OTA-config path. In this case the empty-NVS fallback in option 1 above will not trigger, and the device will keep trying to talk to tenclass instead of your local gateway.

If the device can still establish a stackchan-mcp connection through any candidate, call gateway_config_set with url="" to clear the stale primary NVS URL and restore the mDNS discovery path on the next reconnect. If the stale URL prevents any stackchan-mcp connection from being established, use the force-override switch without erasing all of NVS (which would also drop WiFi credentials):

  • Force CONFIG_DEFAULT_WEBSOCKET_URL/TOKEN to override NVSCONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y

When set, non-empty Kconfig URL/token values override whatever NVS holds. Empty Kconfig values still fall through to the NVS-based behaviour, so leaving the token Kconfig empty keeps any NVS-stored token in use. The boot log will show FORCE: overriding NVS websocket.url with Kconfig: NVS=... -> ... so you can verify the override fired. This switch is the recommended way to bring ex-xiaozhi hardware onto a local stackchan-mcp gateway, and to lock CI/dev images to a known gateway URL.

The switch is opt-in so end-user devices configured at runtime keep their NVS-priority semantics.

Developer-local overrides — sdkconfig.defaults.local

For local hardware testing, do not put personal gateway URLs or tokens in the tracked firmware/sdkconfig.defaults. Instead, create a gitignored local file:

cd firmware
cat > sdkconfig.defaults.local <<'EOF'
CONFIG_DEFAULT_WEBSOCKET_URL="ws://<your-lan-ip>:8765/"
CONFIG_DEFAULT_WEBSOCKET_FALLBACK_URL="wss://<relay-host>/"
CONFIG_DEFAULT_WEBSOCKET_TOKEN="<your-dev-token>"
CONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y
EOF

Both python ./scripts/release.py <board> and plain idf.py build will read this file when it exists. The file is ignored by git, so personal settings cannot be added accidentally with git add -A.

2. Start the gateway

The gateway can either be installed as the published PyPI package (recommended for end users) or run from this repository as a checkout (recommended for contributors who want to follow main).

Option A: install as a tool (recommended for end users)

For an isolated install that does not collide with your system Python or other Python projects, use one of:

uv tool install stackchan-mcp
# or
pipx install stackchan-mcp

Then run the gateway:

stackchan-mcp

If you prefer a project-managed virtualenv, pip install stackchan-mcp inside an active venv works as well, and python -m stackchan_mcp inside that venv is equivalent to stackchan-mcp. Just avoid pip install against the system Python (PEP 668).

The STACKCHAN_TOKEN, VISION_HOST, and other settings documented in gateway/README.md can be supplied via environment variables, the active shell, or a .env file in the working directory.

Option B: from source via uv (contributors)

cd gateway
cp .env.example .env       # set STACKCHAN_TOKEN / VISION_HOST
uv sync
uv run python -m stackchan_mcp

If the gateway is restarted while the ESP32 is already connected, the firmware automatically retries the WebSocket connection while idle. The retry delay starts at 5 seconds and backs off up to 60 seconds; use get_status to confirm that the device has reappeared. The same retry path also fires for any post-handshake server-initiated close — gateway crashes, TLS-layer resets, and gateway configurations that tear the WebSocket session down after the handshake — so the device recovers automatically once the gateway accepts the next connection attempt.

For non-LAN setups, see docs/remote-access.md for the Tailscale Funnel flow and the VISION_URL capture callback setting.

3. Register as an MCP client (Claude Code example)

Add to ~/.claude.json.

If you installed via pip install stackchan-mcp:

{
  "mcpServers": {
    "stackchan-mcp": {
      "type": "stdio",
      "command": "stackchan-mcp",
      "env": {
        "STACKCHAN_TOKEN": "your-secret-token-here",
        "VISION_HOST": "your.host.lan.ip"
      }
    }
  }
}

If you installed from source via uv:

{
  "mcpServers": {
    "stackchan-mcp": {
      "type": "stdio",
      "command": "uv",
      "args": [
        "run", "--directory", "/path/to/stackchan-mcp/gateway",
        "python", "-m", "stackchan_mcp"
      ]
    }
  }
}

See gateway/README.md for details.

Gateway user-defaults TOML file

The Python gateway can optionally read user-local default argument values from a TOML file under the OS-standard user config directory:

  • Linux / XDG: ~/.config/stackchan-mcp/user-defaults.toml
  • macOS: ~/Library/Application Support/stackchan-mcp/user-defaults.toml
  • Windows: %APPDATA%\stackchan-mcp\user-defaults.toml

The exact path is resolved with platformdirs, and the gateway logs the resolved path at startup. The tracked user-defaults.toml.example file at the repository root is the template.

The file is an overlay on top of the MCP schema defaults. If the file is absent or empty, schema defaults are used. If a key is omitted from the file, that key keeps its schema default. Explicit MCP tool-call arguments always take precedence over the file, so the order is: explicit argument > user-defaults file > schema default.

This is a gateway-side Python setting. It is separate from the firmware-side NVS connection settings exposed by gateway_config_get / gateway_config_set (added in PR #293). Those tools read or update the device's saved WebSocket connection settings; this TOML file only changes default MCP argument values used by the Python gateway.

Minimal example:

[tool.stackchan_follow_pose_stream]
smoothing_window = 1
downsample_hz = 20
max_step_deg = 30

4. Optional: TTS setup (VOICEVOX)

To make the device speak, install the [tts] extra and run a VOICEVOX engine alongside the gateway. VOICEVOX runs as a separate HTTP service, so its LGPL-3.0 license stays scoped to that process — the MIT-licensed gateway only issues HTTP requests against it.

Run the engine (Docker)

docker run --rm -p '127.0.0.1:50021:50021' \
  voicevox/voicevox_engine:cpu-ubuntu20.04-latest

The engine listens on port 50021 by default. The CPU image is fine for short sentences; GPU images are also published upstream.

Install the TTS extra

pip install 'stackchan-mcp[tts]'
# or, equivalently:
pip install 'stackchan-mcp[tts-voicevox]'

This pulls in httpx (HTTP client) and opuslib (Opus encoder bindings). The encoder needs libopus on the system — brew install opus on macOS, sudo apt-get install libopus0 on Debian/Ubuntu.

Configure (optional)

Environment variable Default Notes
STACKCHAN_VOICEVOX_URL http://127.0.0.1:50021 VOICEVOX engine URL.
STACKCHAN_VOICEVOX_DEFAULT_SPEAKER 3 Default speaker ID (Zundamon normal). See the VOICEVOX speaker list for other options.

Try it

From an MCP client:

say(text="こんにちは、わたしはスタックチャンです")

The gateway POSTs to VOICEVOX, decodes the returned WAV, resamples to 16 kHz mono, encodes Opus frames (60 ms each), and pushes them as WebSocket binary frames to the device — which decodes and plays them through its speaker. No firmware changes are required: the existing audio decoder pipeline already accepts these frames. The TTS framework is engine-agnostic, so additional engines plug in behind the same say API — see the Irodori engine below.

say also accepts supported expression emoji in the text: happy (😊 😄 😀 😁 🙂 😆 🥰 😍 😋 🤗), sad (😢 😭 😞 😔 ☹️ 🙁 😿), surprised (😲 😮 😯 😱 🤯), embarrassed (😳 😅 🫣), and thinking (🤔 🧐 💭). The first mapped emoji changes the avatar face in the same MCP call before speech; unmapped emoji do not change the face. VOICEVOX and other engines without emoji-style support strip all emoji before synthesis. If stripping leaves empty text, the face change is attempted and speech is skipped.

Alternative engine: Irodori

Irodori is a second TTS engine that calls an external synthesis service returning MP3. It plugs into the same say pipeline as VOICEVOX (MP3 is decoded to 16 kHz mono PCM, then encoded to Opus), and it receives emoji embedded in the input text verbatim so they can act as voice-style cues.

There is no default endpoint — the synthesis backend is a hosted service, and shipping a hard-coded URL would point every install at someone else's deployment. You must self-host a compatible synthesis API (for example, duplicate the reference Hugging Face Space, or run your own service that honours the same request/response contract) and point the gateway at it.

Install the extra:

pip install 'stackchan-mcp[tts-irodori]'

This pulls in httpx, opuslib, and miniaudio (a small self-contained MP3/audio decoder with prebuilt wheels, so no extra system library is needed beyond the libopus that opuslib already requires).

Configure via environment variables (URL and key are read from the environment only — never commit them):

Environment variable Default Notes
STACKCHAN_IRODORI_URL (required) Synthesis endpoint URL of your self-hosted service. Unset → the engine still lists but say(voice="irodori") returns a clear error.
STACKCHAN_IRODORI_KEY (none) Optional API key, forwarded as a query parameter.
STACKCHAN_IRODORI_SPEAKER 3 Default speaker identifier.
STACKCHAN_IRODORI_STEPS 24 Default diffusion step count (higher = slower, higher quality).

Select Irodori per call:

say(text="やったね😊", voice="irodori")

Or make it the default engine for every say call that omits voice:

export STACKCHAN_TTS_ENGINE=irodori

VOICEVOX remains the default when STACKCHAN_TTS_ENGINE is unset, and an explicit voice argument always overrides the default.

5. Optional: STT setup (faster-whisper)

To let the device hear, install one of the [stt-*] extras and pair it with a firmware that supports the inbound listen wire type (present from this release onward). The gateway sends a listen.start notification, the firmware opens the microphone, and the inbound Opus frames are decoded and transcribed locally — no audio leaves your machine when you use the default faster-whisper engine.

Install the STT extra

For local transcription with faster-whisper (MIT-licensed, CTranslate2-based, runs on CPU):

pip install 'stackchan-mcp[stt-faster-whisper]'

For the OpenAI Whisper API (cloud, useful when local compute is constrained):

pip install 'stackchan-mcp[stt-openai]'
export OPENAI_API_KEY=sk-...

Both extras include the base [stt] extra which pulls in opuslib for decoding the inbound frames. The decoder needs the system libopus library — brew install opus on macOS, sudo apt-get install libopus0 on Debian/Ubuntu (this is the same prerequisite as the [tts] extra).

Configure (optional)

Environment variable Default Notes
STACKCHAN_FASTER_WHISPER_MODEL base Model identifier — tiny / base / small / medium / large-v3. Larger models are more accurate but slower and use more memory.
STACKCHAN_FASTER_WHISPER_DEVICE cpu cpu / cuda / auto.
STACKCHAN_FASTER_WHISPER_COMPUTE_TYPE int8 int8 / float16 / float32.
STACKCHAN_OPENAI_WHISPER_MODEL whisper-1 OpenAI Whisper model identifier (only whisper-1 is currently exposed by the API).

Try it

From an MCP client:

listen(duration_ms=5000, language="ja")

The gateway sends {"type":"listen","state":"start","mode":"manual"} to the device, buffers the Opus frames the device streams up for the capture window, then sends {"type":"listen","state":"stop"} and hands the buffered audio to the registered STT engine. The first call to the faster-whisper engine downloads the chosen model (~140 MB for base) into the Hugging Face cache; subsequent calls reuse it. For visible capture feedback, pass motion="face-only" to show the thinking avatar during capture and restore idle at the end, or motion="look-up" to preserve yaw, tilt pitch to look_up_pitch (default 50°, valid 5..85°), show thinking, and hold that pose on success. The STT framework is engine-agnostic — additional engines (Vosk, whisper.cpp, cloud providers) can be added without changing the listen API.

6. Optional: enable event notifications

Stack-chan physical events (today: touch tap / stroke; the structure allows additional subtypes to be appended later) can be delivered through three notification channels. All channels are disabled by default; opt in via ~/.config/stackchan-mcp/notify.yml. More than one channel can be enabled at the same time; each event is then delivered over every enabled channel. Pick whichever match the host you are integrating with:

  • channels — Claude Code plugin path. Notifications are injected into the running session as <channel ...> blocks via Claude Code's experimental Channels capability. Use this when Stack-chan is wired into Claude Code as an installed plugin and you want events to reach the conversation in-band.
  • jsonl — Out-of-process file integration. Each event is appended as a single JSON line to a file you configure. Use this when an external host (anything other than Claude Code, or your own pipeline) needs to ingest events asynchronously by tailing a file.
  • legacy_event — Pre-plugin MCP notification. The gateway emits the self-defined stackchan/event MCP notification method, which existed before the Channels capability shipped. Use this for backward compatibility when the host wires the gateway via ~/.claude.json mcpServers rather than the Claude Code plugin path.

See notify.example.yml for the full annotated configuration reference.

Channel: channels (Claude Code plugin path)

Use when Stack-chan is loaded as a Claude Code plugin and you want events delivered as in-session channel blocks. This channel uses an experimental MCP capability that may evolve.

Enable in notify.yml:

channels:
  enabled: true

Delivered as a <channel ...> block injected into the Claude Code session, for example:

<channel source="plugin:stackchanmcp:stackchanmcp" ...>head was tapped</channel>

Setup:

  1. Plugin installation — install this repository as a Claude Code plugin from the kisaragi-mochi-channels marketplace:

    claude plugin install stackchanmcp@kisaragi-mochi-channels

    For local development against your working copy, pass --plugin-dir /path/to/stackchan-mcp instead of installing from the marketplace; Claude Code starts the gateway under ${CLAUDE_PLUGIN_ROOT}/gateway via the bundled .mcp.json.

  2. Host environment setup — the Channels delivery path requires three host-side names to be aligned with the gateway's MCP server name (stackchanmcp, no hyphen):

    • The plugin's .mcp.json mcpServers key for this gateway must be stackchanmcp. If you previously wired the gateway under a different key, rename it.

    • Claude Code's settings.local.json enabledMcpjsonServers whitelist must include stackchanmcp.

    • The Channels allowlist requires a system-wide approval — Claude Code does not honor user-level (e.g. ~/.claude/settings.json) settings for the Channels allowlist. On macOS, create or edit /Library/Application Support/ClaudeCode/managed-settings.json (requires sudo):

      {
        "channelsEnabled": true,
        "allowedChannelPlugins": ["stackchanmcp@kisaragi-mochi-channels"]
      }
  3. Receiver side — launch Claude Code with the Channels flags:

    claude --channels plugin:stackchanmcp@kisaragi-mochi-channels \
           --dangerously-load-development-channels plugin:stackchanmcp@kisaragi-mochi-channels

    The --channels flag attaches the channel source to the gateway and injects <channel source="plugin:stackchanmcp:stackchanmcp" ...> blocks into the session. The --dangerously-load-development-channels flag is currently required alongside --channels because the plugin's Channels capability is experimental; the approved-allowlist-only path without this flag has been verified not to deliver notifications in current Claude Code versions. The flag is expected to become optional once the plugin's Channels capability stabilizes.

Important — pre-plugin wiring does not receive Channels: if you previously wired this gateway via ~/.claude.json mcpServers (the pre-plugin path), that wiring does not receive <channel ...> injections. Claude Code only attaches a channel source to plugin-loaded MCP servers. Before switching to the plugin path, stop any existing gateway process to release the ESP32 ownership lock; the plugin-loaded gateway will otherwise fail to acquire it. If you prefer to keep the ~/.claude.json wiring, use the legacy_event or jsonl channel below — both work without plugin loading.

Other hosts:

  • Hosts with a claude/channel-compatible receiver: open that receiver per the host's documentation. Compatibility with hosts other than Claude Code has not been verified in this repository.
  • Hosts without a Channels receiver: use the jsonl channel below.
Migration from the previous stackchan-mcp (hyphenated) form

If you enabled Channels using the older stackchan-mcp server-name form, rename to the current stackchanmcp form (no hyphen) in all four places to keep the host MCP client and the gateway aligned:

  • Plugin / server name: change stackchan-mcp to stackchanmcp in your host's .mcp.json mcpServers key, in settings.local.json enabledMcpjsonServers whitelist, and in the --channels / --dangerously-load-development-channels flag arguments.
  • Channels flag form: change from --channels server:stackchan-mcp to --channels plugin:stackchanmcp@kisaragi-mochi-channels. The plugin form is the supported form now that the marketplace manifest is published.
  • system-wide allowlist: ensure /Library/Application Support/ClaudeCode/managed-settings.json has allowedChannelPlugins listing stackchanmcp@kisaragi-mochi-channels (not the old stackchan-mcp form).

Without all four renames the host MCP client logs Channel notifications skipped: server <name> not in --channels list for this session and the notifications never reach the session.

Channel: jsonl (out-of-process file integration)

Use when an external host or your own pipeline needs to ingest events asynchronously by tailing a file. This is the simplest channel to wire into anything that is not Claude Code.

Enable in notify.yml:

jsonl:
  enabled: true
  path: ~/.claude/stackchan-events.jsonl

Each event is appended as one JSON line to the configured path. The file is created if it does not exist, and existing entries are preserved. Each line carries:

  • event_type — top-level event type (currently "touch").
  • subtype — subtype within the event type (currently "tap" or "stroke").
  • duration_ms — firmware-reported duration of the event in milliseconds.
  • ts — firmware uptime in milliseconds (monotonic).
  • ts_unix — wall-clock timestamp when the gateway recorded the event.
  • session_id — gateway session identifier.
  • action — avatar action keyword for the event subtype (e.g. head_pat, head_stroke). The built-in default templates always populate this, and messages: overrides are also required to specify action, so this field is present for every supported subtype.

The rendered wording (e.g. head was tapped) is what the channels channel delivers as the human-readable message; it is not stored in the JSONL record itself. A delivered tap-and-stroke pair looks like:

{"event_type": "touch", "subtype": "tap", "duration_ms": 0, "ts": 123456, "ts_unix": 1717862400.0, "session_id": "abc-123", "action": "head_pat"}
{"event_type": "touch", "subtype": "stroke", "duration_ms": 720, "ts": 124000, "ts_unix": 1717862400.7, "session_id": "abc-123", "action": "head_stroke"}

The top-level event_type and subtype correspond to the rows in "Supported event subtypes" below.

Channel: legacy_event (pre-plugin backward compatibility)

Use when the gateway is wired into the host via ~/.claude.json mcpServers (the pre-plugin path) and you want events without switching the host over to the plugin form.

Enable in notify.yml:

legacy_event:
  enabled: true

The gateway emits the self-defined stackchan/event MCP notification method. The notification params carry the same fields as the JSONL record above, except that ts_unix is not included (it is added only by the JSONL writer). The host is responsible for surfacing the notification — Claude Code's pre-plugin path historically displayed the rendered template inline; other hosts may handle the notification differently.

Supported event subtypes

The currently supported physical events are listed below. The structure is intentionally extensible: additional touch subtypes or new top-level types (e.g. motion, voice) can be appended in future releases without rewriting this section.

Type Subtype Default action Default template
touch tap head_pat head was tapped
touch stroke head_stroke head was stroked for {duration_ms}ms

The built-in defaults are phrased experientially — describing what the device felt rather than naming a mechanical event — so the consuming agent reads them as first-person narration. The {duration_ms} placeholder is substituted from the event payload; unknown placeholders are preserved verbatim.

Overriding the wording

Add a messages: block to ~/.config/stackchan-mcp/notify.yml to override the per-subtype action and template. Only the subtypes you list are overridden; everything else keeps the defaults above.

# ~/.config/stackchan-mcp/notify.yml
messages:
  touch:
    tap:
      action: head_pat
      template: "got a head pat"
    stroke:
      action: head_stroke
      template: "head being stroked for {duration_ms}ms"

Both action and template are required for each overridden subtype. The action value is forwarded in the event metadata, so keep it stable if a downstream consumer keys off it. See notify.example.yml for the full annotated reference.

About the avatar images

firmware/main/boards/stackchan/avatar_images.cc is a pure black RGB565 placeholder. The firmware builds and runs, but the screen will display nothing.

For a personal avatar, keep PNG sources outside git and generate ignored local override files:

cd firmware
python scripts/avatar_convert/convert_avatars.py

By default, the converter reads PNGs from ~/.stackchan/avatar/ and writes:

  • firmware/main/boards/stackchan/avatar_images.local.cc
  • firmware/main/boards/stackchan/avatar_images.local.h

These local files are ignored by git. When avatar_images.local.cc exists, the StackChan firmware build uses it instead of the tracked black placeholder, so git pull will not overwrite your personal avatar.

The tracked avatar_images.cc / avatar_images.h files are public placeholder files. Maintainers who intentionally need to refresh those tracked files can pass --tracked, but personal avatars should use the default local output path.

If you add a local avatar after you have already built the firmware once, remove firmware/build/ and rebuild so CMake can pick up the new local override.

Symbol list (see avatar_images.h):

  • Expressions (6): avatar_idle, avatar_happy, avatar_thinking, avatar_sad, avatar_surprised, avatar_embarrassed
  • Eyes (3): avatar_eyes_open, avatar_eyes_half, avatar_eyes_closed
  • Mouth (5): avatar_mouth_closed, avatar_mouth_half, avatar_mouth_open, avatar_mouth_e, avatar_mouth_u

Expected PNG filenames under ~/.stackchan/avatar/:

  • Expressions: idle.png, happy.png, thinking.png, sad.png, surprised.png, embarrassed.png
  • Eyes: eyes_open.png, eyes_half.png, eyes_closed.png
  • Mouth: mouth_closed.png, mouth_half.png, mouth_open.png, mouth_e.png, mouth_u.png

Do not commit personal PNGs, generated local avatar files, photos, or other user-specific assets.

Hardware safety notes

⚠️ Y-axis (pitch) safe range — two-tier guard

The pitch axis is guarded by two complementary tiers, both encoded in firmware and surfaced through the set_head_angles MCP tool description:

Tier Range Enforcement Rationale
Tier 1 — Hard clamp 0..+88° Silent clamp + ESP_LOGW Prevents mechanical damage. Lower bound leaves ~1° margin above the validated mechanical end-stop on M5Stack CoreS3 + SCS0009 hardware (PR #81). Upper bound 88° sits ~1° inside the audible sub-stall boundary observed at pitch=89° during the Issue #98 on-device sweep ("ji-ji-" gear strain sound).
Tier 2 — Recommended operating range 5..+85° Accept + ESP_LOGI (soft signal) The M5Stack-documented sweet spot for long-term servo reliability. Single requests outside this range are not hardware-damaging, but sustained operation outside 5..85° may stress the servo over time.

M5Stack's official documentation states:

The movement angle of the StackChan Y-axis servo (vertical direction) is recommended to be controlled within 5 ~ 85°. Operating at extreme angles may cause servo stall and permanent damage. — https://docs.m5stack.com/en/StackChan ("Motion Angle Notice")

The set_head_angles MCP tool declares pitch with a fully permissive schema range — the entire int value range, std::numeric_limits<int>::min() to std::numeric_limits<int>::max(), corner values included; the firmware-side handler is the authoritative Tier 1 enforcement layer. Any narrower schema range would cause McpServer::Property to reject sufficiently-extreme out-of-range requests (e.g. pitch=200 or pitch=INT_MIN) before the handler could clamp / log them, leaving the documented Tier 1 behavior unreachable for those callers — see #98. Requests below are silently raised to (with ESP_LOGW), requests above 88° are silently lowered to 88° (with ESP_LOGW), and requests inside [0, 88] but outside [5, 85] are accepted with an ESP_LOGI soft signal. Older callers that targeted -30..+30° continue to work without modification (the negative half clamps to ).

Conversely, the gateway-side move_head MCP tool — the one LLM-driven clients see in the tool list above — declares a restrictive pitch=5..85 / yaw=-90..90 schema and re-enforces the same bounds in the gateway call_tool handler as belt-and-suspenders. This rejects out-of-recommended requests at the MCP boundary so an agent cannot accidentally trigger the bus-hang risk tracked in #100 from a pose-reset call like move_head(yaw=0, pitch=0). Callers that need explicit access to the firmware Tier 1 hard clamp (for diagnostics, recovery sequences, or other expert use cases) should bypass move_head and call the firmware-side set_head_angles device tool directly — see #109.

The X-axis (yaw, -90..+90°) is not subject to a comparable hardware restriction — M5Stack's documentation explicitly notes "No angle restriction is required for the X-axis" — and remains usable across its full declared range.

See #80 for the lower-bound engineering background and #98 for the two-tier upper-bound widening (firmware hard clamp 30°88°, with the M5Stack-recommended 5..85° operating range surfaced as a soft-signal tier).

Known issues

  • The servo bus may hang on large-angle abrupt reversals (e.g. +60° → -60°). A fix is in progress via Motion::update_task interpolation.
  • The touch sensor (Si12T) occasionally drops tap events. Sensitivity register tuning has room to improve here.

License

The canonical firmware build path (firmware/scripts/release.py stackchan) produces an MIT-licensed end-to-end binary. The GPL-3.0 SCServo_lib sources remain in the tree as an opt-in fallback during the migration tracked in #79.

Scope License
gateway/, top-level, all of firmware/ in the canonical build (release.py stackchan appends CONFIG_STACKCHAN_SERVO_FEETECH=y and links the MIT feetech_scs_esp_idf driver vendored under firmware/components/feetech_scs/) MIT License (see LICENSE)
SCServo_lib-derived files under firmware/main/boards/stackchan/ (SCS.{cc,h}, SCSCL.{cc,h}, SCSerial.{cc,h}, INST.h, SCServo.h) — only linked when CONFIG_STACKCHAN_SERVO_SCSCL=y is selected (e.g. via sdkconfig.defaults.local) GNU GPL-3.0 (see firmware/main/boards/stackchan/SCServo_lib_LICENSE.txt)

The gateway/ runs as an independent Python process and only talks to the ESP32 over the network (WebSocket), so it stays usable and derivable under the MIT License regardless of which servo driver the firmware-side build selects.

The gateway's win_amd64 PyPI wheel additionally bundles opus.dll built from upstream Opus source by the publish workflow. That native binary ships under the BSD 3-clause license + Xiph extension; the notice is shipped in every distribution form as gateway/LICENSE-THIRD-PARTY. Non-Windows wheels and the sdist do not contain the binary — they rely on the system libopus. See gateway/stackchan_mcp/_libs/SOURCES.md for the per-release SHA256 and build provenance.

Note for direct idf.py users with a pre-existing firmware/sdkconfig: ESP-IDF persists Kconfig choices into firmware/sdkconfig, and a change to the Kconfig default does not retroactively rewrite that file. The canonical build path enforces the MIT driver at the release.py layer (via sdkconfig_append), so rebuilding via release.py stackchan reliably produces the MIT default binary. If you bypass release.py and call idf.py directly on a workspace that previously selected SCSCL, you may need to run idf.py menuconfig (or delete firmware/sdkconfig) to pick up the new default.

GPL-3.0 fallback build (opt-in): The original SCServo_lib sources are still shipped in the repository as a safety net while the migration tracked in #79 is in its observation period. To build against them, add CONFIG_STACKCHAN_SERVO_SCSCL=y to firmware/sdkconfig.defaults.local; release.py merges this in after sdkconfig_append, so it overrides the FEETECH default. In that configuration the firmware binary statically links the GPL-3.0 sources and is therefore effectively distributed under GPL-3.0. Once the observation period closes without regressions the GPL files are scheduled for removal (Phase B of #79).

upstream

firmware/ is taken in via git subtree from 78/xiaozhi-esp32 (MIT) — specifically the kisaragi-mochi/xiaozhi-esp32 fork. See docs/firmware-sync.md for the upstream sync playbook. The SCServo_lib sources under firmware/main/boards/stackchan/ (SCS.{cc,h}, SCSCL.{cc,h}, SCSerial.{cc,h}, INST.h, SCServo.h) originate from Feetech's SCServo SDK and entered this repository through the same kisaragi-mochi/xiaozhi-esp32 fork's main/boards/stackchan/ directory at the firmware subtree merge. They remain GPL-3.0 (see firmware/main/boards/stackchan/SCServo_lib_LICENSE.txt).

Related projects

Contributing

Issues and PRs are welcome. We aim to provide something the StackChan community can use as-is.

See CONTRIBUTING.md for the development flow.

Trademarks

"StackChan" and "スタックチャン" are registered trademarks of Shinya Ishikawa, the originator of the stack-chan project. This repository uses these names in reference to the M5Stack official StackChan kit it targets.

About

MCP gateway for StackChan (xiaozhi-esp32): bridge any MCP client to a CoreS3 + servo + camera robot

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors