An MCP (Model Context Protocol) bridge for the M5Stack official StackChan (2025 Kickstarter shipping kit), letting any LLM client drive the device.
Born out of the stack-chan project community (originated by Shinya Ishikawa in 2021). This repository targets the M5Stack official StackChan kit that grew out of that lineage.
┌─────────────┐     stdio MCP      ┌──────────────┐    WebSocket MCP    ┌──────────────┐
│ MCP client  │ ─────────────────▶ │   gateway    │ ──────────────────▶ │ ESP32 (CoreS3│
│ (e.g.Claude)│ ◀───────────────── │   (Python)   │ ◀────────────────── │  +StackChan) │
└─────────────┘                    │              │                     └──────────────┘
                                   │   /capture   │ ◀── HTTP POST (JPEG) ──┘
                                   └──────────────┘
From any MCP client (Claude Code / Claude Desktop / others) you can call StackChan operations such as head movement, camera capture, touch sensor reads, and avatar expression switches.
This repository is a monorepo.
| Directory | Contents |
|---|---|
| firmware/ | Full git subtree of 78/xiaozhi-esp32. The custom StackChan board lives at firmware/main/boards/stackchan/. |
| gateway/ | Python MCP gateway. stdio MCP server (LLM side) + WebSocket MCP client (ESP32 side) + HTTP capture server. |
| docs/ | architecture.md: full component diagram, tool name mapping, photo flow, auth, phase roadmap. firmware-sync.md: upstream xiaozhi-esp32 sync playbook. remote-access.md: Tailscale Funnel setup for non-LAN use. |
| examples/ | Optional, unmaintained examples. cloudflare-relay/: Cloudflare Workers WebSocket relay for reaching the gateway from outside the local LAN. |
M5Stack official StackChan kit (Kickstarter 2025 shipping version). The firmware in this repository is meant to replace the kit's factory firmware.
| Part | Spec |
|---|---|
| Body | M5Stack CoreS3 (ESP32-S3, 16MB Flash, 8MB PSRAM) |
| Neck servos | SCS0009 ×2 (yaw + pitch, serial bus, TX=GPIO6, RX=GPIO7) |
| Camera | GC0308 (DVP, 320×240) |
| Touch | FT6336 / Si12T |
| Display | ILI9342 (SPI, 320×240) |
A self-built stack-chan (following the original stack-chan project design) may also work as long as the pin assignments and I2C addresses match. Reports and PRs welcome.
| Tool | Description | Status |
|---|---|---|
| get_status | Gateway connection state | ✅ |
| get_device_info | ESP32 device state (battery / volume / WiFi / etc.) | ✅ |
| take_photo(question?) | Capture a frame, save as JPEG, return the path | ✅ |
| set_volume(volume) | Speaker volume (0-100) | ✅ |
| set_brightness(brightness) | Screen brightness (0-100) | ✅ |
| move_head(yaw, pitch, speed?) | Move the neck (servos). pitch is constrained to 5..85, the M5Stack-recommended operating range. For the wider firmware hard clamp (0..88), use the firmware-side set_head_angles device tool instead. | ✅ |
| get_touch_state | Touch sensor state (press / release / stroke / etc.) | ✅ |
| get_touch_sensor_enabled | Read whether head-touch detection is enabled. The NVS-backed setting persists across reboot. | ✅ |
| set_touch_sensor_enabled(enabled) | Enable or disable head-touch detection. Disabling stops both the firmware local motion response and MCP stackchan/event emission, and persists across reboot. | ✅ |
| set_avatar(face) | Switch avatar expression (idle / happy / thinking / sad / surprised / embarrassed), or off to hide the avatar and disable blink so the underlying WiFi config / OTA / settings screens are visible. Any other face brings the avatar back and restores blink. | ✅ |
| set_blink(state) | Blink on/off | ✅ |
| set_mouth(state) | Mouth open/close (one-shot, held until next call) | ✅ |
| set_mouth_sequence(steps) | Queue and play a list of {shape, duration_ms} steps locally for TTS lip-sync, with no per-step WebSocket RTT jitter | ✅ |
| check_vm_en | Check servo power supply (VM EN HIGH) state | ✅ |
| set_led(index, r, g, b) | Set one of the 12 base RGB LEDs (index 0..11, channels 0..255) | ✅ |
| set_all_leds(r, g, b) | Set all 12 base RGB LEDs to the same color | ✅ |
| set_leds(colors) | Batch-set the first N LEDs from a [[r,g,b], ...] array in a single I2C burst (use this for animations / multi-color patterns); trailing LEDs keep their previous color | ✅ |
| clear_leds | Turn all 12 base RGB LEDs off | ✅ |
| say(text, voice?, speaker_id?, reference_audio?) | Speak text on the device speaker via gateway-side TTS. A supported expression emoji in the text can switch the avatar face in the same call. Default engine: VOICEVOX (runs as a separate HTTP service; see TTS setup). Requires the [tts] extra. | ✅ |
| listen(duration_ms?, engine?, language?, model?, motion?, look_up_pitch?) | Capture a short utterance from the device microphone and transcribe it via gateway-side STT. Default engine: faster-whisper (local, MIT); see STT setup. Optional motion feedback can show the thinking face or tilt the head up during capture. Requires the [stt-faster-whisper] (or [stt-openai]) extra and a firmware update with the inbound listen wire type. | ✅ |
| stackchan_follow_pose_stream(action, url, ...) | Subscribe to an arbitrary external WebSocket pose-stream and drive the head to follow incoming yaw / pitch frames 1:1 within the SCS0009 working range. action switches between start / stop / status. Includes per-axis flip, pitch-center offset, downsample rate cap, angular-velocity clamp, and reconnect with exponential backoff; the initial pose is seeded from the device so the angular-velocity clamp is anchored at the real servo position from the first frame. The upstream server's protocol (zero-offset commands, source dispatching, transport) is intentionally outside this gateway's scope. | ✅ |
See gateway/README.md for full schemas.
There are two paths. Option A is recommended for first-time users — no toolchain setup needed. Option B is for contributors who want to build from source.
Download the latest firmware bundle from the Releases page — pick the most recent firmware-v* release and grab merged-binary.bin (and optionally xiaozhi.bin). Then flash with esptool.py:
# Replace --port with your platform's serial device:
# macOS: /dev/cu.usbmodem* (e.g. /dev/cu.usbmodem1101)
# Linux: /dev/ttyUSB0 or /dev/ttyACM0
# Windows: COM3 (or whichever it shows up as in Device Manager)
# Clean install (resets NVS — Wi-Fi settings will need to be re-entered):
esptool.py --chip esp32s3 --port /dev/cu.usbmodem1101 -b 460800 \
write_flash 0x0 merged-binary.bin
# Or, app-only update (preserves NVS — keeps your Wi-Fi setup):
esptool.py --chip esp32s3 --port /dev/cu.usbmodem1101 -b 460800 \
write_flash 0x20000 xiaozhi.bin

No ESP-IDF or Docker setup needed.
This repository uses git submodules under firmware/components/. If you
cloned without --recursive, initialize them first:
git submodule update --init --recursive

Then build:
cd firmware
docker run --rm --cpus=4 --ulimit nofile=65536:65536 \
-v $PWD:/project -w /project espressif/idf:v5.5.2 \
python ./scripts/release.py stackchan
# → releases/v2.2.6_stackchan.zip
# Flash (after USB-connecting the CoreS3).
# Replace --port with your platform's serial device — see the Option A
# note above for the macOS/Linux/Windows mapping.
esptool.py --chip esp32s3 --port /dev/cu.usbmodem1101 -b 460800 \
write_flash 0x0 build/merged-binary.bin

The --cpus=4 flag caps Docker container parallelism so the concurrent
LVGL / xiaozhi-fonts/emoji_*.c compile steps stay within the memory
budget on macOS Docker hosts (OrbStack / Docker Desktop). Without it,
ninja autodetects job count from /proc/cpuinfo and the resulting
parallel gcc pressure can exhaust container memory mid-LVGL with
Cannot allocate memory — even on hosts with ample physical RAM
(tracked as #112). The --ulimit nofile=65536:65536 flag separately
avoids a Too many open files failure during the same LVGL compile
step under the default file-descriptor limit. Linux hosts with higher
defaults are unaffected, but passing both flags unconditionally is
safe and matches CI.
After flashing, WiFi configuration happens on first boot — connect from a smartphone to the setup UI (the xiaozhi-esp32 standard flow).
On a local network, the gateway advertises _stackchan-mcp._tcp.local.
by default. Fresh firmware can use that mDNS/DNS-SD record to find the
WebSocket endpoint automatically when no primary URL has been saved yet. mDNS
discovery requires NVS websocket.url to be empty.
Primary URL resolution order:
1. NVS websocket.url
2. mDNS _stackchan-mcp._tcp.local. when CONFIG_STACKCHAN_MDNS_DISCOVERY is enabled and the primary NVS URL is empty
3. CONFIG_DEFAULT_WEBSOCKET_URL
4. Empty/fail with a boot log error
Existing websocket.fallback_url and
CONFIG_DEFAULT_WEBSOCKET_FALLBACK_URL candidates are still tried after the
primary candidate path above. CONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y remains
an explicit exception for stale-NVS recovery: a non-empty Kconfig URL wins and
mDNS discovery is skipped.
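The resolution order above, including the force-override exception, can be sketched in a few lines of Python. This is only an illustration of the documented precedence, not the firmware code (which is C++): the function name and parameters are hypothetical, and taking the first mDNS candidate is a simplification, since the firmware tries each discovered address in turn.

```python
from typing import Optional, Sequence


def resolve_primary_url(
    nvs_url: str,
    kconfig_url: str,
    mdns_urls: Sequence[str] = (),
    mdns_enabled: bool = True,
    force_default: bool = False,
) -> Optional[str]:
    """Return the primary gateway URL, or None when nothing is configured."""
    # CONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y: a non-empty Kconfig URL wins
    # and mDNS discovery is skipped. An empty Kconfig URL falls through.
    if force_default and kconfig_url:
        return kconfig_url
    # 1. A saved NVS websocket.url comes first.
    if nvs_url:
        return nvs_url
    # 2. mDNS discovery, only when enabled and the NVS URL is empty.
    #    Simplification: return the first candidate only.
    if mdns_enabled and mdns_urls:
        return mdns_urls[0]
    # 3. Build-time default (CONFIG_DEFAULT_WEBSOCKET_URL).
    if kconfig_url:
        return kconfig_url
    # 4. Nothing usable: the firmware logs a boot error at this point.
    return None
```

With an empty NVS URL and one discovered gateway, the sketch returns the mDNS candidate; with a stale NVS URL and force_default=True, it returns the Kconfig URL instead.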
The gateway advertises mDNS by default; run stackchan-mcp --no-mdns to
disable advertisement. To compile out firmware discovery, set
CONFIG_STACKCHAN_MDNS_DISCOVERY=n. Discovery requires UDP multicast on the
local LAN, and some routers or VLANs block it. When multiple gateways are
visible, the firmware tries each usable IPv4 address from every supported
gateway service discovered in one browse, and logs the accepted instance count
and candidate address list. mDNS only discovers the URL;
websocket.token / CONFIG_DEFAULT_WEBSOCKET_TOKEN still control
authentication.
For LAN auto-discovery with an automatic out-of-LAN relay fallback, keep NVS
websocket.url empty and store the relay URL in websocket.fallback_url. With
that profile, mDNS candidates are tried first, and the relay is tried later if
the local discovery path does not complete the WebSocket server hello flow.
Automatic recovery after a gateway host IPv4 change requires both sides of the paired mDNS fix: Gateway vA.B.C+ refreshes the advertised service when the host address changes, and Firmware vX.Y.Z+ tries all supported mDNS service instances from a browse. Earlier firmware may keep trying a stale cached instance until reboot.
The firmware reads these NVS keys for the gateway connection:
- websocket.url: the gateway WebSocket URL (e.g. ws://<gateway-host>:8765/)
- websocket.fallback_url: optional second gateway URL to try when websocket.url cannot be reached or does not complete the server hello flow
- websocket.token: the bearer token sent as Authorization: Bearer <token>, matched against STACKCHAN_TOKEN / BEARER_TOKEN on the gateway side (leave both empty to skip authentication entirely)
There are four practical ways to provide them, plus one temporary source-level escape hatch:
1. Build-time defaults via Kconfig (recommended for developers): run idf.py menuconfig → Component config → Xiaozhi Assistant, and set:
   - Default WebSocket gateway URL (fallback when NVS is empty) → CONFIG_DEFAULT_WEBSOCKET_URL (e.g. ws://<gateway-host>:8765/)
   - Fallback WebSocket gateway URL → CONFIG_DEFAULT_WEBSOCKET_FALLBACK_URL
   - Default WebSocket auth token (fallback when NVS is empty) → CONFIG_DEFAULT_WEBSOCKET_TOKEN (leave empty if your gateway accepts unauthenticated connections)

   By default these only apply when the corresponding NVS key is empty. For first-time flashes onto a fresh device this is exactly what you want. If both a primary and fallback URL are configured, the firmware tries them in deterministic order and keeps the first candidate that completes the WebSocket server hello flow.

2. Use the on-device WiFi config UI (recommended for fresh users): while the device is in WiFi configuration mode, open the captive portal at http://192.168.4.1, switch to the Advanced tab, and fill in:
   - WebSocket Gateway URL (e.g. ws://<gateway-host>:8765/): the primary gateway candidate.
   - Fallback Gateway URL (e.g. wss://<relay-host>/): optional second candidate, tried only after the primary candidate fails the server-hello flow.
   - Gateway Token: optional bearer token, sent as Authorization: Bearer <token> to both candidates when set. The current value is never displayed (the WiFi config AP is unauthenticated, so the GET endpoint reports only whether a token is configured). Leave the field blank to keep the existing token, type a new value to update it, or hit ❌ to fall back to the build-time CONFIG_DEFAULT_WEBSOCKET_TOKEN. On stock builds where no Kconfig default is set this disables auth; on builds that ship a default token, ❌ reverts to that default rather than truly clearing authentication. To switch the device to an unauthenticated gateway on a build that ships a default token, rebuild the firmware with the Kconfig default empty (or set a non-empty token on the gateway side that matches the build default).

   Submit to persist the values to the websocket NVS namespace (websocket.url / websocket.fallback_url / websocket.token); they are read on the next boot. This is the intended path for end users running a pre-built firmware. Clearing a URL field with the ❌ button and submitting again falls back to the matching CONFIG_DEFAULT_WEBSOCKET_* Kconfig value (or to "no fallback" when no Kconfig default is set).

3. Use the runtime MCP tools over an established connection: gateway_config_get reports websocket.url, websocket.fallback_url, whether a token is set, force_mode, discovery_enabled, and the connected candidate URL without exposing the token value. gateway_config_set accepts optional url, fallback_url, and token strings. Passing an empty string clears that NVS key; for example url="" restores the mDNS discovery path on the next reconnect. Changes are persisted immediately but do not trigger a disconnect, reconnect, or reboot.

4. Write websocket.url / websocket.fallback_url / websocket.token directly to NVS (advanced): for example with a custom NVS-write tool over serial. Same persistence semantics as the WiFi config UI and runtime MCP tools; primarily useful for batch provisioning.

5. Temporary source hardcode (not recommended): editing websocket_protocol.cc can unblock local experiments, but keep it out of commits.
Common gateway URL setups:
| Mode | Primary URL | Fallback URL |
|---|---|---|
| LAN auto-discovery with relay fallback | empty (mDNS) | wss://<relay-host>/ |
| Fixed local only | ws://<gateway-host>:8765/ | empty |
| Relay only | wss://<relay-host>/ | empty |
| Fixed local with relay fallback | ws://<gateway-host>:8765/ | wss://<relay-host>/ |
If you are flashing onto a device that previously ran upstream xiaozhi-esp32
firmware, NVS will already contain websocket.url=wss://api.tenclass.net/...
written by the upstream OTA-config path. In this case the empty-NVS fallback
in option 1 above will not trigger, and the device will keep trying to
talk to tenclass instead of your local gateway.
If the device can still establish a stackchan-mcp connection through any
candidate, call gateway_config_set with url="" to clear the stale primary
NVS URL and restore the mDNS discovery path on the next reconnect. If the stale
URL prevents any stackchan-mcp connection from being established, use the
force-override switch without erasing all of NVS (which would also drop WiFi
credentials):
- Force CONFIG_DEFAULT_WEBSOCKET_URL/TOKEN to override NVS → CONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y
When set, non-empty Kconfig URL/token values override whatever NVS holds.
Empty Kconfig values still fall through to the NVS-based behaviour, so leaving
the token Kconfig empty keeps any NVS-stored token in use. The boot log will
show FORCE: overriding NVS websocket.url with Kconfig: NVS=... -> ... so you
can verify the override fired. This switch is the recommended way to bring
ex-xiaozhi hardware onto a local stackchan-mcp gateway, and to lock CI/dev
images to a known gateway URL.
The switch is opt-in so end-user devices configured at runtime keep their NVS-priority semantics.
For local hardware testing, do not put personal gateway URLs or tokens in the
tracked firmware/sdkconfig.defaults. Instead, create a gitignored local file:
cd firmware
cat > sdkconfig.defaults.local <<'EOF'
CONFIG_DEFAULT_WEBSOCKET_URL="ws://<your-lan-ip>:8765/"
CONFIG_DEFAULT_WEBSOCKET_FALLBACK_URL="wss://<relay-host>/"
CONFIG_DEFAULT_WEBSOCKET_TOKEN="<your-dev-token>"
CONFIG_FORCE_DEFAULT_WEBSOCKET_URL=y
EOF

Both python ./scripts/release.py <board> and plain idf.py build will read
this file when it exists. The file is ignored by git, so personal settings
cannot be added accidentally with git add -A.
The gateway can either be installed as the published PyPI package
(recommended for end users) or run from this repository as a checkout
(recommended for contributors who want to follow main).
For an isolated install that does not collide with your system Python or other Python projects, use one of:
uv tool install stackchan-mcp
# or
pipx install stackchan-mcp

Then run the gateway:
stackchan-mcp

If you prefer a project-managed virtualenv, pip install stackchan-mcp
inside an active venv works as well, and python -m stackchan_mcp
inside that venv is equivalent to stackchan-mcp. Just avoid
pip install against the system Python (PEP 668).
The STACKCHAN_TOKEN, VISION_HOST, and other settings documented in
gateway/README.md can be supplied via environment
variables, the active shell, or a .env file in the working directory.
cd gateway
cp .env.example .env # set STACKCHAN_TOKEN / VISION_HOST
uv sync
uv run python -m stackchan_mcp

If the gateway is restarted while the ESP32 is already connected, the firmware
automatically retries the WebSocket connection while idle. The retry delay starts
at 5 seconds and backs off up to 60 seconds; use get_status to confirm that
the device has reappeared. The same retry path also fires for any
post-handshake server-initiated close — gateway crashes, TLS-layer resets, and
gateway configurations that tear the WebSocket session down after the handshake
— so the device recovers automatically once the gateway accepts the next
connection attempt.
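As a rough illustration of the retry schedule described above, the following Python sketch generates delays that start at 5 seconds and are capped at 60 seconds. The doubling factor is an assumption made for this example; the text above states only the starting delay and the cap, and the real logic lives in the firmware.

```python
from itertools import islice
from typing import Iterator


def retry_delays(
    initial_s: float = 5.0,
    max_s: float = 60.0,
    factor: float = 2.0,  # assumption: the growth factor is not documented
) -> Iterator[float]:
    """Yield reconnect delays in seconds, growing until they hit the cap."""
    delay = initial_s
    while True:
        yield min(delay, max_s)
        delay = min(delay * factor, max_s)


if __name__ == "__main__":
    # First six delays under the assumed doubling factor.
    print(list(islice(retry_delays(), 6)))
```

Under that assumption the first six delays are 5, 10, 20, 40, 60, and 60 seconds, so a gateway that comes back within a minute or two is typically picked up without a device reboot.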
For non-LAN setups, see docs/remote-access.md for the
Tailscale Funnel flow and the VISION_URL capture callback setting.
Add to ~/.claude.json.
If you installed via pip install stackchan-mcp:
{
"mcpServers": {
"stackchan-mcp": {
"type": "stdio",
"command": "stackchan-mcp",
"env": {
"STACKCHAN_TOKEN": "your-secret-token-here",
"VISION_HOST": "your.host.lan.ip"
}
}
}
}

If you installed from source via uv:
{
"mcpServers": {
"stackchan-mcp": {
"type": "stdio",
"command": "uv",
"args": [
"run", "--directory", "/path/to/stackchan-mcp/gateway",
"python", "-m", "stackchan_mcp"
]
}
}
}

See gateway/README.md for details.
The Python gateway can optionally read user-local default argument values from a TOML file under the OS-standard user config directory:
- Linux / XDG: ~/.config/stackchan-mcp/user-defaults.toml
- macOS: ~/Library/Application Support/stackchan-mcp/user-defaults.toml
- Windows: %APPDATA%\stackchan-mcp\user-defaults.toml
The exact path is resolved with platformdirs, and the gateway logs the
resolved path at startup. The tracked user-defaults.toml.example file at the
repository root is the template.
The file is an overlay on top of the MCP schema defaults. If the file is absent or empty, schema defaults are used. If a key is omitted from the file, that key keeps its schema default. Explicit MCP tool-call arguments always take precedence over the file, so the order is: explicit argument > user-defaults file > schema default.
This is a gateway-side Python setting. It is separate from the firmware-side
NVS connection settings exposed by gateway_config_get / gateway_config_set
(added in PR #293). Those tools read or update the device's saved WebSocket
connection settings; this TOML file only changes default MCP argument values
used by the Python gateway.
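The precedence rule (explicit argument > user-defaults file > schema default) amounts to a three-layer dictionary overlay. The sketch below shows the idea with plain dictionaries; the function name is hypothetical, the schema default values are invented for illustration, and the gateway's real implementation may differ.

```python
from typing import Any, Dict, Mapping


def effective_arguments(
    schema_defaults: Mapping[str, Any],
    user_defaults: Mapping[str, Any],
    explicit_args: Mapping[str, Any],
) -> Dict[str, Any]:
    """Overlay the three layers; later layers win over earlier ones."""
    merged = dict(schema_defaults)   # lowest priority
    merged.update(user_defaults)     # keys present in user-defaults.toml
    merged.update(explicit_args)     # highest priority: the tool call itself
    return merged


# Hypothetical schema defaults (not taken from the real schema).
SCHEMA = {"smoothing_window": 3, "downsample_hz": 30, "max_step_deg": 10}
# Values a user-defaults file might supply for the same tool.
USER = {"downsample_hz": 20, "max_step_deg": 30}
# Argument passed explicitly in one tool call.
CALL = {"max_step_deg": 15}

RESULT = effective_arguments(SCHEMA, USER, CALL)
```

Here RESULT keeps smoothing_window from the schema (the file omits it), takes downsample_hz from the file, and takes max_step_deg from the explicit call.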
Minimal example:
[tool.stackchan_follow_pose_stream]
smoothing_window = 1
downsample_hz = 20
max_step_deg = 30

To make the device speak, install the [tts] extra and run a
VOICEVOX engine alongside the
gateway. VOICEVOX runs as a separate HTTP service, so its LGPL-3.0
license stays scoped to that process — the MIT-licensed gateway only
issues HTTP requests against it.
docker run --rm -p '127.0.0.1:50021:50021' \
voicevox/voicevox_engine:cpu-ubuntu20.04-latest

The engine listens on port 50021 by default. The CPU image is fine for short sentences; GPU images are also published upstream.
pip install 'stackchan-mcp[tts]'
# or, equivalently:
pip install 'stackchan-mcp[tts-voicevox]'

This pulls in httpx (HTTP client) and opuslib (Opus encoder
bindings). The encoder needs libopus on the system —
brew install opus on macOS, sudo apt-get install libopus0 on
Debian/Ubuntu.
| Environment variable | Default | Notes |
|---|---|---|
| STACKCHAN_VOICEVOX_URL | http://127.0.0.1:50021 | VOICEVOX engine URL. |
| STACKCHAN_VOICEVOX_DEFAULT_SPEAKER | 3 | Default speaker ID (Zundamon normal). See the VOICEVOX speaker list for other options. |
From an MCP client:
say(text="こんにちは、わたしはスタックチャンです")
The gateway POSTs to VOICEVOX, decodes the returned WAV, resamples to
16 kHz mono, encodes Opus frames (60 ms each), and pushes them as
WebSocket binary frames to the device — which decodes and plays them
through its speaker. No firmware changes are required: the
existing audio decoder pipeline already accepts these frames. The
TTS framework is engine-agnostic, so additional engines plug in behind
the same say API — see the Irodori engine below.
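To make the framing arithmetic concrete: at 16 kHz mono, a 60 ms frame holds 960 samples. The sketch below splits a PCM buffer into frame-sized chunks ready for an Opus encoder. It assumes 16-bit signed samples and zero-pads the last chunk; both are assumptions for this example, and the actual encoding call (via opuslib in the gateway) is deliberately left out.

```python
from typing import List

SAMPLE_RATE_HZ = 16_000
FRAME_MS = 60
BYTES_PER_SAMPLE = 2  # assumption: 16-bit signed PCM

# 16000 samples/s * 0.060 s = 960 samples per frame.
SAMPLES_PER_FRAME = SAMPLE_RATE_HZ * FRAME_MS // 1000
# 960 samples * 2 bytes = 1920 bytes of PCM per frame.
FRAME_BYTES = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE


def split_pcm_frames(pcm: bytes) -> List[bytes]:
    """Split mono PCM into 60 ms chunks, zero-padding the final chunk."""
    frames = []
    for start in range(0, len(pcm), FRAME_BYTES):
        chunk = pcm[start:start + FRAME_BYTES]
        # Assumption: pad the tail with silence so every chunk is full-size.
        frames.append(chunk.ljust(FRAME_BYTES, b"\x00"))
    return frames
```

One second of audio (32,000 bytes) yields 17 chunks: 16 full frames plus one padded tail. Each chunk would then be encoded to one Opus frame and sent as one WebSocket binary frame.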
say also accepts supported expression emoji in the text: happy
(😊 😄 😀 😁 🙂 😆 🥰 😍 😋 🤗), sad (😢 😭 😞 😔
Irodori is a second TTS engine that calls an external synthesis service
returning MP3. It plugs into the same say pipeline as VOICEVOX (MP3 is
decoded to 16 kHz mono PCM, then encoded to Opus), and it receives
emoji embedded in the input text verbatim so they can act as voice-style
cues.
There is no default endpoint — the synthesis backend is a hosted service, and shipping a hard-coded URL would point every install at someone else's deployment. You must self-host a compatible synthesis API (for example, duplicate the reference Hugging Face Space, or run your own service that honours the same request/response contract) and point the gateway at it.
Install the extra:
pip install 'stackchan-mcp[tts-irodori]'

This pulls in httpx, opuslib, and miniaudio (a small self-contained
MP3/audio decoder with prebuilt wheels, so no extra system library is
needed beyond the libopus that opuslib already requires).
Configure via environment variables (URL and key are read from the environment only — never commit them):
| Environment variable | Default | Notes |
|---|---|---|
| STACKCHAN_IRODORI_URL | (required) | Synthesis endpoint URL of your self-hosted service. Unset → the engine still lists but say(voice="irodori") returns a clear error. |
| STACKCHAN_IRODORI_KEY | (none) | Optional API key, forwarded as a query parameter. |
| STACKCHAN_IRODORI_SPEAKER | 3 | Default speaker identifier. |
| STACKCHAN_IRODORI_STEPS | 24 | Default diffusion step count (higher = slower, higher quality). |
Select Irodori per call:
say(text="やったね😊", voice="irodori")
Or make it the default engine for every say call that omits voice:
export STACKCHAN_TTS_ENGINE=irodori

VOICEVOX remains the default when STACKCHAN_TTS_ENGINE is unset, and an
explicit voice argument always overrides the default.
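The engine selection rule can be summarised as a small lookup. This is a sketch of the documented precedence only; the function name is hypothetical and the gateway's own code may be structured differently.

```python
import os
from typing import Mapping, Optional

DEFAULT_ENGINE = "voicevox"


def select_tts_engine(
    voice: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> str:
    """Pick the TTS engine for one say call."""
    # An explicit voice argument always wins.
    if voice:
        return voice
    # Otherwise use STACKCHAN_TTS_ENGINE, falling back to VOICEVOX
    # when the variable is unset or empty.
    return env.get("STACKCHAN_TTS_ENGINE") or DEFAULT_ENGINE
```

Passing env explicitly, as in select_tts_engine(env={"STACKCHAN_TTS_ENGINE": "irodori"}), makes the rule easy to check without touching the real process environment.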
To let the device hear, install one of the [stt-*] extras and pair
it with a firmware that supports the inbound listen wire type
(present from this release onward). The gateway sends a listen.start
notification, the firmware opens the microphone, and the inbound
Opus frames are decoded and transcribed locally — no audio leaves
your machine when you use the default faster-whisper engine.
For local transcription with faster-whisper (MIT-licensed, CTranslate2-based, runs on CPU):
pip install 'stackchan-mcp[stt-faster-whisper]'

For the OpenAI Whisper API (cloud, useful when local compute is constrained):
pip install 'stackchan-mcp[stt-openai]'
export OPENAI_API_KEY=sk-...

Both extras include the base [stt] extra which pulls in opuslib
for decoding the inbound frames. The decoder needs the system
libopus library — brew install opus on macOS,
sudo apt-get install libopus0 on Debian/Ubuntu (this is the same
prerequisite as the [tts] extra).
| Environment variable | Default | Notes |
|---|---|---|
| STACKCHAN_FASTER_WHISPER_MODEL | base | Model identifier: tiny / base / small / medium / large-v3. Larger models are more accurate but slower and use more memory. |
| STACKCHAN_FASTER_WHISPER_DEVICE | cpu | cpu / cuda / auto. |
| STACKCHAN_FASTER_WHISPER_COMPUTE_TYPE | int8 | int8 / float16 / float32. |
| STACKCHAN_OPENAI_WHISPER_MODEL | whisper-1 | OpenAI Whisper model identifier (only whisper-1 is currently exposed by the API). |
From an MCP client:
listen(duration_ms=5000, language="ja")
The gateway sends {"type":"listen","state":"start","mode":"manual"}
to the device, buffers the Opus frames the device streams up for the
capture window, then sends {"type":"listen","state":"stop"} and
hands the buffered audio to the registered STT engine. The first call
to the faster-whisper engine downloads the chosen model (~140 MB
for base) into the Hugging Face cache; subsequent calls reuse it.
For visible capture feedback, pass motion="face-only" to show the
thinking avatar during capture and restore idle at the end, or
motion="look-up" to preserve yaw, tilt pitch to look_up_pitch
(default 50°, valid 5..85°), show thinking, and hold that pose on
success. The STT framework is engine-agnostic — additional engines
(Vosk, whisper.cpp, cloud providers) can be added without changing
the listen API.
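The start / buffer / stop sequence described above can be sketched with asyncio. The two control messages are the ones quoted in this section; everything else (the function name, the send_text and recv_frame callables standing in for the WebSocket) is hypothetical and only meant to show the shape of the capture window.

```python
import asyncio
import json
from typing import Awaitable, Callable, List

# Control messages as documented; compact separators match the wire form.
START = json.dumps({"type": "listen", "state": "start", "mode": "manual"},
                   separators=(",", ":"))
STOP = json.dumps({"type": "listen", "state": "stop"}, separators=(",", ":"))


async def capture_utterance(
    send_text: Callable[[str], Awaitable[None]],
    recv_frame: Callable[[], Awaitable[bytes]],
    duration_ms: int,
) -> List[bytes]:
    """Open the microphone, buffer Opus frames for the window, then stop."""
    frames: List[bytes] = []
    await send_text(START)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + duration_ms / 1000
    try:
        while (remaining := deadline - loop.time()) > 0:
            try:
                frame = await asyncio.wait_for(recv_frame(), timeout=remaining)
            except asyncio.TimeoutError:
                break  # capture window elapsed while waiting
            frames.append(frame)
    finally:
        # Always close the capture window, even if receiving failed.
        await send_text(STOP)
    return frames
```

The returned list of Opus frames is what would be decoded and handed to the STT engine. Sending the stop message in a finally block is a design choice of this sketch, so that the device microphone is not left open after an error.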
Stack-chan physical events (today: touch tap / stroke; the structure
allows additional subtypes to be appended later) can be delivered
through three notification channels. All channels are disabled by
default; opt in via ~/.config/stackchan-mcp/notify.yml. More than
one channel can be enabled at the same time; each event is then
delivered over every enabled channel. Pick whichever match the host
you are integrating with:
- channels: Claude Code plugin path. Notifications are injected into the running session as <channel ...> blocks via Claude Code's experimental Channels capability. Use this when Stack-chan is wired into Claude Code as an installed plugin and you want events to reach the conversation in-band.
- jsonl: Out-of-process file integration. Each event is appended as a single JSON line to a file you configure. Use this when an external host (anything other than Claude Code, or your own pipeline) needs to ingest events asynchronously by tailing a file.
- legacy_event: Pre-plugin MCP notification. The gateway emits the self-defined stackchan/event MCP notification method, which existed before the Channels capability shipped. Use this for backward compatibility when the host wires the gateway via ~/.claude.json mcpServers rather than the Claude Code plugin path.
See notify.example.yml for the full annotated configuration
reference.
Use when Stack-chan is loaded as a Claude Code plugin and you want events delivered as in-session channel blocks. This channel uses an experimental MCP capability that may evolve.
Enable in notify.yml:
channels:
  enabled: true

Delivered as a <channel ...> block injected into the Claude Code
session, for example:
<channel source="plugin:stackchanmcp:stackchanmcp" ...>head was tapped</channel>
Setup:
1. Plugin installation: install this repository as a Claude Code plugin from the kisaragi-mochi-channels marketplace:

   claude plugin install stackchanmcp@kisaragi-mochi-channels

   For local development against your working copy, pass --plugin-dir /path/to/stackchan-mcp instead of installing from the marketplace; Claude Code starts the gateway under ${CLAUDE_PLUGIN_ROOT}/gateway via the bundled .mcp.json.

2. Host environment setup: the Channels delivery path requires three host-side names to be aligned with the gateway's MCP server name (stackchanmcp, no hyphen):
   - The plugin's .mcp.json mcpServers key for this gateway must be stackchanmcp. If you previously wired the gateway under a different key, rename it.
   - Claude Code's settings.local.json enabledMcpjsonServers whitelist must include stackchanmcp.
   - The Channels allowlist requires a system-wide approval: Claude Code does not honor user-level (e.g. ~/.claude/settings.json) settings for the Channels allowlist. On macOS, create or edit /Library/Application Support/ClaudeCode/managed-settings.json (requires sudo):

     {
       "channelsEnabled": true,
       "allowedChannelPlugins": ["stackchanmcp@kisaragi-mochi-channels"]
     }

3. Receiver side: launch Claude Code with the Channels flags:

   claude --channels plugin:stackchanmcp@kisaragi-mochi-channels \
     --dangerously-load-development-channels plugin:stackchanmcp@kisaragi-mochi-channels

   The --channels flag attaches the channel source to the gateway and injects <channel source="plugin:stackchanmcp:stackchanmcp" ...> blocks into the session. The --dangerously-load-development-channels flag is currently required alongside --channels because the plugin's Channels capability is experimental; the approved-allowlist-only path without this flag has been verified not to deliver notifications in current Claude Code versions. The flag is expected to become optional once the plugin's Channels capability stabilizes.
Important — pre-plugin wiring does not receive Channels: if you
previously wired this gateway via ~/.claude.json mcpServers (the
pre-plugin path), that wiring does not receive <channel ...>
injections. Claude Code only attaches a channel source to
plugin-loaded MCP servers. Before switching to the plugin path, stop
any existing gateway process to release the ESP32 ownership lock; the
plugin-loaded gateway will otherwise fail to acquire it. If you prefer
to keep the ~/.claude.json wiring, use the legacy_event or jsonl
channel below — both work without plugin loading.
Other hosts:
- Hosts with a claude/channel-compatible receiver: open that receiver per the host's documentation. Compatibility with hosts other than Claude Code has not been verified in this repository.
- Hosts without a Channels receiver: use the jsonl channel below.
If you enabled Channels using the older stackchan-mcp server-name
form, rename to the current stackchanmcp form (no hyphen) in all four
places to keep the host MCP client and the gateway aligned:
- Plugin / server name: change stackchan-mcp to stackchanmcp in your host's .mcp.json mcpServers key, in settings.local.json enabledMcpjsonServers whitelist, and in the --channels / --dangerously-load-development-channels flag arguments.
- Channels flag form: change from --channels server:stackchan-mcp to --channels plugin:stackchanmcp@kisaragi-mochi-channels. The plugin form is the supported form now that the marketplace manifest is published.
- System-wide allowlist: ensure /Library/Application Support/ClaudeCode/managed-settings.json has allowedChannelPlugins listing stackchanmcp@kisaragi-mochi-channels (not the old stackchan-mcp form).
Without all four renames the host MCP client logs
Channel notifications skipped: server <name> not in --channels list for this session and the notifications never reach the session.
Use when an external host or your own pipeline needs to ingest events asynchronously by tailing a file. This is the simplest channel to wire into anything that is not Claude Code.
Enable in notify.yml:
jsonl:
  enabled: true
  path: ~/.claude/stackchan-events.jsonl

Each event is appended as one JSON line to the configured path. The file is created if it does not exist, and existing entries are preserved. Each line carries:
- event_type: top-level event type (currently "touch").
- subtype: subtype within the event type (currently "tap" or "stroke").
- duration_ms: firmware-reported duration of the event in milliseconds.
- ts: firmware uptime in milliseconds (monotonic).
- ts_unix: wall-clock timestamp when the gateway recorded the event.
- session_id: gateway session identifier.
- action: avatar action keyword for the event subtype (e.g. head_pat, head_stroke). The built-in default templates always populate this, and messages: overrides are also required to specify action, so this field is present for every supported subtype.
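A consumer on the receiving side can read the file line by line and check for the fields listed above. The following is a minimal sketch rather than a reference client: the function names are hypothetical, the sample values are illustrative, and it reads existing lines only (it does not follow the file as it grows).

```python
import json
from typing import Iterable, Iterator

REQUIRED_FIELDS = {
    "event_type", "subtype", "duration_ms", "ts",
    "ts_unix", "session_id", "action",
}


def parse_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty JSONL line, checking the documented fields."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(f"event record is missing fields: {sorted(missing)}")
        yield record


def describe(record: dict) -> str:
    """Short label for an event; strokes include their duration."""
    if record["subtype"] == "stroke":
        return f'{record["action"]} ({record["duration_ms"]} ms)'
    return record["action"]


# Illustrative records in the documented shape.
SAMPLE = """\
{"event_type": "touch", "subtype": "tap", "duration_ms": 0, "ts": 123456, "ts_unix": 1717862400.0, "session_id": "abc-123", "action": "head_pat"}
{"event_type": "touch", "subtype": "stroke", "duration_ms": 720, "ts": 124000, "ts_unix": 1717862400.7, "session_id": "abc-123", "action": "head_stroke"}
"""

LABELS = [describe(event) for event in parse_events(SAMPLE.splitlines())]
```

For a real file, pass an open file object instead of SAMPLE.splitlines(), for example `with open(path) as f: for event in parse_events(f): ...`.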
The rendered wording (e.g. `head was tapped`) is what the `channels` channel delivers as the human-readable message; it is not stored in the JSONL record itself. A delivered tap-and-stroke pair looks like:

```json
{"event_type": "touch", "subtype": "tap", "duration_ms": 0, "ts": 123456, "ts_unix": 1717862400.0, "session_id": "abc-123", "action": "head_pat"}
{"event_type": "touch", "subtype": "stroke", "duration_ms": 720, "ts": 124000, "ts_unix": 1717862400.7, "session_id": "abc-123", "action": "head_stroke"}
```

The top-level `event_type` and `subtype` correspond to the rows in "Supported event subtypes" below.
Use when the gateway is wired into the host via ~/.claude.json
mcpServers (the pre-plugin path) and you want events without
switching the host over to the plugin form.
Enable in notify.yml:
```yaml
legacy_event:
  enabled: true
```

The gateway emits the self-defined `stackchan/event` MCP notification
method. The notification params carry the same fields as the JSONL
record above, except that ts_unix is not included (it is added only
by the JSONL writer). The host is responsible for surfacing the
notification — Claude Code's pre-plugin path historically displayed
the rendered template inline; other hosts may handle the notification
differently.
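The relationship between the two payloads can be sketched in Python as follows. `to_legacy_params` and `build_notification` are hypothetical helper names, and the envelope is the generic JSON-RPC 2.0 notification shape that MCP uses, not code taken from the gateway.

```python
import json


def to_legacy_params(jsonl_record: dict) -> dict:
    """Return the notification params: the JSONL fields minus ts_unix.

    Hypothetical helper, not part of the gateway.
    """
    return {k: v for k, v in jsonl_record.items() if k != "ts_unix"}


def build_notification(jsonl_record: dict) -> str:
    """Wrap the params in a JSON-RPC 2.0 notification (no "id" field)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "stackchan/event",
        "params": to_legacy_params(jsonl_record),
    })
```

A host that does not recognise the `stackchan/event` method will typically ignore the notification, which is why surfacing it is left to the host.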
The currently supported physical events are listed below. The structure
is intentionally extensible: additional touch subtypes or new
top-level types (e.g. motion, voice) can be appended in future
releases without rewriting this section.
| Type | Subtype | Default action | Default template |
|---|---|---|---|
| `touch` | `tap` | `head_pat` | `head was tapped` |
| `touch` | `stroke` | `head_stroke` | `head was stroked for {duration_ms}ms` |
The built-in defaults are phrased experientially — describing what the
device felt rather than naming a mechanical event — so the consuming
agent reads them as first-person narration. The {duration_ms}
placeholder is substituted from the event payload; unknown
placeholders are preserved verbatim.
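The substitution rule described above can be illustrated with a short Python sketch. `render_template` is a hypothetical name, and the regular expression is one possible way to get the "unknown placeholders are preserved" behavior; the gateway's actual renderer may be implemented differently.

```python
import re

# Matches simple {name} placeholders such as {duration_ms}.
_PLACEHOLDER = re.compile(r"\{(\w+)\}")


def render_template(template: str, payload: dict) -> str:
    """Substitute {name} from payload; leave unknown placeholders as-is.

    Hypothetical helper, not the gateway's renderer.
    """
    def _sub(match):
        key = match.group(1)
        return str(payload[key]) if key in payload else match.group(0)

    return _PLACEHOLDER.sub(_sub, template)


print(render_template("head was stroked for {duration_ms}ms", {"duration_ms": 720}))
print(render_template("head was stroked by {who}", {"duration_ms": 720}))
```

The first call prints `head was stroked for 720ms`; the second leaves `{who}` in place because the payload has no such key.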
Add a messages: block to ~/.config/stackchan-mcp/notify.yml to
override the per-subtype action and template. Only the subtypes
you list are overridden; everything else keeps the defaults above.
```yaml
# ~/.config/stackchan-mcp/notify.yml
messages:
  touch:
    tap:
      action: head_pat
      template: "got a head pat"
    stroke:
      action: head_stroke
      template: "head being stroked for {duration_ms}ms"
```

Both `action` and `template` are required for each overridden subtype.
The action value is forwarded in the event metadata, so keep it
stable if a downstream consumer keys off it. See notify.example.yml
for the full annotated reference.
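The override semantics (only listed subtypes change, and each override needs both keys) can be sketched in Python over already-parsed dictionaries. `merge_messages` is a hypothetical name, and raising `ValueError` on an incomplete entry is an assumption; the gateway may report the problem differently.

```python
import copy

# Built-in defaults, as listed in the table of supported event subtypes.
DEFAULT_MESSAGES = {
    "touch": {
        "tap": {"action": "head_pat", "template": "head was tapped"},
        "stroke": {
            "action": "head_stroke",
            "template": "head was stroked for {duration_ms}ms",
        },
    },
}


def merge_messages(overrides: dict) -> dict:
    """Overlay a parsed `messages:` block on the defaults, per subtype.

    Hypothetical helper; the error handling is an assumption.
    """
    merged = copy.deepcopy(DEFAULT_MESSAGES)
    for event_type, subtypes in (overrides or {}).items():
        for subtype, entry in subtypes.items():
            missing = {"action", "template"} - set(entry)
            if missing:
                raise ValueError(
                    f"{event_type}.{subtype} is missing {sorted(missing)}"
                )
            merged.setdefault(event_type, {})[subtype] = dict(entry)
    return merged
```

Overriding only `touch.tap` leaves `touch.stroke` at its default template, which matches the "everything else keeps the defaults" rule.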
firmware/main/boards/stackchan/avatar_images.cc is a pure black RGB565 placeholder. The firmware builds and runs, but the screen will display nothing.
For a personal avatar, keep PNG sources outside git and generate ignored local override files:
```shell
cd firmware
python scripts/avatar_convert/convert_avatars.py
```

By default, the converter reads PNGs from `~/.stackchan/avatar/` and writes:

- `firmware/main/boards/stackchan/avatar_images.local.cc`
- `firmware/main/boards/stackchan/avatar_images.local.h`
These local files are ignored by git. When avatar_images.local.cc exists, the StackChan firmware build uses it instead of the tracked black placeholder, so git pull will not overwrite your personal avatar.
The tracked avatar_images.cc / avatar_images.h files are public placeholder files. Maintainers who intentionally need to refresh those tracked files can pass --tracked, but personal avatars should use the default local output path.
If you add a local avatar after you have already built the firmware once, remove firmware/build/ and rebuild so CMake can pick up the new local override.
Symbol list (see `avatar_images.h`):

- Expressions (6): `avatar_idle`, `avatar_happy`, `avatar_thinking`, `avatar_sad`, `avatar_surprised`, `avatar_embarrassed`
- Eyes (3): `avatar_eyes_open`, `avatar_eyes_half`, `avatar_eyes_closed`
- Mouth (5): `avatar_mouth_closed`, `avatar_mouth_half`, `avatar_mouth_open`, `avatar_mouth_e`, `avatar_mouth_u`
Expected PNG filenames under `~/.stackchan/avatar/`:

- Expressions: `idle.png`, `happy.png`, `thinking.png`, `sad.png`, `surprised.png`, `embarrassed.png`
- Eyes: `eyes_open.png`, `eyes_half.png`, `eyes_closed.png`
- Mouth: `mouth_closed.png`, `mouth_half.png`, `mouth_open.png`, `mouth_e.png`, `mouth_u.png`
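Before running the converter, it can help to check that all fourteen PNGs are in place. The Python sketch below does that and shows the filename-to-symbol naming implied by the two lists (`idle.png` corresponds to `avatar_idle`). `symbol_for` and `missing_pngs` are hypothetical helpers, and the naming rule is inferred from the lists rather than taken from `convert_avatars.py`.

```python
from pathlib import Path

# File stems from the expected-filename list: expressions, eyes, mouth.
EXPECTED_STEMS = [
    "idle", "happy", "thinking", "sad", "surprised", "embarrassed",
    "eyes_open", "eyes_half", "eyes_closed",
    "mouth_closed", "mouth_half", "mouth_open", "mouth_e", "mouth_u",
]


def symbol_for(stem: str) -> str:
    """Map a PNG stem to its symbol name (inferred naming rule)."""
    return f"avatar_{stem}"


def missing_pngs(avatar_dir: Path) -> list:
    """Return the expected PNG filenames that are absent from avatar_dir."""
    return [
        f"{stem}.png"
        for stem in EXPECTED_STEMS
        if not (avatar_dir / f"{stem}.png").is_file()
    ]
```

For example, `missing_pngs(Path("~/.stackchan/avatar").expanduser())` returns an empty list once every expected file is present.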
Do not commit personal PNGs, generated local avatar files, photos, or other user-specific assets.
⚠️ Y-axis (pitch) safe range — two-tier guard
The pitch axis is guarded by two complementary tiers, both encoded in firmware and surfaced through the set_head_angles MCP tool description:
| Tier | Range | Enforcement | Rationale |
|---|---|---|---|
| Tier 1: Hard clamp | 0..+88° | Silent clamp + `ESP_LOGW` | Prevents mechanical damage. Lower bound 0° leaves ~1° margin above the validated mechanical end-stop on M5Stack CoreS3 + SCS0009 hardware (PR #81). Upper bound 88° sits ~1° inside the audible sub-stall boundary observed at pitch=89° during the Issue #98 on-device sweep ("ji-ji-" gear strain sound). |
| Tier 2: Recommended operating range | 5..+85° | Accept + `ESP_LOGI` (soft signal) | The M5Stack-documented sweet spot for long-term servo reliability. Single requests outside this range are not hardware-damaging, but sustained operation outside 5..85° may stress the servo over time. |
M5Stack's official documentation states:
The movement angle of the StackChan Y-axis servo (vertical direction) is recommended to be controlled within 5 ~ 85°. Operating at extreme angles may cause servo stall and permanent damage. — https://docs.m5stack.com/en/StackChan ("Motion Angle Notice")
The set_head_angles MCP tool declares pitch with a fully permissive schema range — the entire int value range, std::numeric_limits<int>::min() to std::numeric_limits<int>::max(), corner values included; the firmware-side handler is the authoritative Tier 1 enforcement layer. Any narrower schema range would cause McpServer::Property to reject sufficiently-extreme out-of-range requests (e.g. pitch=200 or pitch=INT_MIN) before the handler could clamp / log them, leaving the documented Tier 1 behavior unreachable for those callers — see #98. Requests below 0° are silently raised to 0° (with ESP_LOGW), requests above 88° are silently lowered to 88° (with ESP_LOGW), and requests inside [0, 88] but outside [5, 85] are accepted with an ESP_LOGI soft signal. Older callers that targeted -30..+30° continue to work without modification (the negative half clamps to 0°).
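The two-tier behavior described above can be summarized in a few lines. The sketch below is a Python rendering of the rules for illustration only; the real enforcement lives in the C++ firmware handler, and `guard_pitch` and its return labels are hypothetical.

```python
from typing import Tuple

HARD_MIN, HARD_MAX = 0, 88   # Tier 1: hard clamp
SOFT_MIN, SOFT_MAX = 5, 85   # Tier 2: recommended operating range


def guard_pitch(requested: int) -> Tuple[int, str]:
    """Return (applied_angle, log_level) for a requested pitch in degrees.

    Hypothetical helper mirroring the documented rules, not firmware code.
    """
    if requested < HARD_MIN:
        return HARD_MIN, "warn"    # raised to 0°, logged with ESP_LOGW
    if requested > HARD_MAX:
        return HARD_MAX, "warn"    # lowered to 88°, logged with ESP_LOGW
    if requested < SOFT_MIN or requested > SOFT_MAX:
        return requested, "info"   # accepted, logged with ESP_LOGI
    return requested, "none"       # inside the recommended range
```

With these rules an older caller sending `-30` ends up at 0°, a request for `200` ends up at 88°, and `3` is applied unchanged with a soft signal.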
Conversely, the gateway-side move_head MCP tool — the one LLM-driven clients see in the tool list above — declares a restrictive pitch=5..85 / yaw=-90..90 schema and re-enforces the same bounds in the gateway call_tool handler as belt-and-suspenders. This rejects out-of-recommended requests at the MCP boundary so an agent cannot accidentally trigger the bus-hang risk tracked in #100 from a pose-reset call like move_head(yaw=0, pitch=0). Callers that need explicit access to the firmware Tier 1 hard clamp (for diagnostics, recovery sequences, or other expert use cases) should bypass move_head and call the firmware-side set_head_angles device tool directly — see #109.
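In contrast to the firmware clamp, the gateway rejects out-of-range requests outright. A minimal Python sketch of that check is shown below; `validate_move_head` is a hypothetical name, and signalling rejection with `ValueError` is an assumption about how the `call_tool` handler reports the error.

```python
PITCH_RANGE = (5, 85)    # recommended operating range, degrees
YAW_RANGE = (-90, 90)    # full declared yaw range, degrees


def validate_move_head(yaw: int, pitch: int) -> None:
    """Raise ValueError when a request falls outside the declared schema.

    Hypothetical helper, not the gateway's handler.
    """
    checks = (("yaw", yaw, YAW_RANGE), ("pitch", pitch, PITCH_RANGE))
    for name, value, (lo, hi) in checks:
        if not lo <= value <= hi:
            raise ValueError(f"{name}={value} is outside {lo}..{hi}")
```

Under this check a pose-reset call such as `move_head(yaw=0, pitch=0)` is refused at the MCP boundary instead of being forwarded to the device.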
The X-axis (yaw, -90..+90°) is not subject to a comparable hardware restriction — M5Stack's documentation explicitly notes "No angle restriction is required for the X-axis" — and remains usable across its full declared range.
See #80 for the lower-bound engineering background and #98 for the two-tier upper-bound widening (firmware hard clamp 30° → 88°, with the M5Stack-recommended 5..85° operating range surfaced as a soft-signal tier).
- The servo bus may hang on large-angle abrupt reversals (e.g. +60° → -60°). A fix is in progress via Motion::update_task interpolation.
- The touch sensor (Si12T) occasionally drops tap events. Sensitivity register tuning has room to improve here.
The canonical firmware build path (firmware/scripts/release.py stackchan) produces an MIT-licensed end-to-end binary. The GPL-3.0 SCServo_lib sources remain in the tree as an opt-in fallback during the migration tracked in #79.
| Scope | License |
|---|---|
| `gateway/`, top-level, all of `firmware/` in the canonical build (`release.py stackchan` appends `CONFIG_STACKCHAN_SERVO_FEETECH=y` and links the MIT `feetech_scs_esp_idf` driver vendored under `firmware/components/feetech_scs/`) | MIT License (see `LICENSE`) |
| SCServo_lib-derived files under `firmware/main/boards/stackchan/` (`SCS.{cc,h}`, `SCSCL.{cc,h}`, `SCSerial.{cc,h}`, `INST.h`, `SCServo.h`), only linked when `CONFIG_STACKCHAN_SERVO_SCSCL=y` is selected (e.g. via `sdkconfig.defaults.local`) | GNU GPL-3.0 (see `firmware/main/boards/stackchan/SCServo_lib_LICENSE.txt`) |
The gateway/ runs as an independent Python process and only talks to the ESP32 over the network (WebSocket), so it stays usable and derivable under the MIT License regardless of which servo driver the firmware-side build selects.
The gateway's win_amd64 PyPI wheel additionally bundles opus.dll built from upstream Opus source by the publish workflow. That native binary ships under the BSD 3-clause license + Xiph extension; the notice is shipped in every distribution form as gateway/LICENSE-THIRD-PARTY. Non-Windows wheels and the sdist do not contain the binary — they rely on the system libopus. See gateway/stackchan_mcp/_libs/SOURCES.md for the per-release SHA256 and build provenance.
Note for direct `idf.py` users with a pre-existing `firmware/sdkconfig`: ESP-IDF persists Kconfig choices into `firmware/sdkconfig`, and a change to the Kconfig `default` does not retroactively rewrite that file. The canonical build path enforces the MIT driver at the `release.py` layer (via `sdkconfig_append`), so rebuilding via `release.py stackchan` reliably produces the MIT default binary. If you bypass `release.py` and call `idf.py` directly on a workspace that previously selected SCSCL, you may need to run `idf.py menuconfig` (or delete `firmware/sdkconfig`) to pick up the new default.
GPL-3.0 fallback build (opt-in): The original SCServo_lib sources are still shipped in the repository as a safety net while the migration tracked in #79 is in its observation period. To build against them, add `CONFIG_STACKCHAN_SERVO_SCSCL=y` to `firmware/sdkconfig.defaults.local`; `release.py` merges this in after `sdkconfig_append`, so it overrides the FEETECH default. In that configuration the firmware binary statically links the GPL-3.0 sources and is therefore effectively distributed under GPL-3.0. Once the observation period closes without regressions the GPL files are scheduled for removal (Phase B of #79).
firmware/ is taken in via git subtree from 78/xiaozhi-esp32 (MIT) — specifically the kisaragi-mochi/xiaozhi-esp32 fork. See docs/firmware-sync.md for the upstream sync playbook. The SCServo_lib sources under firmware/main/boards/stackchan/ (SCS.{cc,h}, SCSCL.{cc,h}, SCSerial.{cc,h}, INST.h, SCServo.h) originate from Feetech's SCServo SDK and entered this repository through the same kisaragi-mochi/xiaozhi-esp32 fork's main/boards/stackchan/ directory at the firmware subtree merge. They remain GPL-3.0 (see firmware/main/boards/stackchan/SCServo_lib_LICENSE.txt).
- M5Stack official StackChan documentation — official documentation for the target hardware (factory firmware / wiring / API reference / etc.)
- xiaozhi-esp32 — the base ESP32 LLM client firmware
- stack-chan — the original StackChan project (Shinya Ishikawa)
- stackchan-arduino — Arduino-side servo control library (Takao Akaki / mongonta0716); this firmware references the SCS0009 positioning timing established there
- m5stack-avatar — Avatar rendering library widely used by StackChan firmware
- Model Context Protocol — the MCP protocol specification
Issues and PRs are welcome. We aim to provide something the StackChan community can use as-is.
See CONTRIBUTING.md for the development flow.
"StackChan" and "スタックチャン" are registered trademarks of Shinya Ishikawa, the originator of the stack-chan project. This repository uses these names in reference to the M5Stack official StackChan kit it targets.