A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Oct 2, 2025 - Python
ComfyUI custom node for VibeVoice TTS: expressive, long-form, multi-speaker conversational audio.
Beautiful voice app: record or upload to train a voice, generate speech from text or files, save & download voices.
Archive of the official Microsoft VibeVoice repository (7B & 1.5B). Backup of the deleted source code for the open-source TTS models, including the removed 7B version. Try the VibeVoice online service
A Gradio-based demo for end-to-end vision-to-speech inference: Extract text or descriptions from images using Qwen2.5-VL-7B-Instruct, then convert to natural speech audio via Microsoft VibeVoice-Realtime-0.5B.
Create multi-voice podcasts with AI text-to-speech
A FastRTC-compatible wrapper for Microsoft's [VibeVoice-Realtime-0.5B](https://huggingface.co/microsoft/VibeVoice-Realtime-0.5B) text-to-speech model, enabling real-time voice streaming in FastRTC.