rohanprichard/fastrtc-demo

FastRTC POC

A simple proof of concept (POC) for a fast, real-time voice chat application built with FastAPI and FastRTC, by rohanprichard. I wanted an example that uses a more production-ready stack rather than just Gradio.

Setup

  1. Set your OpenAI and ElevenLabs API keys in a .env file, using the .env.example file as a template

  2. Create a virtual environment and install the dependencies

    python3 -m venv env
    source env/bin/activate
    pip install -r requirements.txt

    On Windows:

    python -m venv env
    .\env\Scripts\activate
    pip install -r requirements.txt
  3. Run the server

    ./run.sh

    Windows:

    uvicorn backend.server:app --host 0.0.0.0 --port 8000
  4. Navigate into the frontend directory

    cd frontend/fastrtc-demo
  5. Run the frontend

    npm install
    npm run dev
  6. Click the microphone icon and start chatting!

  7. Reset chats by clicking the trash button on the bottom right
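If the server fails on startup, a missing key from step 1 is a likely cause. The following is a minimal preflight sketch, not part of this repository. The variable names `OPENAI_API_KEY` and `ELEVENLABS_API_KEY` are assumptions; the names in .env.example are authoritative. It also assumes the .env values have already been loaded into the process environment.

```python
import os

# Assumed variable names; check .env.example for the names the backend reads.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

Passing a plain dict as `env` makes the check easy to try without touching your real environment.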

Notes

  • STT (speech-to-text) currently uses the ElevenLabs API.
  • The LLM currently uses the OpenAI API.
  • TTS (text-to-speech) currently uses the ElevenLabs API.
  • VAD (voice activity detection) currently uses the Silero VAD model.
  • You may need to install ffmpeg if you get errors in STT.

The prompt is defined in the backend/server.py file; edit it as you like.

Audio Parameters

AlgoOptions

  • audio_chunk_duration: Length of audio chunks in seconds. Smaller values allow for faster processing but may be less accurate.
  • started_talking_threshold: If a chunk has more than this many seconds of speech, the system considers that the user has started talking.
  • speech_threshold: After the user has started speaking, if a chunk has less than this many seconds of speech, the system considers that the user has paused.
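To make the interplay of the three options above concrete, here is a simplified sketch of the start/pause decision as described in this list. It is not FastRTC's implementation: `AlgoSettings` and `detect_pause` are hypothetical names, the default values are illustrative, and the input is assumed to be the seconds of speech that the VAD reported for each chunk.

```python
from dataclasses import dataclass


@dataclass
class AlgoSettings:
    """Hypothetical stand-in for AlgoOptions; values are illustrative."""
    audio_chunk_duration: float = 0.6       # seconds of audio per chunk
    started_talking_threshold: float = 0.2  # seconds of speech to count as "started"
    speech_threshold: float = 0.1           # below this, after starting, counts as a pause


def detect_pause(speech_seconds_per_chunk, settings):
    """Return the index of the chunk where a pause is detected, or None."""
    started_talking = False
    for index, speech_seconds in enumerate(speech_seconds_per_chunk):
        if not started_talking:
            # Wait for a chunk that contains enough speech.
            if speech_seconds > settings.started_talking_threshold:
                started_talking = True
        elif speech_seconds < settings.speech_threshold:
            # The user was talking and this chunk is nearly silent.
            return index
    return None


# Two quiet chunks, three chunks of speech, then a near-silent chunk.
print(detect_pause([0.0, 0.05, 0.4, 0.5, 0.3, 0.02], AlgoSettings()))  # 5
```

The quiet chunks at the beginning are ignored because the user has not started talking yet; only the near-silent chunk after speech counts as a pause.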

SileroVadOptions

  • threshold: Speech probability threshold (0.0-1.0). Values above this are considered speech. Higher values are more strict.
  • min_speech_duration_ms: Speech segments shorter than this (in milliseconds) are filtered out.
  • min_silence_duration_ms: The system waits for this duration of silence (in milliseconds) before considering speech to be finished.
  • speech_pad_ms: Padding added to both ends of detected speech segments to prevent cutting off words.
  • max_speech_duration_s: Maximum allowed duration for a speech segment in seconds. Prevents indefinite listening.
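The sketch below illustrates how the first four options above could shape detected speech segments. It is a deliberately simplified model, not Silero VAD's actual algorithm: `VadSettings` and `speech_segments` are hypothetical names, the values are illustrative, the input is assumed to be one speech probability per fixed-length frame, and `max_speech_duration_s` is left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class VadSettings:
    """Hypothetical stand-in for SileroVadOptions; values are illustrative."""
    threshold: float = 0.5
    min_speech_duration_ms: int = 200
    min_silence_duration_ms: int = 300
    speech_pad_ms: int = 100


def speech_segments(probs, frame_ms, settings):
    """Turn per-frame speech probabilities into padded (start_ms, end_ms) segments."""
    total_ms = len(probs) * frame_ms
    raw = []
    start = None    # start of the segment being built, in ms
    silence_ms = 0  # silence accumulated since the last speech frame
    for i, prob in enumerate(probs):
        t = i * frame_ms
        if prob > settings.threshold:
            if start is None:
                start = t
            silence_ms = 0
        elif start is not None:
            silence_ms += frame_ms
            # Close the segment only after enough continuous silence.
            if silence_ms >= settings.min_silence_duration_ms:
                raw.append((start, t + frame_ms - silence_ms))
                start, silence_ms = None, 0
    if start is not None:
        raw.append((start, total_ms - silence_ms))

    pad = settings.speech_pad_ms
    kept = []
    for seg_start, seg_end in raw:
        # Drop segments that are too short, then pad both ends.
        if seg_end - seg_start >= settings.min_speech_duration_ms:
            kept.append((max(0, seg_start - pad), min(total_ms, seg_end + pad)))
    return kept


probs = [0.1, 0.9, 0.9, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1]
print(speech_segments(probs, frame_ms=100, settings=VadSettings()))  # [(0, 700)]
```

In this example the single silent frame in the middle of the first utterance is bridged because it is shorter than `min_silence_duration_ms`, and the isolated 100 ms blip later on is dropped because it is shorter than `min_speech_duration_ms`.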

Tuning Recommendations

  • If the AI interrupts you too early:

    • Increase min_silence_duration_ms
    • Decrease speech_threshold (a chunk then needs less speech to count as continued talking)
    • Increase speech_pad_ms
  • If the AI is slow to respond after you finish speaking:

    • Decrease min_silence_duration_ms
    • Increase speech_threshold (a pause is then detected sooner)
  • If the system fails to detect some speech:

    • Lower the threshold value
    • Decrease started_talking_threshold

Credits

Credit for the UI components goes to Shadcn, Aceternity UI and Kokonut UI.
