A simple POC of a fast real-time voice chat application using FastAPI and FastRTC, by rohanprichard. I wanted to make one as an example built with a more production-ready stack, rather than just Gradio.
- Set your OpenAI and ElevenLabs API keys in a `.env` file based on the `.env.example` file (a sample sketch follows this list).
- Create a virtual environment and install the dependencies:

  ```bash
  python3 -m venv env
  source env/bin/activate
  pip install -r requirements.txt
  ```

  For Windows:

  ```bash
  python -m venv env
  .\env\Scripts\activate
  pip install -r requirements.txt
  ```
- Run the server:

  ```bash
  ./run.sh
  ```

  Windows:

  ```bash
  uvicorn backend.server:app --host 0.0.0.0 --port 8000
  ```
- Navigate into the frontend directory:

  ```bash
  cd frontend/fastrtc-demo
  ```
- Run the frontend:

  ```bash
  npm install
  npm run dev
  ```
- Click the microphone icon and start chatting!
- Reset chats by clicking the trash button on the bottom right.
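As mentioned in the first step, the backend reads its provider credentials from `.env`. A minimal sketch of that file, assuming standard variable names (check `.env.example` for the exact names the code expects):

```
# Hypothetical key names; confirm against .env.example
OPENAI_API_KEY=sk-...
ELEVENLABS_API_KEY=...
```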
- The STT is currently using the ElevenLabs API.
- The LLM is currently using the OpenAI API.
- The TTS is currently using the ElevenLabs API.
- The VAD is currently using the Silero VAD model.
- You may need to install ffmpeg if you get errors during STT.
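Putting those pieces together: each utterance detected by the Silero VAD is transcribed, sent to the LLM, and spoken back via TTS. Below is a minimal sketch of the shape of such a handler; `transcribe`, `generate_reply`, and `synthesize` are hypothetical placeholders, not this repo's actual functions.

```python
import numpy as np

def transcribe(sample_rate: int, frames: np.ndarray) -> str:
    """Placeholder for the ElevenLabs STT call."""
    return "hello there"

def generate_reply(text: str) -> str:
    """Placeholder for the OpenAI chat completion call."""
    return f"You said: {text}"

def synthesize(text: str, sample_rate: int = 24000) -> tuple[int, np.ndarray]:
    """Placeholder for the ElevenLabs TTS call; returns a short beep instead of speech."""
    t = np.linspace(0, 0.5, int(sample_rate * 0.5), endpoint=False)
    tone = (0.2 * np.sin(2 * np.pi * 440 * t) * 32767).astype(np.int16)
    return sample_rate, tone

def voice_handler(audio: tuple[int, np.ndarray]):
    """ReplyOnPause-style handler: STT -> LLM -> TTS, yielding audio back to the caller."""
    sample_rate, frames = audio
    user_text = transcribe(sample_rate, frames)
    reply_text = generate_reply(user_text)
    yield synthesize(reply_text)
```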
The prompt lives in the `backend/server.py` file and can be modified as you like.
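For example, a sketch of what that might look like; the `SYSTEM_PROMPT` name, the model choice, and the helper function are assumptions rather than the repo's exact code, but this is the kind of call the `generate_reply` placeholder above would wrap:

```python
# Hypothetical excerpt in the spirit of backend/server.py
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

# Edit this string to change the assistant's behavior.
SYSTEM_PROMPT = (
    "You are a friendly voice assistant. Keep replies short and conversational, "
    "since they will be spoken aloud."
)

def generate_reply(user_text: str) -> str:
    """Send the transcribed user speech to the LLM and return the reply text."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any chat-capable model works
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_text},
        ],
    )
    return response.choices[0].message.content
```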
The following parameters control voice activity detection and turn-taking:

- `audio_chunk_duration`: Length of audio chunks in seconds. Smaller values allow for faster processing but may be less accurate.
- `started_talking_threshold`: If a chunk has more than this many seconds of speech, the system considers that the user has started talking.
- `speech_threshold`: After the user has started speaking, if a chunk has less than this many seconds of speech, the system considers that the user has paused.
- `threshold`: Speech probability threshold (0.0-1.0). Values above this are considered speech. Higher values are more strict.
- `min_speech_duration_ms`: Speech segments shorter than this (in milliseconds) are filtered out.
- `min_silence_duration_ms`: The system waits for this duration of silence (in milliseconds) before considering speech to be finished.
- `speech_pad_ms`: Padding added to both ends of detected speech segments to prevent cutting off words.
- `max_speech_duration_s`: Maximum allowed duration for a speech segment, in seconds. Prevents indefinite listening.
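In FastRTC, the first three of these are fields on `AlgoOptions` and the rest on `SileroVadOptions`, both of which are passed to `ReplyOnPause`. A minimal sketch of the wiring, with illustrative values and a trivial echo handler standing in for the real voice pipeline:

```python
from fastrtc import AlgoOptions, ReplyOnPause, SileroVadOptions, Stream

def echo_handler(audio):
    # Trivial stand-in for the real STT -> LLM -> TTS pipeline: echo the speech back.
    yield audio

stream = Stream(
    handler=ReplyOnPause(
        echo_handler,
        algo_options=AlgoOptions(
            audio_chunk_duration=0.6,       # seconds of audio per analysed chunk
            started_talking_threshold=0.2,  # this much speech in a chunk => user started talking
            speech_threshold=0.1,           # less speech than this in a chunk => user paused
        ),
        model_options=SileroVadOptions(
            threshold=0.5,                  # speech probability cutoff (0.0-1.0)
            min_speech_duration_ms=250,     # drop detected segments shorter than this
            min_silence_duration_ms=2000,   # wait this long in silence before ending a turn
            speech_pad_ms=400,              # padding around detected speech segments
            max_speech_duration_s=30,       # hard cap on a single speech segment
        ),
    ),
    modality="audio",
    mode="send-receive",
)

# In the FastAPI app, the stream's endpoints are typically exposed with:
# stream.mount(app)
```

The tuning advice below then amounts to nudging these numbers in one direction or the other.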
- If the AI interrupts you too early:
  - Increase `min_silence_duration_ms`
  - Increase `speech_threshold`
  - Increase `speech_pad_ms`
- If the AI is slow to respond after you finish speaking:
  - Decrease `min_silence_duration_ms`
  - Decrease `speech_threshold`
- If the system fails to detect some speech:
  - Lower the `threshold` value
  - Decrease `started_talking_threshold`
Credit for the UI components goes to Shadcn, Aceternity UI and Kokonut UI.