Vidi: Video Assistant Agent for Blind and Low Vision Users

This project is a FastAPI application paired with a front-end interface that allows users to interact with video frames. It provides the ability to:

Extract keyframes from a video when paused.
Automatically generate a description of the frame.
Ask questions about the frame and receive AI-generated answers.
Read the generated descriptions and answers aloud using text-to-speech.
Stop text-to-speech at any time with the Esc key.

Features

Keyframe Extraction:
- When the video is paused, the current frame is captured and analyzed.
- A textual description of the frame is generated using OpenAI's GPT-4o Mini model.
Question Asking:
- Users can ask a question about the paused frame by pressing the Q key.
- The application sends the frame and the question to OpenAI and displays the response.
Text-to-Speech:
- Descriptions and answers are read aloud to users.
- Speech can be stopped at any time by pressing the Esc key.
Keyboard-Driven Interaction:
- Q key: Ask a question about the paused frame.
- Esc key: Stop any ongoing text-to-speech.

Setup Instructions

1. Clone the Repository

git clone https://github.com/your-username/your-repo.git
cd your-repo


2. Set Up a Virtual Environment
bash
Copy
Edit
python -m venv venv
```markdown
3. Install Dependencies

```bash
pip install -r requirements.txt

Create a .env File

In the project root directory, create a file named .env and add your OpenAI API key:

OPENAI_API_KEY=your_openai_api_key

Run the Application

Start the FastAPI server:

uvicorn app:app --reload

The application will be accessible at http://127.0.0.1:8000.

How to Use

Open the App:

Navigate to http://127.0.0.1:8000 in your browser.

Name		Name	Last commit message	Last commit date
Latest commit History 30 Commits
src		src
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
vercel.json		vercel.json

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Vidi: Video Assistant Agent for Blind and Low Vision Users

Features

Setup Instructions

1. Clone the Repository

How to Use

Open the App:

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

shljessie/VideoChat

Folders and files

Latest commit

History

Repository files navigation

Vidi: Video Assistant Agent for Blind and Low Vision Users

Features

Setup Instructions

1. Clone the Repository

How to Use

Open the App:

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages