This project is a FastAPI application paired with a front-end interface that allows users to interact with video frames. It provides the ability to:
- Extract keyframes from a video when paused.
- Automatically generate a description of the frame.
- Ask questions about the frame and receive AI-generated answers.
- Read the generated descriptions and answers aloud using text-to-speech.
- Stop text-to-speech at any time with the
Esckey.
-
Keyframe Extraction:
- When the video is paused, the current frame is captured and analyzed.
- A textual description of the frame is generated using OpenAI's GPT-4o Mini model.
-
Question Asking:
- Users can ask a question about the paused frame by pressing the
Qkey. - The application sends the frame and the question to OpenAI and displays the response.
- Users can ask a question about the paused frame by pressing the
-
Text-to-Speech:
- Descriptions and answers are read aloud to users.
- Speech can be stopped at any time by pressing the
Esckey.
-
Keyboard-Driven Interaction:
Qkey: Ask a question about the paused frame.Esckey: Stop any ongoing text-to-speech.
git clone https://github.com/your-username/your-repo.git
cd your-repo
2. Set Up a Virtual Environment
bash
Copy
Edit
python -m venv venv
```markdown
3. Install Dependencies
```bash
pip install -r requirements.txt- Create a .env File
In the project root directory, create a file named .env and add your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key- Run the Application
Start the FastAPI server:
uvicorn app:app --reloadThe application will be accessible at http://127.0.0.1:8000.
Navigate to http://127.0.0.1:8000 in your browser.