Shravan is an AI-powered tool that processes YouTube videos: it converts speech to text, translates transcripts, summarizes content, and supports contextual Q&A. The project builds on speech recognition, advanced NLP, Hugging Face, and Seq2Seq LLMs.
- Video-to-Audio Conversion: Extracts audio from locally downloaded YouTube videos.
- Speech-to-Text Transcription: Splits audio into chunks with PyDub and transcribes them with Wav2Vec.
- Translation Support: Translates transcripts into a user-specified language.
- Summarization: Provides concise summaries of the original transcript.
- Contextual Q&A: Uses FAISS + FLAN-T5 to let users ask questions about the video content.
- Fast and Efficient: Pipeline optimized for processing large video files quickly.
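
The chunking step behind the transcription feature can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `chunk_spans`, the 30-second default chunk length, and the optional overlap are all assumptions chosen for the example. It only computes millisecond spans, which is the unit PyDub uses when slicing audio.

```python
def chunk_spans(duration_ms, chunk_ms=30_000, overlap_ms=0):
    """Return (start_ms, end_ms) spans that cover an audio clip.

    Hypothetical helper: chunk_ms and overlap_ms are illustrative defaults,
    not values taken from the project.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    if not 0 <= overlap_ms < chunk_ms:
        raise ValueError("overlap_ms must be in [0, chunk_ms)")

    spans = []
    step = chunk_ms - overlap_ms  # distance between consecutive chunk starts
    start = 0
    while start < duration_ms:
        end = min(start + chunk_ms, duration_ms)
        spans.append((start, end))
        if end == duration_ms:  # last chunk reached the end of the clip
            break
        start += step
    return spans


if __name__ == "__main__":
    # A 70-second clip split into 30-second chunks.
    for start, end in chunk_spans(70_000):
        print(start, end)
```

With PyDub, an `AudioSegment` can be sliced by milliseconds, so each span maps to `audio[start:end]` before the chunk is passed to the speech model. A small overlap can help avoid cutting words at chunk boundaries, at the cost of some duplicated text.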
git clone https://github.com/sagarvk24/EchoTranscribe-AI-YouTube-Video-Processing-with-Speech-to-Text-Q-A-and-Translation.git
cd EchoTranscribe-AI-YouTube-Video-Processing-with-Speech-to-Text-Q-A-and-Translation
- Hugging Face Transformers (Q&A and Summarization)
- PyDub, HuggingSound, and Wav2Vec (ASR and Speech-to-Text)
- FAISS (Vector Database for efficient Q&A)
- Google Translate API (Text Translation)
- PyTorch (Model Inference)
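
To illustrate how retrieval-backed Q&A fits together, here is a rough, dependency-free sketch. It is not the project's implementation: a bag-of-words cosine similarity stands in for the learned embeddings and the FAISS index, and the helper names (`embed`, `top_k`, `build_prompt`) and the prompt wording are hypothetical. In the real pipeline, the resulting prompt would presumably be passed to FLAN-T5 to generate the answer.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a learned encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(question, chunks, k=2):
    """Return the k transcript chunks most similar to the question.

    A brute-force scan; a vector index such as FAISS does this lookup
    efficiently over dense embeddings.
    """
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question, context_chunks):
    """Assemble a prompt for a Seq2Seq model (wording is an assumption)."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using the context.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    transcript_chunks = [
        "The speaker introduces neural networks and backpropagation.",
        "Lunch recipes include pasta and salad.",
        "Gradient descent updates the weights of the network.",
    ]
    question = "How are the weights updated?"
    print(build_prompt(question, top_k(question, transcript_chunks, k=1)))
```

Retrieving only the most relevant chunks keeps the prompt within the model's input limit, which matters for long video transcripts.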
Contributions are welcome! Feel free to fork the repo, create a branch, and submit a pull request.
git checkout -b feature-branch
This project is licensed under the MIT License. See LICENSE for details.
For any queries, reach out via LinkedIn or open an issue on GitHub.
Happy Transcribing & Exploring!