This is a Conversational QA System built with Streamlit and a Retrieval-Augmented Generation (RAG) approach using FAISS for storing embeddings from uploaded PDFs. The system can answer questions in two modes:
- Normal QA: Answering general knowledge questions using a pre-trained model.
- PDF-based QA: Answering questions based on content extracted and indexed from PDFs.
The system uses sentence embeddings and FAISS for efficient retrieval, and a pre-trained text generation model to produce the answers.

- Technologies Used
- Project Structure
- Features
- Installation and Setup
- Usage
- How It Works
- Project Flow
- Conclusion
Technologies Used
- Streamlit: A framework for building the web interface.
- Sentence-Transformers: For embedding sentences into vectors using models such as paraphrase-MiniLM-L6-v2.
- FAISS: A library for efficient similarity search and clustering of dense vectors.
- PyPDF2: To extract text from PDF documents.
- Hugging Face Transformers: For the text generation model used in both normal and PDF-based QA.
Project Structure
your_project/
│
├── app.py            # Main Streamlit app to interact with users.
├── helpers.py        # Helper functions for PDF extraction, text processing, and FAISS indexing.
├── model.py          # Functions for generating answers using a pre-trained model.
├── assets/           # Folder for storing uploaded PDFs.
└── requirements.txt  # Python dependencies for the project.
Features
- Normal QA:
  - Allows users to ask general knowledge questions.
  - A pre-trained model (Flan-T5-Base) generates a detailed answer.
- PDF-based QA:
  - Users can upload a PDF document.
  - The text is extracted, embedded, and indexed using FAISS.
  - The system can then answer questions by searching the PDF's content and returning relevant information.
- Sidebar:
  - Displays the conversation history with questions and answers.
  - Lists citations for answers sourced from the uploaded PDF, including the page number.
- Session Information:
  - Displays the session title based on the number of messages.
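One way to keep the sidebar history and citations described above is to store each exchange in `st.session_state` and render it with `st.sidebar`. This is a hypothetical sketch, not the project's `app.py`: the `Turn` record, the `"history"` session key, and the helper names are illustrative assumptions. The session-title rule is left out because the README does not specify it.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Turn:
    """One question/answer exchange; citations are (page_number, sentence) pairs."""

    question: str
    answer: str
    citations: List[Tuple[int, str]] = field(default_factory=list)


def format_turn(turn: Turn) -> str:
    """Render one exchange as Markdown, with PDF citations as a bullet list."""
    parts = [f"**Q:** {turn.question}", f"**A:** {turn.answer}"]
    if turn.citations:  # only PDF-based answers carry citations
        parts.append("\n".join(f"- p. {page}: {text}" for page, text in turn.citations))
    return "\n\n".join(parts)


def render_sidebar() -> None:
    """Show the stored conversation history in the Streamlit sidebar."""
    import streamlit as st  # lazy import: format_turn runs without Streamlit

    if "history" not in st.session_state:
        st.session_state["history"] = []  # list of Turn objects, oldest first
    st.sidebar.header("Conversation history")
    for turn in st.session_state["history"]:
        st.sidebar.markdown(format_turn(turn))
```

After each answer, the app would append `Turn(question, answer, citations)` to `st.session_state["history"]` and call `render_sidebar()`. Keeping the Markdown formatting in a plain function means it can be checked without starting Streamlit.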
Installation and Setup
1. Clone the repository:
   git clone https://github.com/yourusername/conversational-chatbot.git
   cd conversational-chatbot
2. Set up a virtual environment (optional but recommended):
   python -m venv venv
   source venv/bin/activate  # On Windows, use venv\Scripts\activate
3. Install dependencies: install the required Python libraries with pip:
   pip install -r requirements.txt
4. Start the Streamlit app:
   streamlit run app.py
Usage
Once the app is running, open the provided URL in your browser (usually http://localhost:8501).
Normal QA Mode: Type any general knowledge question into the text input box, and the model will generate a detailed answer.
PDF-based QA Mode:
1. Upload a PDF document.
2. Ask questions related to the content of the PDF, and the model will generate answers by retrieving relevant information from the indexed PDF.
The sidebar displays the conversation history, citations (with page numbers), and the current session's title.
How It Works
1. Normal QA Mode:
   Users can enter any general question. The question is passed to a pre-trained model (Flan-T5-Base), which generates a detailed response based on its training.
2. PDF-based QA Mode:
   Users upload a PDF file, and PyPDF2 extracts its text. The text is split into sentences, and the Sentence-Transformer model generates an embedding for each sentence. These embeddings are indexed with FAISS for efficient similarity search. When the user asks a question, the question is embedded into a vector and the FAISS index is searched for similar sentences. The top-k most relevant sentences are retrieved and passed as context to Flan-T5-Base, which generates the final answer.
3. FAISS Index:
   FAISS efficiently finds the sentences in the document that are most relevant to the user's question. Each sentence is assigned an embedding, and these embeddings are indexed with FAISS for fast retrieval.
4. Citations:
   In PDF-based QA mode, the sidebar displays citations: the sentences from the PDF that were used to generate the answer, along with their page numbers.
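The PDF pipeline described above (extract, split, embed, index, search, cite) can be sketched as follows. This is a minimal sketch, not the project's `helpers.py`: the function names, the regex-based sentence splitter, and the default `k=3` are assumptions, and it assumes `PyPDF2` 3.x, `sentence-transformers`, `faiss-cpu`, and `numpy` are installed. It uses an exact `IndexFlatL2` index; the README does not say which FAISS index type the project uses.

```python
import re
from typing import List, Tuple

# A record is (page_number, sentence); pages are numbered from 1 for citations.
Record = Tuple[int, str]

# Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split raw page text into non-empty sentences."""
    text = " ".join(text.split())  # collapse line breaks left by PDF extraction
    return [s for s in _SENTENCE_END.split(text) if s]


def extract_sentences(pdf_path: str) -> List[Record]:
    """Extract text page by page with PyPDF2, keeping each sentence's page number."""
    from PyPDF2 import PdfReader  # lazy import: split_sentences runs without PyPDF2

    reader = PdfReader(pdf_path)
    records: List[Record] = []
    for page_number, page in enumerate(reader.pages, start=1):
        text = page.extract_text() or ""  # pages without a text layer give no text
        records.extend((page_number, s) for s in split_sentences(text))
    return records


def build_index(model, records: List[Record]):
    """Embed every sentence and add the vectors to an exact (brute-force) L2 index."""
    import faiss
    import numpy as np

    if not records:
        raise ValueError("no extractable text found in the PDF")
    vectors = np.asarray(model.encode([s for _, s in records]), dtype="float32")
    index = faiss.IndexFlatL2(vectors.shape[1])  # dimension of the embeddings
    index.add(vectors)
    return index


def hits_to_citations(ids: List[int], records: List[Record]) -> List[Record]:
    """Map FAISS result ids to (page, sentence); FAISS uses -1 for empty slots."""
    return [records[i] for i in ids if i != -1]


def search(model, index, records: List[Record], question: str, k: int = 3) -> List[Record]:
    """Embed the question and return the k nearest sentences, closest first."""
    import numpy as np

    query = np.asarray(model.encode([question]), dtype="float32")
    _distances, ids = index.search(query, k)  # both arrays have shape (1, k)
    return hits_to_citations(ids[0].tolist(), records)
```

Typical use, assuming `model = SentenceTransformer("paraphrase-MiniLM-L6-v2")` from `sentence_transformers`: call `extract_sentences` on the uploaded file, pass the records to `build_index`, then call `search` for each question. The returned (page, sentence) pairs can serve both as context for the answer and as the sidebar citations.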
Project Flow
1. PDF Upload:
User uploads a PDF file.
Text is extracted from the PDF and split into sentences.
2. Embeddings:
Sentences are passed through a Sentence-Transformer model to create embeddings.
Embeddings are indexed using FAISS for efficient similarity search.
3. Search and Answer Generation:
When a user asks a question, the question is embedded and used to search the FAISS index for the most relevant sentences.
The relevant sentences are provided as context to the Flan-T5-Base model, which generates a detailed answer.
4. Sidebar Updates:
Displays the conversation history and citations (page numbers and relevant sentences) in the sidebar.
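The answer-generation step can be sketched with the Transformers sequence-to-sequence API. This sketch is illustrative: the prompt wording, the function names `build_prompt` and `generate_answer`, and `max_new_tokens=200` are assumptions rather than details from the project's `model.py`, and it assumes the Hugging Face checkpoint `google/flan-t5-base` and PyTorch are available. The same function covers both modes: without retrieved sentences it falls back to a plain question prompt.

```python
from typing import Iterable, Optional

MODEL_NAME = "google/flan-t5-base"  # assumed Hugging Face Hub id for Flan-T5-Base
_loaded = {}  # holds the tokenizer and model after the first call


def build_prompt(question: str, context: Optional[Iterable[str]] = None) -> str:
    """Build a Flan-T5 prompt, adding retrieved sentences when PDF mode supplies them."""
    sentences = [s.strip() for s in (context or []) if s.strip()]
    if not sentences:
        # Normal QA mode: the model answers from what it learned in training
        return f"Answer the following question in detail.\nQuestion: {question.strip()}"
    joined = "\n".join(f"- {s}" for s in sentences)
    # PDF-based QA mode: ground the answer in the top-k retrieved sentences
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{joined}\n"
        f"Question: {question.strip()}"
    )


def generate_answer(prompt: str, max_new_tokens: int = 200) -> str:
    """Run Flan-T5-Base on the prompt and decode the generated text."""
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    if not _loaded:  # load once, then reuse on later calls
        _loaded["tokenizer"] = AutoTokenizer.from_pretrained(MODEL_NAME)
        _loaded["model"] = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    tokenizer, model = _loaded["tokenizer"], _loaded["model"]
    # Truncate long prompts; 512 tokens is the usual T5 input length
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

For normal QA, call `generate_answer(build_prompt(question))`; for PDF-based QA, pass the retrieved sentences as `context`. Keeping prompt construction separate from generation lets the prompt be tested without downloading the model. Inside a Streamlit app, `st.cache_resource` is an alternative way to load the model only once.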
Conclusion
This project demonstrates how to build a Conversational QA System that operates in two modes: answering general questions and answering based on content from uploaded PDFs. Sentence-Transformers embed the document's sentences and FAISS provides fast, scalable similarity search over them, so the system can efficiently retrieve the relevant passages and pass them to the model to generate detailed answers.
