Conversational Question-Answering System with PDF Support

This is a conversational question-answering (QA) system built with Streamlit. It follows a Retrieval-Augmented Generation (RAG) approach, using FAISS to store and search embeddings of text from uploaded PDFs. The system can answer questions in two modes:

Normal QA: Answering general knowledge questions using a pre-trained model.
PDF-based QA: Answering questions based on content extracted and indexed from PDFs.

The system uses sentence embeddings and FAISS for efficient retrieval, and a pre-trained language model to generate answers from what is retrieved.

Technologies Used

Streamlit: A framework to build the web interface.
Sentence-Transformers: For embedding sentences into vectors using models like paraphrase-MiniLM-L6-v2.
FAISS: A library for efficient similarity search and clustering of dense vectors.
PyPDF2: To extract text from PDF documents.
Hugging Face Transformers: For text generation models used in both normal and PDF-based QA.

Project Structure

your_project/
│
├── app.py            # Main Streamlit app to interact with users.
├── helpers.py        # Helper functions for PDF extraction, text processing, and FAISS indexing.
├── model.py          # Functions for generating answers using a pre-trained model.
├── assets/           # Folder for storing uploaded PDFs.
└── requirements.txt  # Python dependencies for the project.

Features

Normal QA:
- Allows users to ask general knowledge questions.
- A pre-trained model (Flan-T5-Base) generates a detailed answer.
PDF-based QA:
- Users can upload a PDF document.
- The text is extracted, embedded, and indexed using FAISS.
- The system can then answer questions by searching the PDF's content and returning relevant information.
Sidebar:
- Displays the conversation history with questions and answers.
- Lists citations for answers sourced from the uploaded PDF, including the page number.
Session Information:
- Displays the session title based on the number of messages.

Installation and Setup

1. Clone the repository:

    git clone https://github.com/yourusername/conversational-chatbot.git
    cd conversational-chatbot

2. Set up a virtual environment (optional but recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use venv\Scripts\activate

3. Install the required Python libraries with pip:

    pip install -r requirements.txt

4. Start the Streamlit app:

    streamlit run app.py

Usage

Once the app is running, open the provided URL in your browser (usually http://localhost:8501).

Normal QA Mode: Type any general knowledge question into the text input box, and the model will generate a detailed answer.
PDF-based QA Mode:
    1. Upload a PDF document.
    2. Ask questions related to the content of the PDF, and the model will generate answers by retrieving relevant information from the indexed PDF.

The sidebar displays the conversation history, citations (with page numbers), and the current session's title.
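Streamlit reruns the whole script on every interaction, so the conversation history has to be kept in `st.session_state` to survive between questions. The sketch below is a minimal illustration, not the project's code: it assumes a hypothetical `"history"` key holding a list of dicts, and the helpers `record_turn` and `sidebar_lines` are hypothetical names. They accept any mutable mapping, so they can be tried with a plain dict; in the app you would pass `st.session_state`.

```python
from typing import Any, List, MutableMapping, Optional


def record_turn(state: MutableMapping[str, Any], question: str, answer: str,
                citations: Optional[List[str]] = None) -> None:
    """Append one question/answer turn, with any PDF citations, to the stored history."""
    # "history" is a hypothetical key; st.session_state supports `in` and item assignment.
    if "history" not in state:
        state["history"] = []
    state["history"].append(
        {"question": question, "answer": answer, "citations": list(citations or [])}
    )


def sidebar_lines(state: MutableMapping[str, Any]) -> List[str]:
    """Flatten the stored history into display lines, oldest turn first."""
    history = state["history"] if "history" in state else []
    lines: List[str] = []
    for turn in history:
        lines.append(f"Q: {turn['question']}")
        lines.append(f"A: {turn['answer']}")
        # Citations are only present for answers that came from the uploaded PDF.
        lines.extend(f"Source: {c}" for c in turn["citations"])
    return lines
```

In `app.py` this would be used as `record_turn(st.session_state, question, answer, citations)` followed by `for line in sidebar_lines(st.session_state): st.sidebar.write(line)`.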

How It Works

1. Normal QA Mode:

Users can enter any general question. The question is passed directly to the pre-trained Flan-T5-Base model, which generates a detailed answer from what it learned during training; no document retrieval is involved.
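A minimal sketch of Normal QA mode, assuming the Hugging Face checkpoint `google/flan-t5-base` loaded through the `text2text-generation` pipeline of transformers 4.x. The function names and the `max_new_tokens` value are illustrative assumptions, not taken from the project's `model.py`. The generator is a parameter so the logic can be exercised without downloading the model.

```python
from typing import Any, Callable, Dict, List, Optional

# A text2text-generation pipeline maps a prompt string to [{"generated_text": ...}].
Generator = Callable[..., List[Dict[str, Any]]]


def load_flan_t5() -> Generator:
    """Load Flan-T5-Base as a Hugging Face pipeline (weights are downloaded on first use)."""
    from transformers import pipeline  # lazy import: the function below can be tested without it

    return pipeline("text2text-generation", model="google/flan-t5-base")


def answer_general_question(question: str, generator: Optional[Generator] = None,
                            max_new_tokens: int = 128) -> str:
    """Normal QA mode: send the bare question to the model, with no retrieved context."""
    if generator is None:
        generator = load_flan_t5()
    outputs = generator(question.strip(), max_new_tokens=max_new_tokens)
    return outputs[0]["generated_text"].strip()
```

In a Streamlit app, decorating `load_flan_t5` with `@st.cache_resource` keeps the model from being reloaded on every rerun.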
2. PDF-based QA Mode:

Users upload a PDF file, and PyPDF2 extracts its text. The text is split into sentences, and the Sentence-Transformer model generates an embedding for each sentence. The embeddings are indexed with FAISS for efficient similarity search. When the user asks a question, the question is embedded into a vector and the FAISS index is searched for the most similar sentences. The top-k sentences are retrieved and passed as context to Flan-T5-Base, which generates the final answer.
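The extraction and splitting steps can be sketched as follows, assuming the PyPDF2 3.x API (`PdfReader`, `reader.pages`, `page.extract_text()`; the same calls exist in pypdf, PyPDF2's successor) and a simple regex sentence splitter. The project may split sentences differently. Each sentence keeps its 1-based page number so citations can point back to the page; both function names are hypothetical.

```python
import re
from typing import BinaryIO, List, Tuple

# Split after ".", "!" or "?" followed by whitespace: a heuristic, not a full sentence tokenizer
# (abbreviations such as "e.g. " will also trigger a split).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_into_sentences(page_texts: List[str]) -> List[Tuple[int, str]]:
    """Return (page_number, sentence) pairs; page numbers are 1-based."""
    sentences: List[Tuple[int, str]] = []
    for page_number, text in enumerate(page_texts, start=1):
        # Collapse the line breaks and repeated spaces that PDF extraction tends to produce.
        normalized = " ".join(text.split())
        for sentence in _SENTENCE_END.split(normalized):
            if sentence:  # empty pages produce a single empty string
                sentences.append((page_number, sentence))
    return sentences


def extract_page_texts(pdf_file: BinaryIO) -> List[str]:
    """Extract the text of every page; pages without a text layer yield ''."""
    from PyPDF2 import PdfReader  # lazy import: split_into_sentences works without PyPDF2

    reader = PdfReader(pdf_file)
    return [page.extract_text() or "" for page in reader.pages]
```

The file object returned by `st.file_uploader("Upload a PDF", type="pdf")` is file-like, so it can be passed straight to `extract_page_texts`.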

3. FAISS Index:

FAISS is used to efficiently search for the most relevant sentences in the document when the user asks a question.
Each sentence is assigned an embedding, and these embeddings are indexed using FAISS for quick retrieval.
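To show what the index computes at query time, here is a pure-Python stand-in for FAISS's exact `IndexFlatL2`: it compares the query with every stored vector and returns the k nearest by squared L2 distance. FAISS does the same exhaustive search in optimized native code; the class name `FlatL2Index` and the toy vectors are hypothetical, and unlike FAISS this returns a list of `(distance, id)` pairs rather than two arrays.

```python
import heapq
from typing import List, Sequence, Tuple


class FlatL2Index:
    """Exhaustive nearest-neighbour search by squared L2 distance (illustrative, not FAISS)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.vectors: List[Sequence[float]] = []

    def add(self, vectors: Sequence[Sequence[float]]) -> None:
        """Store vectors; a vector's id is its insertion position, as in a flat FAISS index."""
        for v in vectors:
            if len(v) != self.dim:
                raise ValueError(f"expected dimension {self.dim}, got {len(v)}")
            self.vectors.append(v)

    def search(self, query: Sequence[float], k: int) -> List[Tuple[float, int]]:
        """Return up to k (squared distance, id) pairs, closest first."""
        distances = (
            (sum((q - x) ** 2 for q, x in zip(query, v)), i)
            for i, v in enumerate(self.vectors)
        )
        return heapq.nsmallest(k, distances)
```

With FAISS itself the equivalent calls are `index = faiss.IndexFlatL2(dim)`, `index.add(embeddings)` and `D, I = index.search(query_vectors, k)`, where the inputs are 2-D `float32` NumPy arrays and `D` and `I` hold the distances and ids for each query row.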

4. Citations:

For the PDF-based QA mode, the sidebar displays the citations: sentences from the PDF that were used to generate the answer, along with their corresponding page number.
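A sketch of turning search hits into sidebar citations, assuming a list of `(page_number, sentence)` pairs built at indexing time and the row of ids returned by the index search. `format_citations` is a hypothetical helper; it skips negative ids because FAISS pads its results with -1 when fewer than k vectors are stored.

```python
from typing import Iterable, List, Set, Tuple


def format_citations(hit_ids: Iterable[int], sentences: List[Tuple[int, str]]) -> List[str]:
    """Map index ids to 'Page N: sentence' strings, in ranking order, without duplicates."""
    citations: List[str] = []
    seen: Set[int] = set()
    for raw_id in hit_ids:
        i = int(raw_id)  # FAISS returns NumPy int64 ids; convert to a plain int
        if i < 0 or i >= len(sentences) or i in seen:
            continue  # skip -1 padding, out-of-range ids and repeats
        seen.add(i)
        page, sentence = sentences[i]
        citations.append(f"Page {page}: {sentence}")
    return citations
```

For a single query, the ids are the first row of the search result, e.g. `format_citations(I[0], sentences)`.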

Project Flow

1. PDF Upload:
    User uploads a PDF file.
    Text is extracted from the PDF and split into sentences.

2. Embeddings:
    Sentences are passed through a Sentence-Transformer model to create embeddings.
    Embeddings are indexed using FAISS for efficient similarity search.

3. Search and Answer Generation:
    When a user asks a question, the question is embedded and used to search the FAISS index for the most relevant sentences.
    The relevant sentences are provided as context to the Flan-T5-Base model, which generates a detailed answer.

4. Sidebar Updates:
    Displays the conversation history and citations (page numbers and relevant sentences) in the sidebar.
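Step 3 above hands the retrieved sentences to Flan-T5-Base as context. One way to do that is to place them in a single instruction-style prompt, as sketched below; the template wording, the helper name `build_context_prompt` and the `max_sentences` cap are assumptions, not the project's actual prompt.

```python
from typing import Sequence


def build_context_prompt(question: str, context_sentences: Sequence[str],
                         max_sentences: int = 5) -> str:
    """Build a prompt asking the model to answer from the retrieved PDF sentences only."""
    # Keep only the highest-ranked sentences so the prompt stays short.
    context = "\n".join(f"- {s.strip()}" for s in context_sentences[:max_sentences])
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The resulting string is passed to the generation model in place of the bare question, for example `pipeline("text2text-generation", model="google/flan-t5-base")(prompt)`.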

Conclusions

This project demonstrates how to build a Conversational QA System that operates in two modes: answering general questions and answering questions based on the content of uploaded PDFs. FAISS provides fast, scalable similarity search and Sentence-Transformers provides sentence embeddings; together they let the system retrieve relevant passages efficiently before the model generates a detailed answer.

Repository: sahana-github/Conversational-Chatbot
