AI-powered NLP model to classify patient reviews and diagnoses into medical conditions
Patients often describe their symptoms or medication experiences in text form. This project uses Natural Language Processing (NLP) and Machine Learning to classify these reviews/diagnoses into medical conditions (e.g., Diabetes, Hypertension, Depression).
- Text Preprocessing: Tokenization, Lemmatization, Stopword Removal
- Feature Engineering: TF-IDF, Word Embeddings
- ML Models: Logistic Regression, Passive Aggressive Classifier, MultinomialNB for Text Classification
- Evaluation Metrics: Accuracy, Confusion Matrix
- Deployment: Streamlit Web App for real-time prediction
- Language: Python
- Libraries: Pandas, Scikit-Learn, SpaCy, NLTK
- Machine Learning Models: Logistic Regression, Random Forest, LSTM, BERT
- Deployment: Streamlit Web App
| Resource | Link |
|---|---|
| GitHub Repository | GitHub Repo |
| Google Colab Notebook | Colab Notebook |
| Demo Web App | Streamlit App |
- Remove special characters, stopwords, and punctuation
- Apply lemmatization using spaCy
- Convert text into numerical vectors using TF-IDF
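
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper name `clean_text` and its `lemmatize` hook are hypothetical, the stopword list is scikit-learn's built-in English list (an assumption; the project may use NLTK's or spaCy's), and lemmatization is left as an optional callable so the sketch runs without a spaCy model installed.

```python
import re

from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS, TfidfVectorizer


def clean_text(text, lemmatize=None):
    """Hypothetical helper: lowercase, strip non-letters, drop stopwords.

    `lemmatize` is an optional callable applied to each token. With spaCy
    one might pass, for example:
        nlp = spacy.load("en_core_web_sm")
        lemmatize = lambda tok: nlp(tok)[0].lemma_
    """
    # Replace digits, punctuation and special characters with spaces.
    text = re.sub(r"[^a-z\s]", " ", text.lower())
    # Split on whitespace and remove common English stopwords.
    tokens = [t for t in text.split() if t not in ENGLISH_STOP_WORDS]
    if lemmatize is not None:
        tokens = [lemmatize(t) for t in tokens]
    return " ".join(tokens)


reviews = [
    "I have severe headaches and dizziness daily.",
    "My sugar levels are high even after taking insulin.",
]
cleaned = [clean_text(r) for r in reviews]

# Turn the cleaned reviews into a sparse TF-IDF matrix (one row per review).
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(cleaned)
print(cleaned)
print(X.shape)
```

Keeping lemmatization behind a callable makes it easy to compare results with and without it before committing to a heavier spaCy dependency at prediction time.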
- Baseline Model: TF-IDF + Logistic Regression (benchmark model)
- Advanced Model: BERT-based Transformer for text classification
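
A baseline of the kind described above (TF-IDF features feeding Logistic Regression) might look like the sketch below. The four training sentences are a toy set invented for illustration and are far too few to train a useful model; the hyperparameters (`max_iter=1000`, the built-in English stopword list) are assumptions, not the project's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only (one example per condition).
train_texts = [
    "I have severe headaches and dizziness daily.",
    "My sugar levels are high even after taking insulin.",
    "Feeling sad and hopeless for months.",
    "Blood pressure spikes every morning.",
]
train_labels = ["Migraine", "Diabetes", "Depression", "Hypertension"]

# Chain vectorization and classification so raw text goes in, labels come out.
baseline = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
baseline.fit(train_texts, train_labels)

# Classify an unseen review; the result is one of the training labels.
prediction = baseline.predict(["My blood pressure is very high in the morning."])[0]
print(prediction)
```

Wrapping both steps in a single pipeline means the same fitted vectorizer is reused at prediction time, which avoids a mismatch between training and serving features.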
- Accuracy: 95%
- Confusion Matrix
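
Both metrics can be computed with scikit-learn, as in the sketch below. The `y_true` and `y_pred` lists are made-up values chosen to show the mechanics; they are not the project's results and are unrelated to the accuracy figure reported above.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical labels for illustration only, not the project's data.
y_true = ["Diabetes", "Depression", "Hypertension", "Diabetes", "Migraine"]
y_pred = ["Diabetes", "Depression", "Diabetes", "Diabetes", "Migraine"]

# Fix the label order so matrix rows and columns are easy to read.
labels = ["Depression", "Diabetes", "Hypertension", "Migraine"]

accuracy = accuracy_score(y_true, y_pred)
# Rows are true conditions, columns are predicted conditions.
cm = confusion_matrix(y_true, y_pred, labels=labels)

print(f"Accuracy: {accuracy:.2f}")
print(cm)
```

Here the single error appears off the diagonal, in the Hypertension row and the Diabetes column, which is the kind of per-class confusion that a single accuracy number hides.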
- Streamlit Web App for real-time predictions
| Review Text | Predicted Condition |
|---|---|
| "I have severe headaches and dizziness daily." | Migraine |
| "My sugar levels are high even after taking insulin." | Diabetes |
| "Feeling sad and hopeless for months." | Depression |
| "Blood pressure spikes every morning." | Hypertension |
