Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models
A Machine Learning, Deep Learning, and NLP-based Approach
Published in PLOS ONE (2025):
Hasan, M. J., Sultana, J., Ahmed, S., & Momen, S.
Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models
PLOS ONE, 20(6), e0323265 (2025)
Workplace stress is a critical issue that impacts employee well-being, organizational efficiency, and workplace safety. This repository provides cutting-edge AI solutions for detecting, analyzing, and mitigating occupational stress using a combination of:
- Machine Learning (ML) for stress detection models.
- Deep Learning (DL) using advanced neural networks.
- Natural Language Processing (NLP) to extract insights from survey data.
- Explainable AI (XAI) for interpretability in workplace safety decisions.
- Synthetic Data Generation to validate model generalizability.
Why does this matter?
- Stressed workers are more prone to workplace accidents and lower productivity.
- Organizations spend billions annually on absenteeism and medical costs due to work-related stress.
- Traditional stress assessments (e.g., surveys, self-reports) are slow, subjective, and lack predictive power.
Our AI-driven approach automates stress detection, enhances workplace safety, and enables proactive decision-making for both organizations and policymakers.
Occupational-Stress-Safety
├── Main_results_holdout_val.ipynb        # Main results reported in paper using holdout validation
├── explainable_ai/                       # XAI-based analysis
│   └── XAI_Occupational_Stress.ipynb
│
├── SOTA_analysis/                        # State-of-the-art comparison
│   ├── SOTA_Paper_1.ipynb
│   ├── SOTA_Paper_2.ipynb
│   └── ...
│
├── domain_analysis/                      # LLM-driven analysis
│   └── LLM_anova_+_rfecv.ipynb
│
├── cross-validation/                     # Model validation
│   └── cross-validation.ipynb
│
├── anova/                                # ANOVA-based feature selection
│   └── anova.ipynb
│
├── rfecv/                                # Recursive feature elimination
│   └── RFECV.ipynb
│
├── synthetic_data_generation/            # Synthetic dataset generation
│   ├── Synthetic_data_generation.ipynb
│   └── Synthetic_data_comparison.ipynb
│
├── eda/                                  # Exploratory Data Analysis
│   └── EDA_Occupational_Stress.ipynb
│
├── ablations/                            # Impact of feature selection
│   ├── Ablation_no_RFECV.ipynb
│   ├── Ablation_no_anova.ipynb
│   └── Ablation_no_zero_var.ipynb
│
├── dataset/                              # Raw dataset
│   └── DIB dataset and codebook.xlsx
│
├── synthetic_dataset/                    # Generated datasets
│   ├── synthetic_data_tvae.csv
│   ├── synthetic_data_gaussian_copula.csv
│   ├── synthetic_data_ctgan.csv
│   └── synthetic_data_copulaGan.csv
│
├── LICENSE                               # MIT License
└── README.md                             # This file
| Technology | Usage |
|---|---|
| Python | Core programming language |
| Jupyter | Interactive computing environment |
| Pandas | Data manipulation |
| NumPy | Numerical computing |
| Scikit-learn | Machine learning models & evaluation |
| PyTorch | Deep learning framework |
| Hugging Face Transformers | NLP and Large Language Models |
| Seaborn & Matplotlib | Data visualization |
| SHAP & LIME | Explainable AI (XAI) |
- Clone the Repository

      git clone https://github.com/junayed-hasan/occupational-stress-ml.git
      cd occupational-stress-ml

- Create a Virtual Environment

      python -m venv venv
      source venv/bin/activate    # Mac/Linux
      .\venv\Scripts\activate     # Windows

- Install Dependencies

      pip install -r requirements.txt

- Run Jupyter Notebooks

      jupyter notebook
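Before opening the notebooks, it can help to confirm that the environment actually contains the libraries listed in the technology table. The snippet below is a minimal sketch of such a check. The module list is an assumption based on the usual import names of those libraries, and `missing_modules` is a hypothetical helper, not part of this repository.

```python
from importlib.util import find_spec

# Usual import names for the libraries in the technology table.
# Assumption: requirements.txt installs them under these names.
REQUIRED_MODULES = [
    "pandas", "numpy", "sklearn", "torch", "transformers",
    "seaborn", "matplotlib", "shap", "lime",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be located in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules were found.")
```

If anything is reported as missing, re-running `pip install -r requirements.txt` inside the activated virtual environment is the first thing to try.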
- Run `eda/EDA_Occupational_Stress.ipynb` to explore dataset characteristics.
- Run `anova/anova.ipynb` for ANOVA-based feature selection, `rfecv/RFECV.ipynb` for recursive feature elimination, and `cross-validation/cross-validation.ipynb` to validate models.
- Run `synthetic_data_generation/Synthetic_data_generation.ipynb` to generate synthetic datasets.
- `explainable_ai/XAI_Occupational_Stress.ipynb` provides model interpretability, and `domain_analysis/LLM_anova_+_rfecv.ipynb` uses NLP models for deeper insights.
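The feature selection steps named above (zero-variance filtering, ANOVA, and RFECV) can be chained in a single scikit-learn pipeline and scored on a holdout split. The following is a rough sketch under stated assumptions, not the notebooks' actual code: the data is synthetic placeholder data from `make_classification`, and the value of `k`, the estimators, and the split ratio are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data standing in for the survey features (NOT the repository's dataset).
X, y = make_classification(n_samples=400, n_features=30, n_informative=8, random_state=0)

# Holdout split: the test set is never seen during feature selection or training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

pipeline = Pipeline([
    ("zero_var", VarianceThreshold(threshold=0.0)),          # drop constant features
    ("scale", StandardScaler()),
    ("anova", SelectKBest(score_func=f_classif, k=15)),      # ANOVA F-test filter (k is assumed)
    ("rfecv", RFECV(LogisticRegression(max_iter=1000),       # recursive elimination with CV
                    step=1, cv=StratifiedKFold(5), scoring="accuracy")),
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

pipeline.fit(X_train, y_train)
holdout_accuracy = accuracy_score(y_test, pipeline.predict(X_test))
n_selected = pipeline.named_steps["rfecv"].n_features_

print(f"Features kept after RFECV: {n_selected}")
print(f"Holdout accuracy: {holdout_accuracy:.3f}")
```

Keeping every selection step inside the pipeline means the selectors are fit on the training split only, which avoids leaking information from the holdout set into the choice of features.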
- Achieved 90.32% accuracy in occupational stress classification.
- Top predictors of stress: Workload, Manager Support, Job Role Clarity.
- LLMs outperformed traditional NLP models in stress classification tasks.
- Synthetic data generation proved effective, achieving ~89% accuracy on unseen test scenarios.
For full details, check:
- `Main_results_holdout_val.ipynb` (final results)
- `ablations/` (impact of feature selection methods)
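To illustrate what an ablation over the feature selection steps looks like, the sketch below removes one step at a time and compares cross-validated accuracy. It is only an approximation of the idea behind the `ablations/` notebooks: the data is synthetic, `build_pipeline` is a hypothetical helper, and the classifier and hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Placeholder data (NOT the repository's dataset).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6, random_state=0)


def build_pipeline(zero_var=True, anova=True, rfecv=True):
    """Assemble a pipeline, optionally leaving out individual selection steps."""
    steps = []
    if zero_var:
        steps.append(("zero_var", VarianceThreshold()))
    steps.append(("scale", StandardScaler()))
    if anova:
        steps.append(("anova", SelectKBest(f_classif, k=10)))
    if rfecv:
        steps.append(("rfecv", RFECV(LogisticRegression(max_iter=1000), step=2, cv=3)))
    steps.append(("clf", LogisticRegression(max_iter=1000)))
    return Pipeline(steps)


# Each variant disables exactly one step; "full" keeps all of them.
variants = {
    "full": {},
    "no_zero_var": {"zero_var": False},
    "no_anova": {"anova": False},
    "no_rfecv": {"rfecv": False},
}

results = {
    name: cross_val_score(build_pipeline(**kwargs), X, y, cv=5, scoring="accuracy").mean()
    for name, kwargs in variants.items()
}

for name, accuracy in results.items():
    print(f"{name:12s} mean CV accuracy = {accuracy:.3f}")
```

On synthetic data the differences between variants carry no meaning of their own; the point is the structure of the comparison, where each run differs from the full pipeline by a single removed step.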
This research has direct implications for:
- HR & Workforce Management: Predict stress and reduce workplace accidents.
- Occupational Health & Safety Teams: Implement AI-driven monitoring.
- Government & Policymakers: Set data-driven workplace safety regulations.
- AI & Data Science Practitioners: Develop real-world stress detection models.
If you use this repository in your research, please cite:
@article{hasan2025early,
title={Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models},
author={Hasan, Mohammad Junayed and Sultana, Jannat and Ahmed, Silvia and Momen, Sifat},
journal={PLoS One},
volume={20},
number={6},
pages={e0323265},
year={2025},
publisher={Public Library of Science San Francisco, CA USA},
doi={10.1371/journal.pone.0323265}
}
This project is licensed under the MIT License, which allows free use, modification, and distribution.
Contributions are welcome! To contribute:
- Fork the repository.
- Create a new branch (`feature-new-feature`).
- Commit changes and open a pull request.
Collaborators: Mohammad Junayed Hasan and Jannat Sultana
Supervisors: Dr. Sifat Momen (sifat.momen@northsouth.edu, https://scholar.google.com/citations?user=sGVZEaAAAAAJ), Ms. Silvia Ahmed (silvia.ahmed@northsouth.edu, https://scholar.google.com/citations?user=T5jK--YAAAAJ&hl=en&oi=ao)
Emails: junayedhasan100@gmail.com, jannatsultana187@gmail.com
LinkedIn: https://www.linkedin.com/in/mjhasan21/, https://www.linkedin.com/in/jannat-sultana/
We appreciate your interest in our research and welcome discussions, collaborations, and feedback!