This repository contains code, datasets, and analysis for AI-driven occupational stress detection using machine learning, deep learning, and NLP. It includes feature selection, explainable AI, synthetic data generation, and model validation for workplace safety applications. 🚀

junayed-hasan/occupational-stress-ml

Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models

📌 A Machine Learning, Deep Learning, and NLP-based Approach

📄 Published in PLOS ONE (2025):
Hasan, M. J., Sultana, J., Ahmed, S., & Momen, S.
Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models
PLOS ONE, 20(6), e0323265 (2025)

🔍 Overview

Workplace stress is a critical issue that affects employee well-being, organizational efficiency, and workplace safety. This repository provides code and notebooks for detecting and analyzing occupational stress using a combination of:

✅ Machine Learning (ML) for stress detection models.
✅ Deep Learning (DL) using advanced neural networks.
✅ Natural Language Processing (NLP) to extract insights from survey data.
✅ Explainable AI (XAI) for interpretability in workplace safety decisions.
✅ Synthetic Data Generation to validate model generalizability.

Why does this matter?

  • Stressed workers are more prone to workplace accidents and lower productivity.
  • Organizations spend billions annually on absenteeism and medical costs due to work-related stress.
  • Traditional stress assessments (e.g., surveys, self-reports) are slow, subjective, and lack predictive power.

Our AI-driven approach automates stress detection, enhances workplace safety, and enables proactive decision-making for both organizations and policymakers.


📂 Repository Structure

📦 Occupational-Stress-Safety
├── Main_results_holdout_val.ipynb      # Main results reported in paper using holdout validation
├── explainable_ai/                     # XAI-based analysis
│   ├── XAI_Occupational_Stress.ipynb
│
├── SOTA_analysis/                      # State-of-the-art comparison
│   ├── SOTA_Paper_1.ipynb
│   ├── SOTA_Paper_2.ipynb
│   └── ...
│
├── domain_analysis/                    # LLM-driven analysis
│   ├── LLM_anova_+_rfecv.ipynb
│
├── cross-validation/                   # Model validation
│   ├── cross-validation.ipynb
│
├── anova/                              # ANOVA-based feature selection
│   ├── anova.ipynb
│
├── rfecv/                              # Recursive feature elimination
│   ├── RFECV.ipynb
│
├── synthetic_data_generation/          # Synthetic dataset generation
│   ├── Synthetic_data_generation.ipynb
│   ├── Synthetic_data_comparison.ipynb
│
├── eda/                                # Exploratory Data Analysis
│   ├── EDA_Occupational_Stress.ipynb
│
├── ablations/                          # Impact of feature selection
│   ├── Ablation_no_RFECV.ipynb
│   ├── Ablation_no_anova.ipynb
│   ├── Ablation_no_zero_var.ipynb
│
├── dataset/                            # Raw dataset
│   ├── DIB dataset and codebook.xlsx
│
├── synthetic_dataset/                  # Generated datasets
│   ├── synthetic_data_tvae.csv
│   ├── synthetic_data_gaussian_copula.csv
│   ├── synthetic_data_ctgan.csv
│   ├── synthetic_data_copulaGan.csv
│
├── LICENSE                             # MIT License
└── README.md                           # This file

💻 Tech Stack

| Technology | Usage |
|---|---|
| Python | Core programming language |
| Jupyter | Interactive computing environment |
| Pandas | Data manipulation |
| NumPy | Numerical computing |
| Scikit-learn | Machine learning models & evaluation |
| PyTorch | Deep learning framework |
| Hugging Face Transformers | NLP and Large Language Models |
| Seaborn & Matplotlib | Data visualization |
| SHAP & LIME | Explainable AI (XAI) |

🚀 Getting Started

  1. Clone the Repository

    git clone https://github.com/junayed-hasan/occupational-stress-ml.git
    cd occupational-stress-ml
  2. Create a Virtual Environment

    python -m venv venv
    source venv/bin/activate  # Mac/Linux
    .\venv\Scripts\activate   # Windows
  3. Install Dependencies

    pip install -r requirements.txt
  4. Run Jupyter Notebooks

    jupyter notebook

🔪 Usage

1️⃣ Data Preprocessing & EDA

  • Run eda/EDA_Occupational_Stress.ipynb to explore dataset characteristics.
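
As a rough illustration of the dataset characteristics an EDA pass usually starts with (class balance, missing values, constant columns), here is a minimal pandas sketch. The column names and values are hypothetical toy data, not taken from the repository's dataset, and the `summarize` helper is an assumption made for this example rather than a function from the notebook.

```python
import pandas as pd

# Hypothetical toy survey data; the real columns live in dataset/ and are not reproduced here.
df = pd.DataFrame({
    "workload": [4, 5, 2, 3, 5, 1],
    "manager_support": [2, 1, 4, None, 1, 5],
    "site": ["A", "A", "A", "A", "A", "A"],  # constant column, carries no signal
    "stressed": [1, 1, 0, 0, 1, 0],          # binary target (assumed encoding)
})


def summarize(frame: pd.DataFrame, target: str) -> dict:
    """Return basic dataset characteristics for a quick first look."""
    features = frame.drop(columns=[target])
    return {
        "n_rows": len(frame),
        # Share of each class in the target column.
        "class_balance": frame[target].value_counts(normalize=True).to_dict(),
        # Number of missing values per column.
        "missing_per_column": frame.isna().sum().to_dict(),
        # Columns with at most one distinct non-missing value.
        "constant_columns": [
            c for c in features.columns if features[c].nunique(dropna=True) <= 1
        ],
    }


summary = summarize(df, target="stressed")
print(summary)
```

Constant columns found at this stage are the ones a zero-variance filter would later drop.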

2️⃣ Feature Selection & Model Training

  • anova/anova.ipynb for ANOVA-based feature selection.
  • rfecv/RFECV.ipynb for recursive feature elimination.
  • cross-validation/cross-validation.ipynb to validate models.
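
The three notebooks above can be read as stages of one pipeline. The sketch below chains a zero-variance filter, ANOVA-based selection, and RFECV in scikit-learn and scores the result with stratified cross-validation. It runs on synthetic stand-in data, and the estimator (logistic regression), `k=10`, and the fold counts are assumptions chosen for illustration, not the settings used in the notebooks.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV, SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the survey features and the binary stress label.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, random_state=0
)

pipeline = Pipeline([
    ("zero_var", VarianceThreshold(threshold=0.0)),       # drop constant features
    ("scale", StandardScaler()),
    ("anova", SelectKBest(score_func=f_classif, k=10)),   # ANOVA F-test filter
    ("rfecv", RFECV(LogisticRegression(max_iter=1000),    # recursive elimination with inner CV
                    step=1, cv=3, scoring="accuracy")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Outer stratified CV; selection is refit inside each fold because it is part of the pipeline.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")
print(f"mean CV accuracy: {scores.mean():.3f}")

# Fit once on all data to inspect how many features RFECV kept.
pipeline.fit(X, y)
n_kept = pipeline.named_steps["rfecv"].n_features_
print(f"features kept by RFECV: {n_kept}")
```

Keeping the selectors inside the `Pipeline` means they only ever see the training portion of each fold, which avoids leaking validation data into feature selection.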

3️⃣ Synthetic Data Generation

  • Run synthetic_data_generation/Synthetic_data_generation.ipynb to generate synthetic datasets.
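
The file names under synthetic_dataset/ suggest that a Gaussian copula was one of the generators. The sketch below is a simplified NumPy/SciPy illustration of that idea for numeric columns only. It is not the notebook's code and not any library's implementation; the function name `gaussian_copula_sample` and the toy data are assumptions made for this example.

```python
import numpy as np
from scipy.stats import norm, rankdata


def gaussian_copula_sample(real: np.ndarray, n_samples: int, seed: int = 0) -> np.ndarray:
    """Sample synthetic rows that mimic the marginals and rank correlations of `real`."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = real.shape

    # 1. Map each column to normal scores via its empirical CDF (ranks scaled into (0, 1)).
    uniforms = np.column_stack(
        [rankdata(real[:, j]) / (n_rows + 1) for j in range(n_cols)]
    )
    normal_scores = norm.ppf(uniforms)

    # 2. Estimate the dependence structure in the Gaussian space.
    corr = np.corrcoef(normal_scores, rowvar=False)

    # 3. Draw correlated normals and convert them back to uniforms.
    z = rng.multivariate_normal(mean=np.zeros(n_cols), cov=corr, size=n_samples)
    u = norm.cdf(z)

    # 4. Invert each empirical marginal using quantiles of the real column.
    return np.column_stack(
        [np.quantile(real[:, j], u[:, j]) for j in range(n_cols)]
    )


# Toy "real" data: two positively related columns, the second with a skewed marginal.
rng = np.random.default_rng(42)
x1 = rng.normal(size=300)
x2 = np.exp(0.8 * x1 + 0.3 * rng.normal(size=300))
real = np.column_stack([x1, x2])

synthetic = gaussian_copula_sample(real, n_samples=500)
print(synthetic.shape)
```

Because step 4 draws from quantiles of the observed columns, synthetic values never fall outside the range of the real data; categorical columns would need separate handling.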

4️⃣ Explainability & Domain Analysis

  • explainable_ai/XAI_Occupational_Stress.ipynb provides model interpretability.
  • domain_analysis/LLM_anova_+_rfecv.ipynb utilizes NLP models for deeper insights.
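
To give a feel for what model interpretability output looks like, the sketch below ranks features by permutation importance using scikit-learn. This is a swapped-in, dependency-light technique for illustration; the repository lists SHAP and LIME as its XAI tools, and this example does not reproduce them. The data is synthetic and the feature names are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; with shuffle=False the first three columns are the informative ones.
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and measure the drop in accuracy.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

ranking = sorted(
    zip(feature_names, result.importances_mean), key=lambda pair: pair[1], reverse=True
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Permutation importance gives a global ranking only; per-prediction explanations are what SHAP and LIME add on top of this.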

📊 Results & Key Findings

  • 📌 Achieved 90.32% accuracy in occupational stress classification.
  • 📌 Top predictors of stress: Workload, Manager Support, Job Role Clarity.
  • 📌 LLMs outperformed traditional NLP models in stress classification tasks.
  • 📌 Synthetic data generation proved effective, achieving ~89% accuracy on unseen test scenarios.

For full details, check:

  • Main_results_holdout_val.ipynb (final results)
  • ablations/ (impact of feature selection methods)
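
The ablation idea (remove one feature selection step, then re-measure holdout accuracy) can be sketched as follows. This is an illustrative setup on synthetic data, assuming a logistic regression classifier and an 80/20 stratified holdout split; the `build_pipeline` helper and its flags are hypothetical and do not come from the notebooks, and the numbers it prints are unrelated to the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, VarianceThreshold, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with many uninformative features.
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def build_pipeline(use_zero_var: bool = True, use_anova: bool = True) -> Pipeline:
    """Assemble the pipeline, optionally leaving out a selection step (hypothetical helper)."""
    steps = []
    if use_zero_var:
        steps.append(("zero_var", VarianceThreshold()))
    steps.append(("scale", StandardScaler()))
    if use_anova:
        steps.append(("anova", SelectKBest(f_classif, k=10)))
    steps.append(("clf", LogisticRegression(max_iter=1000)))
    return Pipeline(steps)


# Each variant disables exactly one step relative to the full pipeline.
variants = {
    "full": {},
    "no_anova": {"use_anova": False},
    "no_zero_var": {"use_zero_var": False},
}
results = {
    name: build_pipeline(**kwargs).fit(X_train, y_train).score(X_test, y_test)
    for name, kwargs in variants.items()
}
for name, accuracy in results.items():
    print(f"{name}: holdout accuracy = {accuracy:.3f}")
```

Fixing the split and random seeds across variants keeps the comparison about the removed step rather than about sampling noise.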

📌 Industry Relevance

This research has direct implications for:
✔ HR & Workforce Management – Predict stress & reduce workplace accidents.
✔ Occupational Health & Safety Teams – Implement AI-driven monitoring.
✔ Government & Policymakers – Set data-driven workplace safety regulations.
✔ AI & Data Science Practitioners – Develop real-world stress detection models.


📚 Citation

If you use this repository in your research, please cite:

@article{hasan2025early,
  title={Early detection of occupational stress: Enhancing workplace safety with machine learning and large language models},
  author={Hasan, Mohammad Junayed and Sultana, Jannat and Ahmed, Silvia and Momen, Sifat},
  journal={PLOS ONE},
  volume={20},
  number={6},
  pages={e0323265},
  year={2025},
  publisher={Public Library of Science San Francisco, CA USA},
  doi={10.1371/journal.pone.0323265}
}

📝 License

This project is licensed under the MIT License, allowing free use, modification, and distribution.


🤝 Contributing

Contributions are welcome! To contribute:

  1. Fork the repository.
  2. Create a new branch (feature-new-feature).
  3. Commit changes and open a pull request.

📬 Contact

📌 Collaborators: Mohammad Junayed Hasan and Jannat Sultana

📌 Supervisors: Dr. Sifat Momen (sifat.momen@northsouth.edu, https://scholar.google.com/citations?user=sGVZEaAAAAAJ), Ms. Silvia Ahmed (silvia.ahmed@northsouth.edu, https://scholar.google.com/citations?user=T5jK--YAAAAJ&hl=en&oi=ao)

📌 Emails: junayedhasan100@gmail.com, jannatsultana187@gmail.com

📌 LinkedIn: https://www.linkedin.com/in/mjhasan21/, https://www.linkedin.com/in/jannat-sultana/

We appreciate your interest in our research and welcome discussions, collaborations, and feedback! 🚀
