Figuring out how to make your AI safer? Looking to avoid ethical biases, errors, privacy leaks, or robustness issues in your AI models?
This repository contains a curated list of papers & technical articles on AI Quality & Safety that should help 📚
You can browse papers by Machine Learning task category, and use hashtags like #Robustness to explore AI risk types.
- General ML Testing
- Tabular Machine Learning
- Natural Language Processing
- Computer Vision
- Recommendation System
- Time Series
## General ML Testing

- Machine learning testing: Survey, landscapes and horizons (Zhang et al., 2020) #General
- Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
- Reliable Machine Learning: Applying SRE Principles to ML in Production [BOOK] (Chen et al., 2022) #Reliability
- Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness (see the sketch at the end of this section)
- A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness
- Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
- The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective (Krishna et al., 2022) #Explainability
- InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General
- Fair regression: Quantitative definitions and reduction-based algorithms (Agarwal et al., 2019) #Fairness
- Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making (Aghaei et al., 2019) #Fairness
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (Henderson et al., 2020) #Environment
- AI Incident Database (Responsible AI Collaborative)
- AI Vulnerability Database (AVID)
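The metamorphic testing papers above share one key idea: instead of a labeled oracle, you test a relation between an input transformation and the expected output change. Below is a minimal, hypothetical sketch using scikit-learn (not code from the cited papers): decision trees are invariant to positive rescaling of a feature, so doubling one column must leave predictions unchanged.

```python
# Minimal metamorphic-testing sketch (hypothetical, scikit-learn based).
# Relation: decision trees are invariant to monotone feature scaling,
# so doubling one feature column must leave predictions unchanged.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Source test: train and predict on the original data.
source_preds = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)

# Follow-up test: apply the transformation to train and test inputs alike.
X_scaled = X.copy()
X_scaled[:, 0] *= 2.0  # exact in floating point, so split thresholds scale exactly
followup_preds = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)

# The metamorphic relation acts as the test oracle: no extra labels
# needed, only consistency between the two runs.
assert np.array_equal(source_preds, followup_preds), "metamorphic relation violated"
print("metamorphic relation holds on", len(X), "samples")
```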
## Tabular Machine Learning

- Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift (see the sketch below)
- Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
- Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability
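Both slicing papers revolve around the same move: compare per-slice performance against aggregate performance to surface weak subpopulations. Here is a self-contained, hypothetical illustration; the synthetic data and the "region" slice are invented for the example and do not come from the cited papers.

```python
# Hypothetical slice-validation sketch: per-slice accuracy vs. overall
# accuracy surfaces weak subpopulations that the aggregate metric hides.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "income": rng.normal(50_000, 15_000, n),
    "region": rng.choice(["north", "south", "east", "west"], n),
})
# Inject extra label noise in one region to create a weak slice.
noise = np.where(df["region"] == "west", 0.4, 0.05)
y = ((df["income"] > 50_000) ^ (rng.random(n) < noise)).astype(int)

X = pd.get_dummies(df, columns=["region"])
X_tr, X_te, df_tr, df_te, y_tr, y_te = train_test_split(
    X, df, y, test_size=0.5, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
correct = model.predict(X_te) == np.asarray(y_te)

# The weak slice sorts to the top while the overall number looks fine.
print(f"overall accuracy: {correct.mean():.3f}")
print(df_te.assign(correct=correct).groupby("region")["correct"].mean().sort_values())
```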
## Natural Language Processing

- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness (see the sketch at the end of this section)
- Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization (Chen et al., 2023) #Robustness
- Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
- A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
- Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
- Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
- SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman, 2018) #Bias
- Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques (Font and Costa-jussà, 2019) #Bias
- On Measuring Social Biases in Sentence Encoders (May et al., 2019) #Bias
- BBQ: A Hand-Built Bias Benchmark for Question Answering (Parrish et al., 2022) #Bias
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models (Van Aken et al., 2021) #Bias
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators (Chen et al., 2023) #Reliability
- Holistic Evaluation of Language Models (Liang et al., 2022) #General
- Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback
- Identifying and Reducing Gender Bias in Word-Level Language Models (Bordia and Bowman, 2019) #Bias
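CheckList (Ribeiro et al., 2020, above) organizes behavioral tests into types such as invariance (INV) tests, where a label-preserving perturbation must not change the prediction. The sketch below illustrates the idea only; the toy lexicon "model" is a stand-in for a real NLP system, and this is not the CheckList API.

```python
# CheckList-style invariance (INV) test sketch: swapping person names
# is a label-preserving perturbation, so the prediction must not flip.
def sentiment_model(text: str) -> str:
    # Toy lexicon classifier standing in for the model under test.
    positive, negative = {"great", "loved"}, {"terrible", "hated"}
    tokens = set(text.lower().replace(".", "").split())
    if tokens & positive:
        return "positive"
    if tokens & negative:
        return "negative"
    return "neutral"

TEMPLATE = "{name} said the service was great."
NAMES = ["Alice", "Mohammed", "Priya", "Chen", "Olu"]

# Run the perturbation suite and collect any prediction flips.
preds = {name: sentiment_model(TEMPLATE.format(name=name)) for name in NAMES}
failures = {n: p for n, p in preds.items() if p != preds[NAMES[0]]}
assert not failures, f"INV test failed for: {failures}"
print("INV test passed:", preds)
```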
## Computer Vision

- DOMINO: Discovering Systematic Errors with Cross-modal Embeddings (Eyuboglu et al., 2022) #DataSlice
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2022) #Robustness
- Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging (see the sketch below)
- Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Amini et al., 2019) #Bias
- Diversity in Faces (Merler et al., 2019) #Fairness #Accuracy
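Model assertions (Kang et al., 2018, above) are cheap, user-written predicates over model outputs that flag likely errors at runtime for review or retraining. A hypothetical sketch for a video object detector follows; the flicker rule and the detections are invented for illustration and are not from the paper.

```python
# Hypothetical "model assertion" sketch: a runtime predicate over a
# detector's per-frame outputs that flags likely missed detections.
from typing import List

def flicker_assertion(frame_labels: List[set]) -> List[int]:
    """Flag frames where an object class disappears for exactly one
    frame and then reappears, a common sign of a missed detection."""
    flagged = []
    for t in range(1, len(frame_labels) - 1):
        # Classes present in the neighboring frames but absent here.
        vanished = (frame_labels[t - 1] & frame_labels[t + 1]) - frame_labels[t]
        if vanished:
            flagged.append(t)
    return flagged

# Made-up per-frame detected classes from a hypothetical video detector.
detections = [{"car"}, {"car"}, set(), {"car"},
              {"car", "person"}, {"person"}, {"car", "person"}]
print("frames to review:", flicker_assertion(detections))  # -> [2, 5]
```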
## Recommendation System

- Beyond NDCG: behavioral testing of recommender systems with RecList (Chia et al., 2021) #Robustness (see the sketch below)
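RecList (Chia et al., 2021, above) argues for behavioral, beyond-accuracy checks on recommenders. Below is a hypothetical sketch of one such check, catalog coverage; the toy popularity-biased recommender is a placeholder, and this is not the RecList API.

```python
# Hypothetical behavioral test for a recommender: catalog coverage.
# A recommender can score well on NDCG while collapsing onto a few
# head items; this check catches that failure mode directly.
from collections import Counter

CATALOG = [f"item_{i}" for i in range(100)]

def recommend(user_id: int, k: int = 5) -> list:
    # Toy popularity-biased stand-in: always returns the same head items.
    return CATALOG[:k]

def coverage_test(users, k: int = 5, min_coverage: float = 0.10) -> bool:
    """Check that the share of catalog items ever recommended
    exceeds a minimum threshold (guards against head-item collapse)."""
    seen = Counter()
    for u in users:
        seen.update(recommend(u, k))
    coverage = len(seen) / len(CATALOG)
    print(f"catalog coverage: {coverage:.1%}")
    return coverage >= min_coverage

# The toy recommender fails the check, which is exactly the point:
# accuracy metrics alone would not reveal the collapse.
print("coverage test:", "PASS" if coverage_test(range(200)) else "FAIL")
```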