Figuring out how to make your AI safer? Looking to avoid ethical biases, errors, privacy leaks, or robustness issues in your AI models?
This repository contains a curated list of papers & technical articles on AI Quality & Safety that should help 📚
You can browse papers by Machine Learning task category, and use hashtags like #Robustness to explore AI risk types.
- General ML Testing
- Tabular Machine Learning
- Natural Language Processing
- Computer Vision
- Recommendation System
- Time Series
## General ML Testing

- Machine learning testing: Survey, landscapes and horizons (Zhang et al., 2020) #General
- Quality Assurance for AI-based Systems: Overview and Challenges (Felderer et al., 2021) #General
- The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction (Breck et al., 2017) #General
- Reliable Machine Learning: Applying SRE Principles to ML in Production [BOOK] (Chen et al., 2022) #Reliability
- Metamorphic testing of decision support systems: A case study (Kuo et al., 2010) #Robustness (see the sketch at the end of this section)
- A Survey on Metamorphic Testing (Segura et al., 2016) #Robustness
- Testing and validating machine learning classifiers by metamorphic testing (Xie et al., 2011) #Robustness
- The Disagreement Problem in Explainable Machine Learning: A Practitioner’s Perspective (Krishna et al., 2022) #Explainability
- InterpretML: A Unified Framework for Machine Learning Interpretability (Nori et al., 2019) #Explainability #General
- Fair regression: Quantitative definitions and reduction-based algorithms (Agarwal et al., 2019) #Fairness
- Learning Optimal and Fair Decision Trees for Non-Discriminative Decision-Making (Aghaei et al., 2019) #Fairness
- Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning (Henderson et al., 2020) #Environment
- AI Incident Database (Responsible AI Collaborative)
- AI Vulnerability Database (AVID)
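The metamorphic testing papers above share one key idea: instead of a labeled oracle, you test a relation between an input transformation and the expected output change. Below is a minimal, hypothetical sketch using scikit-learn (not code from the cited papers): decision trees are invariant to positive rescaling of a feature, so doubling one column must leave predictions unchanged.

```python
# Minimal metamorphic-testing sketch (hypothetical, scikit-learn based).
# Relation: decision trees are invariant to monotone feature scaling,
# so doubling one feature column must leave predictions unchanged.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Source test: train and predict on the original data.
source_preds = DecisionTreeClassifier(random_state=0).fit(X, y).predict(X)

# Follow-up test: apply the transformation to train and test inputs alike.
X_scaled = X.copy()
X_scaled[:, 0] *= 2.0  # exact in floating point, so split thresholds scale exactly
followup_preds = DecisionTreeClassifier(random_state=0).fit(X_scaled, y).predict(X_scaled)

# The metamorphic relation acts as the test oracle: no extra labels
# needed, only consistency between the two runs.
assert np.array_equal(source_preds, followup_preds), "metamorphic relation violated"
print("metamorphic relation holds on", len(X), "samples")
```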
## Tabular Machine Learning

- Machine Learning Model Drift Detection Via Weak Data Slices (Ackerman et al., 2021) #DataSlice #Debugging #Drift (see the sketch below)
- Automated Data Slicing for Model Validation: A Big Data - AI Integration Approach (Chung et al., 2020) #DataSlice
- Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models (Krause et al., 2016) #Explainability
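Both slicing papers revolve around the same move: compare per-slice performance against aggregate performance to surface weak subpopulations. Here is a self-contained, hypothetical illustration; the synthetic data and the "region" slice are invented for the example and do not come from the cited papers.

```python
# Hypothetical slice-validation sketch: per-slice accuracy vs. overall
# accuracy surfaces weak subpopulations that the aggregate metric hides.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "age": rng.integers(18, 80, n),
    "income": rng.normal(50_000, 15_000, n),
    "region": rng.choice(["north", "south", "east", "west"], n),
})
# Inject extra label noise in one region to create a weak slice.
noise = np.where(df["region"] == "west", 0.4, 0.05)
y = ((df["income"] > 50_000) ^ (rng.random(n) < noise)).astype(int)

X = pd.get_dummies(df, columns=["region"])
X_tr, X_te, df_tr, df_te, y_tr, y_te = train_test_split(
    X, df, y, test_size=0.5, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
correct = model.predict(X_te) == np.asarray(y_te)

# The weak slice sorts to the top while the overall number looks fine.
print(f"overall accuracy: {correct.mean():.3f}")
print(df_te.assign(correct=correct).groupby("region")["correct"].mean().sort_values())
```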
## Natural Language Processing

- Beyond Accuracy: Behavioral Testing of NLP Models with CheckList (Ribeiro et al., 2020) #Robustness (see the sketch at the end of this section)
- Towards Robust Personalized Dialogue Generation via Order-Insensitive Representation Regularization (Chen et al., 2023) #Robustness
- Pipelines for Social Bias Testing of Large Language Models (Nozza et al., 2022) #Bias #Ethics
- "Why Should I Trust You?": Explaining the Predictions of Any Classifier (Ribeiro et al., 2016) #Explainability
- A Unified Approach to Interpreting Model Predictions (Lundberg et al., 2017) #Explainability
- Anchors: High-Precision Model-Agnostic Explanations (Ribeiro et al., 2018) #Explainability
- Explanation-Based Human Debugging of NLP Models: A Survey (Lertvittayakumjorn et al., 2021) #Debugging
- SEAL: Interactive Tool for Systematic Error Analysis and Labeling (Rajani et al., 2022) #DataSlice #Explainability
- Data Statements for Natural Language Processing: Toward Mitigating System Bias and Enabling Better Science (Bender and Friedman, 2018) #Bias
- Equalizing Gender Biases in Neural Machine Translation with Word Embeddings Techniques (Font and Costa-jussà, 2019) #Bias
- On Measuring Social Biases in Sentence Encoders (May et al., 2019) #Bias
- BBQ: A Hand-Built Bias Benchmark for Question Answering (Parrish et al., 2022) #Bias
- What Do You See in this Patient? Behavioral Testing of Clinical NLP Models (Van Aken et al., 2021) #Bias
- Beyond Factuality: A Comprehensive Evaluation of Large Language Models as Knowledge Generators (Chen et al., 2023) #Reliability
- Holistic Evaluation of Language Models (Liang et al., 2022) #General
- Learning to summarize from human feedback (Stiennon et al., 2020) #HumanFeedback
- Identifying and Reducing Gender Bias in Word-Level Language Models (Bordia and Bowman, 2019) #Bias
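CheckList (Ribeiro et al., 2020, above) organizes behavioral tests into types such as invariance (INV) tests, where a label-preserving perturbation must not change the prediction. The sketch below illustrates the idea only; the toy lexicon "model" is a stand-in for a real NLP system, and this is not the CheckList API.

```python
# CheckList-style invariance (INV) test sketch: swapping person names
# is a label-preserving perturbation, so the prediction must not flip.
def sentiment_model(text: str) -> str:
    # Toy lexicon classifier standing in for the model under test.
    positive, negative = {"great", "loved"}, {"terrible", "hated"}
    tokens = set(text.lower().replace(".", "").split())
    if tokens & positive:
        return "positive"
    if tokens & negative:
        return "negative"
    return "neutral"

TEMPLATE = "{name} said the service was great."
NAMES = ["Alice", "Mohammed", "Priya", "Chen", "Olu"]

# Run the perturbation suite and collect any prediction flips.
preds = {name: sentiment_model(TEMPLATE.format(name=name)) for name in NAMES}
failures = {n: p for n, p in preds.items() if p != preds[NAMES[0]]}
assert not failures, f"INV test failed for: {failures}"
print("INV test passed:", preds)
```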
## Computer Vision

- DOMINO: Discovering Systematic Errors with Cross-modal Embeddings (Eyuboglu et al., 2022) #DataSlice
- Explaining in Style: Training a GAN to explain a classifier in StyleSpace (Lang et al., 2022) #Robustness
- Model Assertions for Debugging Machine Learning (Kang et al., 2018) #Debugging (see the sketch below)
- Uncovering and Mitigating Algorithmic Bias through Learned Latent Structure (Amini et al., 2019) #Bias
- Diversity in Faces (Merler et al., 2019) #Fairness #Accuracy
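Model assertions (Kang et al., 2018, above) are cheap, user-written predicates over model outputs that flag likely errors at runtime for review or retraining. A hypothetical sketch for a video object detector follows; the flicker rule and the detections are invented for illustration and are not from the paper.

```python
# Hypothetical "model assertion" sketch: a runtime predicate over a
# detector's per-frame outputs that flags likely missed detections.
from typing import List

def flicker_assertion(frame_labels: List[set]) -> List[int]:
    """Flag frames where an object class disappears for exactly one
    frame and then reappears, a common sign of a missed detection."""
    flagged = []
    for t in range(1, len(frame_labels) - 1):
        # Classes present in the neighboring frames but absent here.
        vanished = (frame_labels[t - 1] & frame_labels[t + 1]) - frame_labels[t]
        if vanished:
            flagged.append(t)
    return flagged

# Made-up per-frame detected classes from a hypothetical video detector.
detections = [{"car"}, {"car"}, set(), {"car"},
              {"car", "person"}, {"person"}, {"car", "person"}]
print("frames to review:", flicker_assertion(detections))  # -> [2, 5]
```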
## Recommendation System

- Beyond NDCG: behavioral testing of recommender systems with RecList (Chia et al., 2021) #Robustness (see the sketch below)
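RecList (Chia et al., 2021, above) argues for behavioral, beyond-accuracy checks on recommenders. Below is a hypothetical sketch of one such check, catalog coverage; the toy popularity-biased recommender is a placeholder, and this is not the RecList API.

```python
# Hypothetical behavioral test for a recommender: catalog coverage.
# A recommender can score well on NDCG while collapsing onto a few
# head items; this check catches that failure mode directly.
from collections import Counter

CATALOG = [f"item_{i}" for i in range(100)]

def recommend(user_id: int, k: int = 5) -> list:
    # Toy popularity-biased stand-in: always returns the same head items.
    return CATALOG[:k]

def coverage_test(users, k: int = 5, min_coverage: float = 0.10) -> bool:
    """Check that the share of catalog items ever recommended
    exceeds a minimum threshold (guards against head-item collapse)."""
    seen = Counter()
    for u in users:
        seen.update(recommend(u, k))
    coverage = len(seen) / len(CATALOG)
    print(f"catalog coverage: {coverage:.1%}")
    return coverage >= min_coverage

# The toy recommender fails the check, which is exactly the point:
# accuracy metrics alone would not reveal the collapse.
print("coverage test:", "PASS" if coverage_test(range(200)) else "FAIL")
```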