From the course: Algorithmic Auditing and Continuous Monitoring

What is algorithmic auditing?

- Algorithmic auditing refers to a range of approaches used to understand and evaluate an algorithmic system, including AI-enabled systems. There are many different types of algorithmic audits. They can include an assessment of governance frameworks and documentation, empirical tests of performance, and technical evaluation of data, code, or methods. The strongest algorithmic audit would include all three: governance, empirical, and technical audits.

Governance audits assess whether appropriate policies and practices are in place to support the responsible design, development, and deployment of an AI system. As part of a governance audit, a project manager may be required to conduct an impact or risk assessment in partnership with the development team to identify potential risks or adverse impacts, including assessing whether the AI system violates any laws or regulations. For example, a governance audit can evaluate whether the use and processing of personal data in the AI system violates data protection laws.

An empirical audit is outcomes-focused and does not evaluate the internal workings of the system. This type of audit is primarily concerned with whether the system is performing as intended, not why it is or is not performing as intended. In 2016, ProPublica conducted an empirical audit of an algorithmic tool used by judges to determine an individual's likelihood of re-offending. A defendant's likelihood of re-offending would be used to inform their sentencing. The audit revealed that the model had a fatal flaw. When the model's predictions were compared with data on actual re-offenses, Black defendants were twice as likely as white defendants to be inaccurately labeled as having a high probability of re-offending, while white defendants were more often labeled as unlikely to re-offend. (A minimal sketch of this kind of error-rate comparison appears after this section.)

In any audit, human judgment is still needed. For example, the developer of an AI tool may decide it's important to mitigate racial bias. However, if the audit focuses solely on racial discrimination, it may inadvertently overlook gender bias or bias that emerges at the intersection of race and gender. It's important to be transparent about which features are being evaluated in an empirical audit and whether any relevant features were not assessed, and why.

A technical audit typically has three components. First, defining the purpose and scope of the audit, including what components will be tested, how often, and what constitutes failure. Second, evaluating the data inputs, model, and outputs to ensure alignment with responsible AI principles such as non-discrimination and privacy. Third, documenting all processes in components one and two to allow for continuous monitoring.

Let's explore an example. Imagine that you work at a company that helps businesses recruit technical freelancers. You're a senior developer and have been tasked with developing a new service that will help your customers identify freelance employment opportunities they're eligible for. As a first step, you collect all data on current freelance employment opportunities posted by employers on your site, including necessary experience and skills, time commitments, pay rates, and a list of freelancers the employers have been successfully paired with. You also collect data on customers who have engaged in freelance employment opportunities. You document the types of freelance jobs they've completed, their experience and skills, time commitments, and pay rates.
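The ProPublica finding described above boils down to comparing error rates across groups using only the tool's outputs and ground-truth outcomes, which is the essence of an empirical audit. The sketch below shows one way that comparison might be computed; it uses made-up, illustrative data (not the ProPublica dataset) and hypothetical column names (race, high_risk, reoffended):

```python
import pandas as pd

# Illustrative, made-up outcome data (NOT the ProPublica dataset): each row is
# a defendant with the tool's risk label and whether they actually re-offended.
audit_df = pd.DataFrame({
    "race":       ["black", "black", "black", "black", "white", "white", "white", "white"],
    "high_risk":  [1, 1, 0, 1, 1, 0, 0, 0],
    "reoffended": [0, 0, 0, 1, 0, 0, 0, 1],
})

# False positive rate per group: share labeled high risk among those who did NOT re-offend
fpr = audit_df[audit_df["reoffended"] == 0].groupby("race")["high_risk"].mean()
print(fpr)

# False negative rate per group: share labeled low risk among those who DID re-offend
fnr = 1 - audit_df[audit_df["reoffended"] == 1].groupby("race")["high_risk"].mean()
print(fnr)
```

The same outcomes-only pattern applies to any empirical audit: choose the outcome of interest, define the error of concern, and compare its rate across the groups being evaluated.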
You consult with your company's legal and policy teams and are advised to perform a technical audit to ensure this service does not perform in a biased or discriminatory way. The technical audit you perform includes the following three steps.

First, you define the purpose and scope of the audit, including what components will be tested, how often, and what constitutes failure. You define the purpose of your technical audit as identifying whether the model may be discriminatory based on protected characteristics like gender, race, or age. You decide it's important to do this twice a year because the model will be learning in real time from successful and unsuccessful matches between employers and freelancers.

You now move to the second step: evaluating the data inputs, model, and outputs. You take a representative sample of the data and compile descriptive statistics, including the percentage of freelancers with successful and unsuccessful matches, broken down by gender, race, and age. You then perform inferential statistics to identify whether protected characteristics like race, gender, and age are significant predictors of whether a freelancer is successfully matched with an employer. It's also important to document whether these protected characteristics are highly correlated with other features that lead to successful freelance matches, like skills or the ability to commit large amounts of time to a project, and whether those correlations may carry ingrained biases. Given this knowledge, you test whether the model you have developed perpetuates any of the biases identified, and through repeated tests you configure it to minimize them. (A minimal code sketch of this evaluation step appears after this section.)

Third, to promote transparency and accountability, you document all processes in steps one and two to create a record for the next technical audit.

Governance, empirical, and technical audits each serve important purposes. Now you can use auditing to inform your continuous monitoring process, which we'll cover next.
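Circling back to step two of the technical audit above, here is a minimal sketch of what the descriptive, inferential, and proxy-correlation checks might look like, assuming a pandas/statsmodels workflow and hypothetical column names (gender, race, age_band, skills_score, hours_per_week, matched). Synthetic data stands in for the representative sample described in the transcript:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the representative sample of matching outcomes.
rng = np.random.default_rng(0)
n = 500
matches = pd.DataFrame({
    "gender": rng.choice(["woman", "man", "nonbinary"], n),
    "race": rng.choice(["black", "white", "asian", "latino"], n),
    "age_band": rng.choice(["18-34", "35-54", "55+"], n),
    "skills_score": rng.normal(50, 10, n),
    "hours_per_week": rng.integers(5, 40, n),
})
matches["matched"] = rng.binomial(1, 0.4, n)  # 1 = successfully matched

# Descriptive statistics: match rates broken down by each protected characteristic
for col in ["gender", "race", "age_band"]:
    print(matches.groupby(col)["matched"].mean(), "\n")

# Inferential step: do protected characteristics significantly predict a
# successful match once skills and availability are accounted for?
fit = smf.logit(
    "matched ~ C(gender) + C(race) + C(age_band) + skills_score + hours_per_week",
    data=matches,
).fit(disp=False)
print(fit.summary())  # inspect coefficients and p-values on the protected terms

# Proxy check: are protected characteristics associated with the features that
# drive matches (a possible route for ingrained bias)?
print(matches.groupby("race")[["skills_score", "hours_per_week"]].mean())
```

In practice you would run these checks against the real sample and keep the descriptive tables, regression output, and proxy-check results alongside your scoping decisions, which is exactly the documentation that step three and the next audit cycle rely on.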
