1. The Introduction

The environment of courts and justice has always been rather conservative,1 as demonstrated by the judicial symbolism that alludes to the ideals of ancient Greece. This conservatism is also visible in the courts' relationship to the adoption of new technologies, which has arguably been sped up by the COVID-19 pandemic.2 The uptake of new technologies has slowly moved beyond the “simple” virtualization of procedures via videoconferencing software,3 and we are already witnessing the adoption of various systems based on artificial intelligence (AI). Besides the general uptake of technology,4 AI is also gradually taking on more substantive roles and is moving from simpler administrative tasks5 towards decision-making systems, which are, as of now, employed only in their “safer” mode, as assistive or supportive decision-making systems.6 Besides technical limitations, the so far limited uptake of such technologies, merely in their assistive capacity, has several other reasons, be they social or the issue of responsibility. As early as 2000, Tata speculated that the place of AI in the judiciary has always been viewed through the lens of cognitive rather than social processes, which ignores the role of the judiciary in society.7 The judiciary is often seen as a human institution8 delivering an essentially human endeavour: justice.9 Therefore, the gradual introduction of such technologies provides us with the opportunity to continually bridge the perceived trust gap involved in entrusting such an essentially human process to non-human actors.10

However, at the present moment we may already draw on some examples of systems working in such an assistive capacity.11 Even though these supportive systems are used merely by way of example, the conclusions of this article should be applicable even to AI-based systems deployed in a full decision-making capacity (in the future). This article is thus concerned with AI-based decision-making technologies, which will be conceptualized in the following part of the article. Secondly, it is concerned with potential threats to fair trial rights,12 namely with the problem of algorithmic bias and its significance for a fair (automated) trial. The concept of bias will be introduced based on its technological aspects, as well as from a legal point of view, most notably based on its interpretation under the European Convention on Human Rights (hereinafter the “ECHR”). The article holds that (judicial) bias is not a new phenomenon and should not be treated as such, even when it comes to new technologies, as will be further elaborated. Therefore, the existing framework of the European Court of Human Rights (hereinafter the “ECtHR”) on this matter will be utilized for said analysis. Given that this article is concerned with biased decision-making, for which it will present relevant case law and theory, we will disregard any issues concerning the possible predictive capabilities of judicial AI systems.13 Similarly, even though the question of accountability is closely related to the technology relevant for this article, dealing with the complexities of that question is beyond its scope.14

Based on this analysis, which draws on both the introduction of the relevant technologies and the existing frameworks for biased decision-making, the article seeks to answer the following questions: a) Is algorithmic bias problematic for supportive or substantive AI-based judicial technology? b) If so, what are the possible remedies? and c) What are the differences between the requirements for non-biased decision-making for humans and for machines? Besides these, the last part of this article will introduce and evaluate some already existing legal instruments that are directed at the issue of bias in AI-based systems (in judicial practice). In general, this article seeks to show that the way in which we think about biased decision-making in “human” judges is, mutatis mutandis, applicable to algorithmic decision-making, and as such does not constitute a completely new problem in this regard.

2. The Technology

There is no shortage of Legaltech startups promising lawyers a more effective practice via the utilization of any combination of the approaches falling under the umbrella of artificial intelligence. As the private practice of law firms is more interesting from the business perspective of the developers of such automated tools, we have seen a great deal of development in the field of lawyer-assisting AI systems,15 for example for the quick extraction of relevant pieces of information from large volumes of documents.16 More advanced examples of AI-based legal technology are various analytical systems, most notably case prediction software such as LexMachina.17 Such “analytical” tools tread a very fine line, as was shown by France, which banned case prediction due to fears of possible interference with the right to a fair trial.18 Publication of such predictions now carries a hefty fine or even a prison sentence.19

AI-based legal tech is becoming prevalent in the litigants' sector,20 but the courts are not lagging behind. Besides some of the aforementioned technologies, there are other technologies specifically targeted at courts and their decision-making processes. By their nature these systems pose a much higher risk to fundamental rights, as they stand to influence21 – if not make – binding decisions about the relevant rights and obligations.

It’s nearly impossible to deal with the issue of algorithmic bias and not mention COMPAS,22 which has since 2016 became almost synonymous with this issue.23 This system is utilized by courts in several states of the USA for risk assessment. Based on input data that are mostly gathered via a questionnaire, the system based on non-public proprietary algorithm computes a “risk score” ranging from 0 to 9.9, that should indicate to the judge how likely is the person to re-offend.24 This risk score is only a partial factor, based on which the judge makes her decision, therefore this system should play only an assistive role in the whole decision-making process. This becomes problematic when the judge relies solely on the COMPAS score, which ultimately in itself goes against the right to a fair trial, as has been for example stated by the Supreme court of Wisconsin in the case of State v. Loomis.25 This sole reliance on the COMPAS score becomes even more problematic when we consider that the logic of this system is not made public due to “trade secret”26 protection. This is especially problematic when we consider its reported bias against Afro-American population. This issue will be described more in the following section, but it represents the main reason for choosing this system as an example.

The example that showcases the whole spectrum of possible applications, and one where such potential threats to the right to a fair trial are more pressing, is the case of full judicial decision-making automation, or the so-called “robo-judges”.27 Full automation of judicial decision-making is still in its infancy; however, that is precisely the moment we should start tackling the potential problems it may introduce.28 We can currently find two examples, even though their relevance and actual state are questionable. One example that has received a lot of attention,29 even though it was gravely misrepresented in many cases, is that of the Estonian robo-judges. The Estonian Ministry of Justice has, however, objected30 to its system being described as such, since in this case it was a rather simplistic “process optimization” as opposed to an advanced judicial decision-making system. The system deployed by Estonia was used to deal with simple cases where no actual judgment was required and the claims were “just” rejected for lack of standing, etc. However, even this deserves proper attention and is not to be dismissed or diminished, since this system has already put the human out of the loop and its decisions were binding.

The other, supposedly31 more advanced, example is the case of the Chinese robo-judges32 and even prosecutors.33 China is moving towards a heavily digitalized justice system with the deployment of “one-stop-shop” judicial apps that enable users to file motions, provide testimony and monitor the progress of their case.34 More importantly, several provinces are using automated systems to analyse relevant data (similarly to COMPAS, but on a more complex scale) and, based on this, draft a judgment that is, depending on the province, either binding or subject to human approval.35 Currently such systems deal mostly with small claims36 and misdemeanours or traffic violations. Some provinces are reportedly attempting to deploy such systems in criminal justice as well.37 This should, however, be mostly in an assistive capacity and used primarily for sentencing, the argument being that sentencing varies greatly throughout China and this should help to standardize it.38

These examples describe the basic models of implementation of AI-based systems in judicial decision-making, which should to a certain degree also reflect possible future developments. In the following impact analysis of algorithmic bias, the role of AI-based systems can be twofold. One role is assistive, meaning that the system is not fully autonomous and (substantially)39 requires a human-in-the-loop. The other implementation considered by this article is full automation, where the AI-based system, after input from the users,40 carries out the legal analysis and then renders an enforceable judgment. This judgment is ideally binding on its own, but may subsequently require confirmation by a human judge.41 While consideration of such fully automated systems seems more pressing due to the greater threat they may pose to fundamental rights, it is generally a good idea to also consider assistive systems, for they may pose such a risk as well, as we will demonstrate on the example of COMPAS, and they are currently used to a greater extent. What also cannot be underestimated is the fact that fully automated tools tend to be deployed first in a semi-automated capacity,42 which was, for example, the case of the Chinese Xiao Zhi 3.0 system43 that started in 2019 as a mere assistive system before being deployed as fully automated some years later.

Having mapped out a general understanding of what kind of systems we are discussing, we will move to the next section, which introduces the issue of bias from both the technical and the legal perspective.

3. The Bias

The issue of bias receives great attention from both law44 and computer science – so much so that in the area of artificial intelligence it forms almost its own sub-area of study.45 As such, this article cannot give a comprehensive overview of the technical aspects of this problem but will describe them to the necessary extent. On the legal side, impartiality does not only constitute one of the rights under the umbrella term of the right to a fair trial;46 there also exists a considerable amount of case law47 and theoretical work attesting to the importance of this right. The following section introduces the issue from both angles.

3.1. Technical Perspective

Generally, the issue of algorithmic bias may have two causes.48 Firstly, there is the need for (historical) training data, which may themselves be biased in various ways,49 and secondly, bias may be introduced by an improper design of the decision-making algorithm itself.50 The latter issue is often amplified by biased data, in a vicious circle, since many machine learning approaches create their own algorithms from the datasets in which they try to identify, and then recreate, patterns.51 It is not that machine learning or algorithms as such create or introduce new forms of bias; the issue is, however, more serious than a mere recreation of existing “human” biases.52 The grave issue here is the effect of bias amplification,53 which arises from taking past biased decisions, recognizing their pattern as the norm and subsequently applying it at a much greater, automated scale than would previously have been possible. An example of this is the now scrapped Amazon hiring tool,54 which was supposed to review submitted applications and do a preliminary filtering of unfit applicants. Since Amazon did not have many women in executive roles, the algorithm recognized this pattern and concluded that being a woman disqualifies the applicant from seeking such a position. It then went on to throw out all female applicants' CVs during the hiring process for such roles.55 The bias already existed in Amazon's hiring structures; the automation, however, amplified it.56

This is a rather basic overview of the bias issue from the technical side,57 which does not exhaust the interesting, and even legally important, analysis offered by computer science. Moreover, computer science does not offer a mere description of the problem but suggests solutions as well. The seemingly obvious answer to this issue is to simply not consider, and thus not provide, information that could lead to bias (the “protected characteristics”), such as race or gender. It has, however, been repeatedly demonstrated that this does not work, since machine learning algorithms may utilize “proxy characteristics” as a stand-in for the unavailable protected characteristics.58 These can range from simple proxies such as a name, where certain names are more typical for one gender, to more complex proxies such as an address or the age at first arrest standing in for race (an actual example from the COMPAS system),59 a situation in which the technical and legal perspectives slowly merge.
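To make the proxy problem concrete, the following minimal sketch – written in Python on synthetic data invented purely for illustration – trains a simple classifier that never sees the protected characteristic, yet reproduces the historical disparity through a correlated proxy variable (here a hypothetical “postcode” feature).

```python
# Illustrative sketch on synthetic data: "fairness through ignorance" fails because
# a correlated proxy lets the model reconstruct the withheld protected characteristic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                    # protected characteristic, never given to the model
postcode = (group + (rng.random(n) < 0.1)) % 2   # proxy feature, ~90% correlated with group
merit = rng.normal(size=n)                       # legitimate feature

# Historical, biased labels: group 0 was systematically favoured regardless of merit.
label = ((merit + 1.0 * (group == 0) + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)

X = np.column_stack([merit, postcode])           # the protected characteristic is deliberately excluded
pred = LogisticRegression().fit(X, label).predict(X)

for g in (0, 1):
    print(f"group {g}: favourable prediction rate = {pred[group == g].mean():.2f}")
# The rates differ markedly, because the postcode acts as a stand-in for the omitted characteristic.
```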

3.1.1. Fairness through ignorance and the COMPAS example

Researching the COMPAS system is particularly hard due to the fact that the algorithm itself is protected by Northpointe, the producer of the system, as a trade secret.60 In 2016 the non-profit organization ProPublica attempted to at least evaluate the reliability of the system by obtaining the risk scores awarded over two years to more than 7,000 people in one county.61 They then compared the risk scores with the actual future criminal records, allowing a timeframe of three years for any future offence as per the Northpointe benchmarks, and concluded that only 20% of the people judged to be high-risk reoffenders committed any further crimes.62 Besides this unreliability, the study highlighted a different issue – that the system was predominantly wrong in the high-risk cases. The system could nevertheless formally show a higher reliability, since its low-risk predictions were usually accurate.63 This leads directly to the last point of the ProPublica study, namely the observation that the Afro-American population was more likely to be awarded a high-risk score and therefore to be wrongly placed among the high-risk reoffenders. ProPublica further points to the questionnaire used for the input data, which contains questions about parents being in jail or illegal drugs being distributed in the neighbourhood, disadvantaging people from low-income areas and households.64 Therefore, even without the protected characteristics being disclosed, the system still operated with them via proxy characteristics. The issue of the questionnaire also points to a different origin of algorithmic bias that is often (mis)used in the analysis of these algorithms. For example, in this case one could argue that the age at first arrest, or the fact that one of the parents was in jail, points to either personal failings of the defendant or failings in their upbringing that make them more likely to commit (further) crimes. However, such an argument requires a broader critical evaluation of various societal phenomena, such as the over-policing of certain areas or minorities,65 or the so-called broken windows policing approach.66 As Yeung points out, this kind of bias is not the “fault” of the algorithms, as they more or less properly encode the data;67 the issue rather lies in historic injustices suffered by a certain group, which will now be encoded by the algorithm relying on the past data, revictimizing the disadvantaged group and deepening the patterns of past oppression.68
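The kind of check ProPublica performed can be expressed quite compactly. The sketch below uses hypothetical column names and toy figures (not ProPublica's actual dataset) to show how one would compare, per demographic group, how often defendants who did not reoffend were nonetheless labelled high-risk – the group-wise false positive rate.

```python
# Sketch of a ProPublica-style error-rate comparison on toy data; the cut-off of 5
# for "high risk" and all figures below are assumptions made purely for illustration.
import pandas as pd

df = pd.DataFrame({
    "score":      [8.1, 2.3, 9.0, 6.2, 7.7, 3.4, 4.1, 5.8],
    "reoffended": [False, False, True, False, True, True, False, False],
    "group":      ["A", "B", "A", "B", "A", "B", "A", "B"],
})
df["high_risk"] = df["score"] >= 5

for g, sub in df.groupby("group"):
    non_reoffenders = sub[~sub["reoffended"]]
    fpr = non_reoffenders["high_risk"].mean()   # labelled high-risk despite not reoffending
    print(f"group {g}: false positive rate = {fpr:.2f}")
```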

As for the solutions to this problem offered by computer science, they, to a certain degree, mirror the current legal recommendations, which is good news, since in the end the solution will require the employment of some technical measure grounded in a legal act. Those solutions usually rely on better monitoring of the presence of bias, via methods such as counterfactuality, which, simply put, applies the idea that a decision is fair (or non-biased) if the automated decision is the same in a) the real world and b) a “counterfactual world” in which the defendant belongs to a different demographic, i.e. in which the potentially discriminating characteristics are different. The other direct, yet more laborious, solution is to monitor the training data for bias and correct it during the design process.69 Close monitoring, however, presently remains the most viable solution and the first step in the mitigation process. For the purpose of monitoring, it seems fitting to utilize the framework established by the ECtHR, which will be introduced in the following section, since it has been established for the purpose of protecting and enabling non-biased judicial decision-making. And, as this article argues, it should be used for any decision-making, regardless of the way in which it is carried out.
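As a rough illustration of the counterfactual idea described above, the following sketch flips the potentially discriminating characteristic in a single record and asks whether the decision stays the same. The `model` object, its `predict` method and the feature layout are hypothetical, and this is the simplified, attribute-flipping reading used in the text rather than a full causal counterfactual analysis.

```python
# Simplified counterfactual check on a single record. The model interface is assumed;
# a fuller treatment would also model how the protected characteristic causally
# influences the remaining features.
def counterfactual_check(model, record: dict, protected_key: str, alternative_value) -> bool:
    """Return True if the decision is unchanged in the 'counterfactual world'."""
    factual_decision = model.predict(record)
    counterfactual_record = {**record, protected_key: alternative_value}
    return model.predict(counterfactual_record) == factual_decision

# Hypothetical usage: counterfactual_check(model, defendant_record, "ethnicity", "other")
```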

3.2. Legal Perspective

The right to a non-biased court (decision) is part of the umbrella right to a fair trial and is almost exclusively interpreted jointly with the right to an independent court.70 Although they are described as the external and internal expression of the same principle, this article focuses solely on the issue of bias.71 In the existing case law of the ECtHR there are two levels at which bias in judicial decision-making can appear, and there are two associated tests for it – internal and external. The internal aspect pertains to any pre-existing convictions or beliefs of the judge that could skew her decision-making in a particular way. The external aspect is more closely tied to the requirement of independence, asking whether the personal composition of the tribunal offers sufficient safeguards for non-biased decision-making. Lastly, there also exists the requirement of appearance, meaning that the tribunal must not only be non-biased, it must also appear to be so.72 A much simplified, yet rather helpful, description of this requirement is provided by Bingham, who describes its goal (together with that of the requirement of an independent tribunal) as the guarantee that the case will be decided by a judge based on her best conscience, the relevant facts and the applicable law only.73

Bias is an internal disposition of the judge and as such is hard to determine. That is the reason why the ECtHR has focused almost exclusively on the objective test of bias.74 One of the basic factors that can reveal a judge's internal disposition is her behaviour towards the parties, which can be seen as hostile or biased,75 as demonstrated, for example, by the language the judge uses.76 It is worth noting that there exists a certain threshold here, since, as the ECtHR has stated, justice needs to keep a “human face”, and mere expressions of understanding or sympathy towards the aggrieved party are not sufficient.77 The judge's conduct is therefore the only way in which we may ascertain his or her bias in a given case under the subjective test. The objective test offers more options in this regard.

The objective test deals with the available safeguards guaranteeing the impartiality of the tribunal. It is also concerned with the parties' perception of those safeguards; however, this factor plays only a minor role.78 The objective test would be failed, for example, by a judge who has made public statements about the matter that were perceived as biased.79 Another example is when the judge has already presided over the same case in a different branch or judicial instance, as these are objectively provable indications of her “internal” state (i.e. that the judge has already “made up her mind”).80 The objective test may also be failed in the case of any financial or personal link between the judge and one of the parties,81 or if the judge has a personal interest relevant to the case.82

Drawing on the bias-related knowledge from the two fields relevant to algorithmic decision-making, computer science and law, we can make two important observations, which will be further dealt with in the following section.

Firstly, borrowing from computer science, we have observed that mitigating bias requires its identification and monitoring, which can be done by carrying out tests – monitoring the decision-making outcomes for any negative patterns. Secondly, looking at the legal side, we have observed that there exists a framework, established by the ECtHR, for identifying biased decision-making. It has two parts – the subjective and the objective test. The subjective test is rather complicated to perform due to its nature and as such is not often used; it may even seem irrelevant for algorithmic decision-making, since it is aimed at “the personal conviction and behaviour of a particular judge, that is, whether the judge held any personal prejudice or bias”.83 While it may be argued that it is impossible for an algorithm to possess such an anthropic quality as personal conviction and to display a certain behaviour, this is where the mutatis mutandis part of the suitability comes to the fore. Within computer science, an “artificial intelligence behaviour” field of study is emerging84 that is dedicated to investigating the “thought-like” inferences on which AI systems base their decisions. In this vein, “personal conviction and behaviour” could be taken to mean the internal specifics of the algorithm. In the same way that a fault in human judgement arises as a result of a “personal conviction”, a fault in an automated decision arises as a result of the “internal specifics of the algorithm”.85 Fortunately, it is not necessary for us to attempt to draw any comparisons between human and AI “thinking”,86 since this is where the framework established by the ECtHR becomes useful once again, with the second, objective, test. This test requires us to examine external factors that could point to possible (internal) bias,87 as the two are interconnected.88 More precisely, it asks whether the necessary steps have been taken to ensure that the decision-making is not biased, particularly whether the composition of the tribunal offers sufficient safeguards in this regard.

Like the subjective test, the objective test provides algorithmic justice with a suitable framework for testing bias in the decision-making process, with the necessary adaptations to “non-human decision makers”, since considering financial or personal ties makes no sense in this case. What should concern us is the general test of whether the court “offered sufficient guarantees to exclude any legitimate doubt in this respect”.89 Given the unstable90 nature of some91 machine learning models, iterative audits performed during the life cycle of such a system, which are described in the following section, are to be viewed as mutatis mutandis fulfilling the requirements established by the two tests of the ECtHR, since such audits can discern the presence of potentially negative patterns in the past “actions” of the algorithm. They further fulfil the objective test by offering guarantees in regard to the (un)biased decision, while still respecting the evolutive and black box92 nature of machine learning-based algorithms.93 The deployment of such a test, and its fulfilment of the Article 6 requirements, will be examined further in the context of existing legal documents in the following section.

4. The Interplay

Besides the research questions stipulated in the introduction of this article, its overall goal is to demonstrate our “preparedness” for the implementation of automated decision-making, at least to the extent to which it concerns bias. This thesis is essentially alluded to in the title's “old problems”, and it points to the fact that the problem of biased decision-making is nothing new, unknown or underdeveloped in the case law. This has been demonstrated in the previous sections by analysing the issue of bias as it pertains to the right to a fair trial, as well as by analysing the general applicability of currently established tools, such as the objective and subjective tests of the ECtHR. However, the question remains whether our basic understanding requires at least some reinterpretation in the face of the “new technology”.

4.1 Current regulatory landscape

There are currently several regulatory frameworks attempting to address this issue,94 with varying degrees of legal force. One of the earlier frameworks, focusing on judicial use and the problems relevant to this article, is the European Ethical Charter on the use of Artificial Intelligence in judicial systems and their environment,95 published in 2018 by the European Commission for the Efficiency of Justice (the “CEPEJ”). The Charter proposes five principles for the ethical use of AI in the judiciary, such as transparency and respect for fundamental rights. Within the Charter, the issue of non-biased decision-making can be found under the general respect for fundamental rights. This should in turn encompass the right to a fair trial which, as described in the third section of this article, should in turn encompass the right to an impartial decision. Such an analysis is, however, not strictly necessary, since one of the five principles expressly refers to the need to ensure non-discrimination. Tangentially, there is also a principle of quality (of input data) which, if implemented properly, mitigates the potential for bias insofar as it pertains to bias emerging from the training data.96 Further, the fourth principle encompasses transparency, impartiality and fairness. It thus appears that the CEPEJ sees this issue as deserving particular attention, as it is present in three out of the five principles.

Interestingly, the CEPEJ does not call for a direct ban on the use of sensitive data, such as data about race, but emphasizes the need for “particular care” when utilizing them. This issue as such, however, does not receive further, or explicit, attention, despite the fact that the potential use, and even availability, of the protected characteristics seems to be a rather contentious issue – so much so that the mere lack of an explicit ban on the processing of such information should not be taken as an argument for their processing. Nevertheless, from the remaining principles presented by the CEPEJ, we may draw exactly this conclusion. More precisely, similarly to scholars such as Cofone, it would seem that through their insistence on diversity in development as well as rather robust monitoring requirements, they see the utility of these characteristics in their potential for identifying and combating negative patterns in the ‘real world’ deployment of such systems.97 Under the principle of non-discrimination, the Charter calls for a subsequent monitoring and mitigation approach. The principle of impartiality should be fulfilled by using diverse datasets and by regular monitoring of such systems by a public body. The Appendix first deals with this issue separately, in relation to assistive technology, namely asset-division technologies,98 where it limits itself to a warning of potential bias and establishes that the use of (these) technologies must not deprive a person of access to court. Lastly, the issue of potential bias in AI-based tools is mentioned in the Appendix as a basis for the need to set up ethical frameworks and hold an in-depth public discussion on these issues. The need for public discussion refers to the need to keep the potentially discriminated part of the population “in the loop” during the implementation and design phases. The need for the establishment of an ethical framework is explained by the problematic adoption, and especially enforcement, of a legal framework in the digital, and thus supranational, environment.

The Charter also presents some basic remedies for algorithmic bias, such as certification schemes or subsequent control and mitigation processes. It also stipulates the need to keep a human-in-the-loop at some level, e.g. in the second judicial instance, a requirement already familiar to us, for example from the GDPR.99 Such a tool – human oversight – provides the possibility of obtaining a human intervention that could and should rectify a possible incorrect decision based on tainted data or improper algorithmic design. As we will see, these recommendations are mostly reiterated in other documents.

There currently does not exist a binding legal document that deals directly with the issue of the judicial use of artificial intelligence. However, we have the now final European Artificial Intelligence Act (hereinafter the “AI Act”),100 as well as the preparatory works that preceded it, including the work of the High-Level Expert Group on Artificial Intelligence (“AI HLEG”),101 most notably the White Paper on Artificial Intelligence102 and the Guidelines for Trustworthy AI.103 As noted by Crofts and Rijswijk, the White Paper does not deal with the issue of bias directly and instead opts for the issue of discrimination;104 however, the problem as well as the remedies are identical.

The Guidelines do not define the origin, or even the term, of bias, but they mention “inadvertent historic bias” present in the necessary datasets. The main remedy thus focuses on the data, specifically their representativeness and inclusivity, which should, in a similar fashion to the CEPEJ's recommendation, be extended to the whole design and implementation phase.

The White Paper emphasises a point important for this discussion, namely the fact that virtually all of our current decision-making suffers from being biased.105 Besides the repeated mantra of proper data management, the White Paper also suggests that this issue could be mitigated by proper log keeping, especially of the creation process. These frameworks are not only non-binding, but they themselves do not pay much attention to the ways of achieving their recommendations.106 This is demonstrated by a certain divide between the “technical” and the “legal”, where one is not necessarily connected to the other, or, more precisely, where an approach required by one is not feasible under the other. This was, for example, a rather common critique of the earlier versions of the AI Act,107 which required technically impossible safeguards, such as precise log keeping, which goes against the nature of some machine learning approaches, such as deep learning. A similar discussion is currently ongoing on the requirement of data management and its utility.108 However, we hold that while this discussion is pertinent, so is the requirement for data management, which has so far found its place in all the existing tools and recommendations. After all, even Hooker's argument does not directly disregard this safeguard but rather tries to establish a more “efficient” or suitable one – and in general argues for the need for a more nuanced discussion on the issue of the (origins of) bias.109 Hooker further argues that specific attention should be paid to the design of the relevant algorithmic models, thus extending the need for inclusivity into the design phase as well, which is a recommendation already known to us from the AI HLEG proposals.

Some of the existing legal tools are, quite reasonably, utilising a certain “middle of the road” approach to this criticism. While the questions of data management cannot be overlooked or disregarded, the inclusivity approach and focus – as well as the subsequent control – should extend to the design phase as well. In theory, this approach has been explored, for example, by Cofone, who has argued that after properly exploring the existing datasets we may utilize their ‘faults’, such as the inclusion of protected characteristics, to train the algorithms in such a controlled fashion as to better prepare them for encountering these characteristics in the ‘real world’.110

Hooker’s criticism however brings another, perhaps more pressing, issue to the forefront of our discussion. This point seems to be rather neglected in the current discussion on the use of the automated tools by the judiciary, especially by the more sceptical voices. The same way that Hooker argues that the discussion on possibilities of automation (and the bias of such automated systems) should primarily nudge us towards the discussion of the bias present in the judiciary currently – we should keep in mind that the presented framework is viable only if we accept that the current system of judiciary and the way it deals with bias is working properly. If we want to point out the shortcomings of adapting the existing framework to automated decision-making framework we need to also have a discussion on how these shortcomings impact our current decision-making processes. Outside of the possibility of the future automation, this discussion could also serve to the benefit and improvement of the existing system. That is despite the fact that some authors, such as Chatziathanasiou argue that using the flaws of the existing system of judiciary as an argument in favour of the use of automation is dangerous, as it may lead to the erosion of public trust in the institutions.111 We hold that such an argument is misleading and actually holding the discussion on the current shortcomings of the system in the face of its possible new developments should have the opposite effect in a democratic liberal state. Kashif and Jianxin have for example argued that the utilisation of AI-based decision-making tools could help us properly identify the existing flaws in the form of structural biases present in the system, which could subsequently be used to not only improve the training data in the data management phase but overall help us identify areas of improvement in the existing judicial decision-making.112

A certain ‘acceptance’ of the mere adoption of the existing framework could be represented by the AI Act. It differs from the previously mentioned documents not only in that it is binding, but also because it establishes the processes that are to be put in place to satisfy its goals. The AI Act explicitly warns against bias in the judicial use of AI in its recitals,113 while the articles themselves generally focus on fundamental rights as such. The AI Act uses a risk-based classification, and based on the recommendations of Point 40 as well as the classification guidelines set out in Annex III of the AI Act, it appears that the judicial use of AI should fall into the high-risk category.114 This leads to the proposed remedies, which are unfortunately not applicable to all potentially biased AI systems, but only to those categorized as high-risk.115 The most relevant requirement for these systems is to set up an iterative internal audit process that would periodically check, throughout the whole lifetime of the system, its impact on fundamental rights, as well as compliance with other requirements. The audit should check for incorrect decisions, for example due to the decisions being biased, and whether they impact any fundamental right. The subsequent requirement is to mitigate such possible impact or, ideally, remove it completely.

A similar approach seems to be currently favoured, at least in the context of AI decision-making, by the Canadian Directive on Automated Decision-Making,116 which attempts to address the issue of bias directly via a similar internal audit process and a certification scheme. This prevalence of iterative audits in AI-based decision-making systems is expected, given their technical nature.117 The iterative nature of the test seems to be a fitting response to the dynamic nature of “learning” systems, since the control mechanism is not fixed in time, in the same way that the decision-making algorithm is not.118 Similarly, such an approach seems to be one way119 of responding to the issue of the “black box”,120 as utilised, for example, by the Canadian Directive,121 which points out that the iterative nature of such an audit allows the auditor to collect a dataset of the AI decisions and examine them for potential errors and biased decisions. This approach side-steps the issue of the black box, since the potentially harmful pattern is identified over the (large) dataset of outcomes, not via the algorithmic process directly.122
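An outcome-level audit of the kind described above can be sketched as follows: decisions are logged as they are rendered, and at each audit interval a disparity metric is computed over the accumulated outcomes and compared against a tolerance threshold, flagging the system for review without ever opening the “black box”. The function names, data layout and threshold are illustrative assumptions, not requirements drawn from any of the cited instruments.

```python
# Hedged sketch of a periodic, outcome-level audit: only the logged decisions are
# examined for group-wise disparities, never the internal algorithmic process.
from collections import defaultdict

def audit_outcomes(decision_log, tolerance: float = 0.1) -> bool:
    """decision_log: iterable of (group, favourable) pairs, `favourable` being a bool.
    Returns True if the gap between group-wise favourable-outcome rates is within tolerance."""
    totals, favourable = defaultdict(int), defaultdict(int)
    for group, outcome in decision_log:
        totals[group] += 1
        favourable[group] += int(outcome)
    rates = {g: favourable[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values()) <= tolerance

# At each audit interval (e.g. quarterly), the auditor would run the check over all
# decisions logged since deployment and, if it fails, trigger human review and mitigation:
# if not audit_outcomes(logged_decisions): flag_for_review()
```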

The impact of a potentially biased decision, as well as the possible recourse, should serve as guidance as to how often such an audit should be carried out. The first aspect, the impact, is rather obvious and essentially means that the more serious the impact, the more often the system ought to be audited. For more specific guidance, this should be taken together with the second aspect, that of the possible recourse. The audit period should therefore be such that the possible recourse is still available to the aggrieved party in case an error or defect is identified. This should inform the period even where the outcome of the audit forms a new ground for appeal with its own timeframe set out in the procedural rules, a timeframe that reflects the different nature of this ground for appeal, as it should provide a greater level of legal certainty not only for the aggrieved party but for the other party as well.

The approach of iterative audits seems to be favoured even outside the public applications of artificial intelligence and automated decision-making, for example in the area of HR management and hiring processes.123 Besides the relative sensitivity of hiring decisions and their potential impact on the applicant's life,124 which make this area of automation similar to automation in judicial decision-making, hiring processes were also among the first to undergo automation, with rather undesirable results.125 Besides favouring the suggested approach of iterative audits, this area of decision-making automation seems to favour the certification approach as well.126

The (public decision-making) certification scheme could then be imagined as a direct follow-up to the iterative audit.127 Such an understanding is favourable to the user of the system, in our case a court or another public body carrying out the decision-making, as it leaves them with the option of obtaining the necessary certification by outsourcing it to a third party, by obtaining a certificate from a specialized national supervisory authority,128 or by performing a self-certification, as is currently a widespread approach in other technological areas, such as cybersecurity.129 Ajunwa, in her article, points to the possible issue of regulatory capture in the certification scheme.130 This is a pertinent issue that should not be overlooked or understated; however, it should make us pay greater attention to this issue when designing certification schemes, rather than be seen as an absolute obstacle to their adoption. After all, when it comes to any regulation of new technologies, regulatory capture is always a great risk (as it was, for example, with the AI Act and the regulation of artificial intelligence), but that should not stop us from regulating emerging technologies the right way.131

4.2 De lege ferenda and our current understanding of the issue

New issues often bring new attempts to solve them, which appears to be the case for algorithmic decision-making as well. One such attempt is the concept of FAT algorithms (fair, accountable, transparent), with which Barabas argues for tackling the issue of biased decision-making as part of the broader issue of the fairness of these decisions. This should be achieved by a) establishing formal criteria for fairness and b) developing best practices to maintain them.132 This leads us to the last question of this article.

We have already established that a) bias is an issue for (semi-)automated decision-making, and we have described b) the potential remedies, mostly based on monitoring processes, which are currently rather popular within different frameworks.133 This leaves us with the last point, c) discussing the need for creating new criteria for bias (or fairness) in decision-making, which raises the question of whether, and to what extent, we should hold algorithmic processes to a higher standard than the current “human-based” ones.

Susskind argues that any technological change must not be made purely for its own sake, but in order to fulfil some broader need in society, to address an existing issue.134 Such a thesis seems reasonable; the question, however, is one of proportionality. If we help to resolve the issue of a critically overloaded judicial system, do we still need to provide for greater non-bias than is currently the norm? We hold that this is not the case – not necessarily because the relief of an overloaded judicial system is enough in itself, but because the current interpretation of an impartial tribunal sufficiently fulfils the requirements of a fair trial. Thus, it should not be necessary to create a new criterion for the concept of fairness (and subsequently bias), but rather to apply, and reinterpret if necessary, the existing framework, much in the spirit of the “living instrument”.135 It follows that, should we accept that our current judicial system is unbiased, or more precisely that the standard we hold it to provides us with enough confidence in its decisions, then applying the same existing standard, with the necessary alterations,136 should result in automated systems that provide us with enough confidence in their decision-making; therefore, there should not be any reason to create a new standard.137 Further, this is supported by the very nature of the right to a fair trial, which is aimed at the protection of (the fundamental rights of) citizens; the relevant factor therefore ought to be the impact on citizens' rights, rather than the process by which the decision is created.

5. Conclusion

There is no lack of issues pertaining to the judicial use of AI that remain to be tackled before we can implement full automation into judicial systems.138 Even then, there is still a great discussion to be had about other, more innovative potentials of these systems, such as predictive justice.139 This article was aimed at analysing only one of these problematic aspects, that of machine bias. While this presents a rather grave issue for potential automated judicial decision-making, we have argued that we already have the necessary frameworks in place to deal with it properly. Further, being able to adapt the existing protection frameworks in such a manner that they apply in these novel cases, specifically in the case of public automation, should serve to minimize their possible negative impact, as well as provide a sufficient response to critics raising the issue of ‘algorithmic tyranny’.140 This rather important observation stems from the basic premise of this article, namely that the protection we currently have in place is sufficient. As described by the references to the Susskindian maxim, this should, however, only present the minimal scenario at which we can accept the utilisation of such systems, and should not prevent us from further discussions on how to safeguard these fundamental rights even better.141

The conclusion that the existing framework is sufficient follows from the observations made first in the introductory parts, which presented the technical aspects of the problem, and subsequently from the analysis of the existing ECtHR framework for dealing with biased decision-making. Most importantly, the internal and external tests for biased decision-making were introduced.

The internal test, which requires that the judge should not hold any pre-existing beliefs that could skew her decision-making, can be directly tied to the requirement of proper data handling and can subsequently be checked during the iterative audits, which should search for the emergence of unwanted bias patterns. Similarly to our current interpretation, unless proven otherwise, such a system should be considered non-biased. The external test, in turn, essentially establishes certain objective checks for the existence of bias. These could be reinterpreted, or fulfilled, by the required level of transparency (and log keeping) of such decision-making systems, not only at the level of individual decisions but also at the level of the design and implementation process, which should be made widely available to the public and to open research. As for the “process” side of this issue, the first, internal, aspect should be satisfied precisely by the appropriate iterative internal audit, and the second, external, aspect by an appropriate certification scheme.

This was only a very broad introduction to this legally and technologically complex issue – not only to the problem as such, but also to our current understanding of it and the currently favoured remedies. The adoption of such AI-based tools into judicial practice could help us deal with some prevalent issues, such as the length of proceedings and the overburdened judicial system.142 On the other hand, if not deployed properly, such systems could have a grave impact on fundamental rights, such as the right to a fair trial; therefore, the design and implementation of appropriate tools, such as the suggested iterative controls, must not be underestimated.