Legal AI beyond the hype: a duty to combat bias

The race to ‘AI that’

Artificial intelligence (AI) and machine learning are phrases which, until recently, would have drawn blanks at most law firms.

But things have changed. The world is more connected than ever, and technology has advanced rapidly. Choice and consumer empowerment are the flavour of the moment. Consequently, a more discerning client base now asks its traditional legal service providers: ‘How are you going to deliver the services we require more efficiently and for less?’

Law firms have had to react and, in order to remain competitive, have raced to seek out the most talked-about and revered technologies of the moment. The result has been a huge increase in the number of law firms, both large and small[1], talking about, experimenting with, adopting and offering AI and machine learning solutions to the market.

Whether such offerings are a genuine attempt to improve legal service delivery, or just buzzwords on a flyer to attract new clients, is not the subject of this article. Our focus is this: to what extent have firms considered whether the AI they are licensing, building or selling (if that is what they are doing) has the potential to produce biased results?

Proceed with caution, mind the bias

AI and machine learning solutions require training before they can predict or suggest outcomes. Data scientists and programmers must feed AI algorithms huge data sets to break down and digest in order to spot trends and patterns in the language, imagery or sound of those data sets. The resulting solution is only as good as the data it is trained on, and the human who trains it.

In the rush to create groundbreaking AI solutions, scrutiny has sometimes been lacking over which data sets have been used to train the solution and how it has been trained. In part, this has been fuelled by the hype around what AI can achieve and by when, and by a preoccupation with peripheral questions such as whether AI should have employment rights[2].

Consequently, bias has been allowed to creep in, often creating unethical, immoral and sometimes illegal results. Take the following examples:

  • Gender bias: software trained on text from Google News which, when asked to complete the statement “Man is to computer programmer as woman is to X”, replied “homemaker”[3].
  • Sexual orientation bias: an AI solution which could ‘accurately’ predict a user’s sexual orientation, having been trained on a data set of online dating photos drawn mostly from white users, causing consternation in the LGBTQ+ community[4].
  • Racial bias: a solution to help fight crime in the US which determined that young African-American men were more likely to commit crimes; in effect, an algorithm that treated race as a sign of criminality. In reality, including socio-economic status in the data set would have removed the racial bias.

Government stimulation

Earlier this year, the UK government joined the conversation. It announced it was putting AI and data at the heart of its Industrial Strategy, and that it would be setting aside £12 million, specifically, to speed up the adoption of AI and data technologies that transform the accountancy, insurance and legal services sectors[5].

Thankfully, the government was alive to the issue of bias. An All-Party Parliamentary Group on Artificial Intelligence, set up in January 2017, discussed extensively the existence of bias and how to combat it. It concluded (amongst other things) that organisations must be accountable for the decisions made by the algorithms they use[6].

In June 2018 the government released its response to the Artificial Intelligence Select Committee’s Report on AI. It agreed that more needed to be done to ensure that data is truly representative and does not perpetuate societal inequalities, and that research and development teams must preprocess data so that it is balanced and representative.

Bias in legal AI

So, is there potential for such biases to occur in legal AI solutions? And, should the organisations racing to build such solutions be acting to combat it? In short, yes and yes!

Take, for example, the application of AI (predictive analytics, natural language processing, and machine learning) in order to help clients predict legal case outcomes. To achieve such a result, a solution needs first to be trained on thousands of historic case documents before it can spot trends and patterns in the language of those documents that will lead it to conclude certain language in a document will equate to a certain case result.

But what if the documents our case prediction tool is being trained on were not representative? What if, for example, the training data consisted of 1,000 claims, 500 brought by women and 500 brought by men, but every one of the claims brought by a woman was settled for under £20,000 and every one of the claims brought by a man was settled for over £20,000? Would that lead to the creation of a fair predictive solution? No.

Beating the bias

The discerning data scientist would, firstly, interrogate that data and measure how unbalanced it is (unbalanced data could cause the model to be overfitted) and, secondly, recognise that intrinsic properties, such as whether a claimant is male or female, are factors that should not affect the outcome of a case. In reality, however, they may do, as a reflection of societal bias in the justice system. The data scientist should therefore ask: should this bias be reported, and/or should algorithms be calibrated to reflect it?
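To make the idea of measuring imbalance concrete, here is a minimal sketch in Python, using only the standard library. It rebuilds the hypothetical 1,000-claim data set from the example above and compares the share of claims settled for over £20,000 in each group. The field names (`claimant_sex`, `settled_over_20k`) and both helper functions are our own illustrative assumptions, not part of any particular tool.

```python
from collections import Counter

# Hypothetical training set mirroring the example in the text: 500 claims
# brought by women, all settled under £20,000, and 500 brought by men,
# all settled over £20,000. Field names are illustrative assumptions.
claims = (
    [{"claimant_sex": "female", "settled_over_20k": False}] * 500
    + [{"claimant_sex": "male", "settled_over_20k": True}] * 500
)


def positive_rate_by_group(rows, group_key, outcome_key):
    """Return, for each group, the share of rows with a positive outcome."""
    totals = Counter()
    positives = Counter()
    for row in rows:
        group = row[group_key]
        totals[group] += 1
        if row[outcome_key]:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def rate_gap(rates):
    """Largest difference in positive-outcome rate between any two groups."""
    return max(rates.values()) - min(rates.values())


rates = positive_rate_by_group(claims, "claimant_sex", "settled_over_20k")
gap = rate_gap(rates)

print(rates)  # {'female': 0.0, 'male': 1.0}
print(gap)    # 1.0
```

A gap of 1.0 is the most extreme value possible: in this data, the claimant's sex alone determines the outcome. On real data the gap would be smaller, and what counts as acceptable is a judgment for the data scientist and the firm rather than something the code can decide.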

There is also a need to ensure transparency about the algorithms applied to such data, and about how legal ‘AI’ solutions make decisions. Those affected by the decisions and suggestions of legal AI solutions should be entitled to understand the mechanisms by which a decision about them was made. The application of opaque, ‘black box’ AI solutions in the legal system, particularly in respect of criminal justice and deprivations of liberty, poses serious ethical questions.

To a certain extent, such transparency is now a legal requirement under GDPR, but even where it is not, firms should be working to ensure transparency as far as possible. Models can, and should, be interrogated to establish which factors were most important when a decision was made. It also remains possible for a human to make a judgment on the overall merit of the decision and on any potential for discriminatory bias.
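One common way to ask a model which factors mattered most is permutation importance: shuffle the values of one factor across the records and see how much the model's accuracy falls. The sketch below, in Python with only the standard library, illustrates the idea. The `model` function is a hypothetical stand-in for a trained model that has learned to key on claimant sex, and the `injury_severity` field is invented purely for illustration; a real review would use the firm's actual model and data.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical records: claimant sex, an invented severity score (1 to 5),
# and the outcome. As in the earlier example, outcome tracks sex exactly.
records = []
for i in range(1000):
    sex = "female" if i < 500 else "male"
    records.append({
        "claimant_sex": sex,
        "injury_severity": rng.randint(1, 5),
        "settled_over_20k": sex == "male",
    })


def model(record):
    """Stand-in for a trained model that has learned to key on claimant sex."""
    return record["claimant_sex"] == "male"


def accuracy(rows):
    """Share of rows where the model's prediction matches the outcome."""
    return sum(model(r) == r["settled_over_20k"] for r in rows) / len(rows)


def permutation_importance(rows, feature, rng):
    """Accuracy lost when one feature's values are shuffled across rows."""
    shuffled = [r[feature] for r in rows]
    rng.shuffle(shuffled)
    permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
    return accuracy(rows) - accuracy(permuted)


importance = {
    feature: permutation_importance(records, feature, rng)
    for feature in ("claimant_sex", "injury_severity")
}
print(importance)
```

Shuffling `injury_severity` costs this model nothing, because it never looks at that field, while shuffling `claimant_sex` causes a large drop in accuracy. A result like that is exactly the kind of signal a human reviewer would want to see before relying on the model's output.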

Unless a law firm has invested in discerning data scientists and implemented the right checks and balances, how can it be sure it is exploring these questions, independently verifying the answers, and defeating bias as far as possible when developing or offering legal AI?

The Kennedys approach

At Kennedys we are actively involved in the creation and testing of solutions that apply predictive analytics, natural language processing and machine learning to allow our clients to predict legal case outcomes, make quicker and better decisions on liability and settlement offers and, as a result, use lawyers less.

We are also alive to the issue of bias, and take a proactive approach to identifying and combating it. We’ve invested in a diverse team of world-leading data scientists who ensure the data we use is preprocessed, truly representative and balanced as we pursue our ambitious plans for technological innovation. Our code of ethics helps ensure bias is eliminated to the greatest extent possible, so that we can continue to make a difference for our clients.

This article was co-authored by Dr. Harvey Maddocks, Data Scientist, London.


  1. Legal AI now: the growing adoption of legal AI
  2. EU to vote on declaring robots to be 'electronic persons'
  3. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
  4. LGBT groups denounce 'dangerous' AI that uses your face to guess sexuality
  5. Transforming accountancy, insurance and legal services with AI and data (small projects strand)
  6. APPG AI Findings 2017