Bias is difficult to detect when developing an AI system because the software you write is not the AI. You write software, feed it a large set of training data tagged with values, and the software produces an AI that can attribute those values to new data. So if you give it a set of pictures in which someone has tagged the interesting ones as A and the boring ones as B, it will produce an AI that can look at new pictures and tag them as A or B.
The problem is that the AI isn't created in a way that lets humans understand what rules or patterns it uses to connect the data to the values. Humans would write a small set of rules such as "curved lines" or "variety of colours", but the generated AI can have millions of tiny rules. Reviewing those rules to find bias is thus often impractical, so bias is usually detected by looking at the output, and sometimes it only becomes visible after a lot of use.
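The pipeline described above can be sketched in a few lines of code. This is a deliberately tiny illustration, not how production AI systems are built: the "pictures" are reduced to two invented feature values, and the "training" is a nearest-neighbour lookup rather than the millions of learned weights a real system would produce.

```python
# Minimal sketch of the pipeline described above: labelled training
# data goes in, a model comes out, and the model tags new data.
# All features and labels here are invented for illustration.

def train(examples):
    """'Training' here just stores the labelled examples.

    Real systems compress them into millions of learned weights,
    which is exactly why the resulting rules are hard to inspect.
    """
    return list(examples)

def classify(model, picture):
    """Tag a new picture A or B by its nearest labelled neighbour."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(model, key=lambda ex: distance(ex[0], picture))
    return nearest[1]  # the label of the closest training example

# Each "picture" is reduced to two made-up features:
# (curvature, colour variety), tagged A (interesting) or B (boring).
training_data = [
    ((0.9, 0.8), "A"),
    ((0.8, 0.9), "A"),
    ((0.1, 0.2), "B"),
    ((0.2, 0.1), "B"),
]

model = train(training_data)
print(classify(model, (0.7, 0.9)))   # a curved, colourful picture -> "A"
print(classify(model, (0.15, 0.1)))  # -> "B"
```

Even in this toy version, the point about opacity holds: the "rules" live in the stored data, not in any human-readable statement of what makes a picture interesting.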
The bias can originate in the software or in the data, and sometimes it comes from the data but has to be fixed in the software. The data might accurately show that visibly pregnant women are more likely to take maternity leave than women who are not visibly pregnant, or than men. There is no error in the data, but as a society we have decided that this criterion cannot be used in, for example, hiring decisions. That usually has to be addressed in the software, but it still may not be easy. It's almost impossible to give an AI an instruction such as "ignore pregnancy status", because the AI isn't applying a simple rule about body shape but millions of tiny rules, and it's hard to determine which of those rules are directly, partially, or indirectly linked to pregnancy.
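Why deleting the sensitive field doesn't work can be shown with a toy example. Everything here is invented: the records, the `recent_leave_inquiry` proxy field, and the threshold rule all stand in for patterns a real model would rediscover on its own from correlated data.

```python
# Toy illustration of why "ignore pregnancy status" is hard: even
# after the explicit field is deleted, other fields correlate with it.
# All fields, records, and the proxy rule are invented for illustration.

applicants = [
    {"pregnant": True,  "recent_leave_inquiry": True,  "hired": 0},
    {"pregnant": True,  "recent_leave_inquiry": True,  "hired": 0},
    {"pregnant": False, "recent_leave_inquiry": False, "hired": 1},
    {"pregnant": False, "recent_leave_inquiry": False, "hired": 1},
]

# Naive fix: drop the explicit field before training.
scrubbed = [{k: v for k, v in a.items() if k != "pregnant"}
            for a in applicants]

# But a model can rediscover the pattern through a correlated proxy.
# Here a trivial rule on the proxy field reproduces the biased
# outcome perfectly, so the bias survives the scrubbing.
def proxy_rule(record):
    return 0 if record["recent_leave_inquiry"] else 1

predictions = [proxy_rule(r) for r in scrubbed]
actual = [a["hired"] for a in applicants]
print(predictions == actual)  # True: the proxy recovers the biased outcome
```

A real AI does this with many weak proxies at once rather than one obvious field, which is why deciding which of its millions of tiny rules are "linked to pregnancy" is so hard.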
How access to source code helps
But that doesn't mean access to the source code is pointless. It's crucial for four reasons:
- Bias can be due to something in the software generating the AI
- Maybe the software needs explicit counter-bias to be programmed in
- Access to change the source code means the service provider can address the issue when bias is detected
- Public access to the source code allows an ecosystem of experts to emerge, beyond those who work for companies selling AI
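Since the internal rules cannot be inspected, bias detection in practice means auditing the output, as noted earlier. One common sketch of such an audit is to compare the system's selection rates across groups; the decisions and group labels below are invented, and the 80% threshold is the widely used "four-fifths" rule of thumb, not a legal standard.

```python
# Sketch of an output-level bias check: since the model's internal
# rules cannot be inspected, compare its decisions across groups.
# Decisions and group labels are invented for illustration.

decisions = [
    ("group_x", 1), ("group_x", 1), ("group_x", 0), ("group_x", 1),
    ("group_y", 0), ("group_y", 1), ("group_y", 0), ("group_y", 0),
]

def selection_rate(records, group):
    """Fraction of positive decisions the system gave this group."""
    outcomes = [decision for g, decision in records if g == group]
    return sum(outcomes) / len(outcomes)

rate_x = selection_rate(decisions, "group_x")  # 0.75
rate_y = selection_rate(decisions, "group_y")  # 0.25

# Rule of thumb (the "four-fifths rule"): flag the system if one
# group's selection rate falls below 80% of another group's.
print(rate_y / rate_x < 0.8)  # True: this toy output would be flagged
```

This is exactly where source code access matters: once such an audit flags a disparity, someone has to be able to open up and change the software that generates the AI.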
Extra care for public administrations
Public administrations may be held to an even higher standard than the private sector, for two reasons:
- Public administrations have a monopoly on providing certain services
- In some cases, the citizens have an obligation to use the service
Free software as a defined concept was born in an AI lab, before the additional term "open source" was even invented: the MIT AI Lab is where Richard Stallman was working before he resigned to work on the GNU Project. (More at Wikipedia's History of free and open-source software.)
The above is a simple explanation of the main issues, but there are also discussions ongoing about open data, verifiability, the copyright of the generated AI, the copyright of data that an AI generates, liability for actions taken based on AI, and many more topics.
The following links may be of interest:
- Open source data science: How to reduce bias in AI (WeForum, 14 Oct 2022)
- Defining an open source AI for the greater good (opensource.com, 10 Oct 2022)
- Solving for AI’s black box problem (opensource.com, 12 Jul 2022)
- Unjust Algorithms (FSF, 6 Jul 2022)
- Publication of the FSF-funded white papers on questions around Copilot; scroll down for links to the five papers (FSF, 24 Feb 2022)
- Five ways that Open Source Software shapes AI policy (Brookings, 18 Aug 2021)