Biased, are we?

Intelligent systems and Racial bias – a couple of remarks

In a recent article, Rose Eveleth talks about the racial bias inherent in facial recognition systems. It is an interesting read. I would like, if I may, to make a couple of comments.


I must admit, first of all, that Rose Eveleth is correct. Face detection and facial recognition systems do suffer from racial bias. I have used many such systems, and every one of them was racially biased. But I would like to elaborate a little on that issue.

The underlying problem is simple – you need to train a system on a large training set of sample faces. Those training sets have to be assembled by human beings, at a high cost. Two human biases then come into play:

  1. Since the training set is usually assembled by a company based in a certain country, most faces the system trains on are faces of people from that specific country. The best face detection systems come from East Asia, and – as one might expect – those systems are much better tuned to East Asian faces.
  2. The human perception system is itself racially biased. We easily detect minute differences in faces similar to those we saw as children, and we are much worse at telling foreign faces apart. After all, we are all computational learning machines, trained to decipher faces during early childhood. So, unsurprisingly, we are reasonably good (not exceptionally good) at distinguishing faces like those we saw as children, and very bad at distinguishing people of other races. You can often hear Caucasians say that “all those Chinese people” look the same. That is because they all look very similar to someone who is not of East Asian descent – just as Caucasian faces are “all the same” to East Asian people.

This explains (to some degree) the natural human tendency toward racism – foreigners are hard to discern, and so it is all too easy to view them as a homogeneous mass of non-individuals. Just listen to Trump: “We” are always nuanced individuals; “They” are always faceless and homogeneous.
(As a side note – perhaps “single-faced” is a better term than “faceless”. The chauvinist does not necessarily reject the humanity of his hated group. It is too often their individuality he denies.)
But this also explains why engineers training facial recognition systems are hard-pressed to tell apart people of unfamiliar descent. Preparing a good training set is an exhausting task as it is (think of trying to tell your co-workers’ children apart). Producing such a set for people of a race different from yours is very hard. Very hard = very expensive. Good luck explaining all this to the businessmen running your corporation when they ask why it takes you two years to train a system that took you two months to program.


But there is one point on which Eveleth’s article is misleading. The algorithms doing image classification are not as biased. The problem there is not bias in the algorithm, but a combination of anthropomorphism, unrealistic expectations, and an expectation of bias – and the problem lies not in the algorithms themselves but in the audience. The classifiers we currently have are very crude. And the reason most classifiers would put a dark-skinned person in the “ape” category is – surprisingly enough – a lack of bias.
We are animals, members of the class Mammalia. The genus Homo belongs to the order Primates. So when an algorithm labels a human being an “ape”, it is technically correct. The specific examples given in Eveleth’s article are of algorithms that label many people apes – regardless of skin color. White people were labeled “ape” as well, but very few complaints came from white people. They are privileged enough to take it at face value.

I’ll be more exact. Classification algorithms have two tiers. First, they try to match the image against a set of possible labels, giving a score to each possible match. The labels are not exclusive: there is a label for “furniture”, a label for “wood”, a label for “chair”, and a label for “table”. So a picture of a wooden table might score 0.9 as furniture, 0.7 as wood, 0.5 as chair, and 0.6 as table. The second tier of the algorithm takes these scores and decides which labels to declare the “best match”. In our case, the table would be declared a “wooden furniture”, but the low score as a table, combined with the close score as a chair, would make the labeling algorithm stop there and refrain from deciding on the specific type of furniture.
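The wooden-table example can be sketched in a few lines. The scores, label names, and thresholds below are invented for illustration; a real classifier emits thousands of labels with learned confidences, and its second tier is far more elaborate than this:

```python
# Tier 1 (simulated): per-label confidence scores for a wooden table.
scores = {"furniture": 0.9, "wood": 0.7, "chair": 0.5, "table": 0.6}

def decide(scores, accept=0.65, margin=0.2):
    """Tier 2 (sketch): keep labels that are confident in their own
    right, but drop a specific label when a competing sibling scores
    too close to it -- the classifier 'refrains from deciding'."""
    kept = {k: v for k, v in scores.items() if v >= accept}
    # "chair" and "table" compete as specific furniture types
    # (an assumed sibling relation, hard-coded here for brevity).
    if abs(scores["chair"] - scores["table"]) < margin:
        kept.pop("chair", None)
        kept.pop("table", None)
    return sorted(kept)

print(decide(scores))  # ['furniture', 'wood'] -- i.e. "wooden furniture"
```

The point of the sketch is only the shape of the decision: confident, non-competing labels survive; close calls between specific labels are dropped in favor of the more general ones.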
In the case of humans, a state-of-the-art algorithm would usually determine with good probability that the image shows a human, and refrain from stating the more general possibilities (“ape” or “animal”). But from time to time the algorithm is unsure, so it declares the more general category (e.g. “animal” or “ape”) and stops there. This is not a matter of racial bias; it necessarily happens every once in a while. In fact it happens at some roughly constant probability (per algorithm), so you will see the effect if you look at enough pictures of humans.
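That fallback behavior can be illustrated with a toy hierarchy. The labels, scores, and threshold are hypothetical; the only claim is the mechanism, which is that an unconfident classifier stops at the most specific label it is still sure about:

```python
# Hypothetical label hierarchy, ordered general -> specific.
HIERARCHY = ["animal", "ape", "human"]

def most_specific(scores, threshold=0.8):
    """Walk from general to specific and stop at the last label
    that still clears the confidence threshold."""
    best = None
    for label in HIERARCHY:
        if scores.get(label, 0.0) >= threshold:
            best = label
        else:
            break
    return best

# Usually the classifier is confident all the way down:
print(most_specific({"animal": 0.97, "ape": 0.95, "human": 0.9}))   # human
# Occasionally it is unsure at the last step and stops one level up:
print(most_specific({"animal": 0.96, "ape": 0.9, "human": 0.55}))   # ape
```

Nothing in this logic looks at skin color; the occasional “ape” output is just what falling back to a more general, taxonomically correct label looks like.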

If you expect the algorithm to make racial slurs, you will certainly be offended. If you don’t expect to encounter racial slurs, you will probably find it harmlessly amusing. This particular bias is not in the algorithm, but in our expectations of it.


The bottom line is always the same – technology is limited by the people who make it, and is degraded by the people who use it.
