With social media and online services are now huge parts of daily life to the point that our entire world is being shaped by algorithms. Arcane in their workings, they are responsible for the content we see and the adverts we’re shown. Just as importantly, they decide what is hidden from view as well.
Important: Much of this post discusses the performance of a live website algorithm. Some of the links in this post may not perform as reported if viewed at a later date.
Recently, [Colin Madland] posted some screenshots of a Zoom meeting to Twitter, pointing out how Zoom’s background detection algorithm had improperly erased the head of a colleague with darker skin. In doing so, [Colin] noticed a strange effect — although the screenshot he submitted shows both of their faces, Twitter would always crop the image to show just his light-skinned face, no matter the image orientation. The Twitter community raced to explore the problem, and the fallout was swift.
Intentions != Results
Twitter users began to iterate on the problem, testing over and over again with different images. Stock photo models were used, as well as newsreaders, and images of Lenny and Carl from the Simpsons, In the majority of cases, Twitter’s algorithm cropped images to focus on the lighter-skinned face in a photo. In perhaps the most ridiculous example, the algorithm cropped to a black comedian pretending to be white over a normal image of the same comedian.
Many experiments were undertaken, controlling for factors such as differing backgrounds, clothing, or image sharpness. Regardless, the effect persisted, leading Twitter to speak officially on the issue. A spokesperson for the company stated “Our team did test for bias before shipping the model and did not find evidence of racial or gender bias in our testing. But it’s clear from these examples that we’ve got more analysis to do. We’ll continue to share what we learn, what actions we take, and will open source our analysis so others can review and replicate.”
There’s little evidence to suggest that such a bias was purposely coded into the cropping algorithm; certainly, Twitter doesn’t publically mention any such intent in their blog post on the technology back in 2018. Regardless of this fact, the problem does exist, with negative consequences for those impacted. While a simple image crop may not sound like much, it has the effect of reducing the visibility of affected people and excluding them from online spaces. The problem has been highlighted before, too. In this set of images of a group of AI commentators from January of 2019, the Twitter image crop focused on men’s faces, and women’s chests. The dual standard is particularly damaging in professional contexts, where women and people of color may find themselves seemingly objectified, or cut out entirely, thanks to the machinations of a mysterious algorithm.
Former employees, like [Ferenc Huszár], have also spoken on the issue — particularly about the testing process the product went through prior to launch. It suggests that testing was done to explore this issue, with regards to bias on race and gender. Similarly, [Zehan Wang], currently an engineering lead for Twitter, has stated that these issues were investigated as far back as 2017 without any major bias found.
It’s a difficult problem to parse, as the algorithm is, for all intents and purposes, a black box. Twitter users are obviously unable to observe the source code that governs the algorithm’s behaviour, and thus testing on the live site is the only viable way for anyone outside of the company to research the issue. Much of this has been done ad-hoc, with selection bias likely playing a role. Those looking for a problem will be sure to find one, and more likely to ignore evidence that counters this assumption.
Efforts are being made to investigate the issue more scientifically, using many studio-shot sample images to attempt to find a bias. However, even these efforts have come under criticism – namely, that using an source image set designed for machine learning and shot in perfect studio lighting against a white background is not realistically representative of real images that users post to Twitter.
Twitter’s algorithm isn’t the first technology to be accused of racial bias; from soap dispensers to healthcare, these problems have been seen before. Fundamentally though, if Twitter is to solve the problem to anyone’s satisfaction, more work is needed. A much wider battery of tests, featuring a broad sampling of real-world images, needs to be undertaken, and the methodology and results shared with the public. Anything less than this, and it’s unlikely that Twitter will be able to convince the wider userbase that its software isn’t failing minorities. Given that there are gains to be made in understanding machine learning systems, we expect research will continue at a rapid pace to solve the issue.