Press "Enter" to skip to content

Biased uses of artificial intelligence need to be challenged

Cas Sweeney ‘19
Associate Editor

With each new technological advance, fresh concerns about privacy and data mining arise. We as a society are facing challenges previous generations could never have imagined.

However, new technology can also disguise problems we have faced before, reinterpreted and rebranded for the modern age. One example of this phenomenon is research by Michal Kosinski and Yilun Wang, who used artificial intelligence (AI) to study the relationship between sexuality and facial features.

Kosinski claims that “faces contain much more information about sexual orientation than can be perceived and interpreted by the human brain” and uses AI to analyze faces and classify the people pictured as either homosexual or heterosexual. The study reported that the AI was accurate an average of 87 percent of the time when given multiple photos of a person.

Since the study was published, there have been many criticisms of both the methods used and the ethics of writing the program at all. Statisticians pointed out that the study included only out, white, cisgender, straight and gay men and women, excluding people of color as well as transgender, bisexual and closeted people, so Kosinski’s results would be distorted by sample bias and would not hold up on the broader population.
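To see why that matters, here is a minimal, hypothetical sketch, not drawn from the study itself: all of the feature names and numbers below are invented. It shows how a classifier trained on a narrow sample can latch onto a signal that exists only within that sample, so its impressive accuracy collapses on the people the sample excluded.

```python
# Hypothetical illustration of sample bias: a model that looks accurate
# on a narrow sample can be near-useless on the excluded population.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_group(n, signal_strength):
    """Simulate one subgroup: a single feature predicts the label
    only as strongly as signal_strength allows (0 = pure noise)."""
    y = rng.integers(0, 2, size=n)
    x = signal_strength * (2 * y - 1) + rng.normal(0, 1, size=n)
    return x.reshape(-1, 1), y

# The sampled group: the feature is strongly predictive here...
x_seen, y_seen = make_group(5000, signal_strength=2.0)
# ...but in the excluded group the same feature is nearly noise.
x_unseen, y_unseen = make_group(5000, signal_strength=0.1)

model = LogisticRegression().fit(x_seen, y_seen)

print("accuracy on the sampled group: ",
      accuracy_score(y_seen, model.predict(x_seen)))      # roughly 98%
print("accuracy on the excluded group:",
      accuracy_score(y_unseen, model.predict(x_unseen)))  # near a coin flip
```

On the group it was trained on, the model looks remarkably accurate; on everyone else, it performs barely better than chance, and nothing in the headline accuracy number reveals the difference.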

GLAAD and the Human Rights Campaign have already spoken out about the study, warning that such algorithms could be used to put gay people around the world at risk whether or not the science is accurate. Other LGBT advocacy groups said that the study promotes a limited and harmful understanding of gender and sexuality.

Very few articles point out the risky thinking that went into creating the study in the first place: the assumption that aspects of our society as they exist now are ultimate truths. As statistics and data science grow more sophisticated, both researchers and readers of new research forget that results are only ever as good as the data one starts with.

Kosinski’s algorithm treats the definitions of “straight” and “gay” as not only binary and rigid, but also universal and innate. He later claimed that his algorithm could also be used to determine political views and, even more concerning, IQ.

The relationship between IQ, statistics and discrimination has a long and terrible history. In 1994, Richard Herrnstein and Charles Murray published “The Bell Curve: Intelligence and Class Structure in American Life,” in which they claimed a genetic relationship between race and intelligence. The theory was thoroughly debunked, both because of the racism built into its methods and because our ways of testing intelligence are themselves deeply biased, especially when comparing people of different races.

Even before the modern methods of statistics, the related pseudosciences of craniometry, phrenology and physiognomy all claimed that the shape of people’s heads and faces could determine their intelligence, character and even worth as human beings. That “science” was used to justify eugenics, slavery and even genocide.

Over and over again, when data collection methods are biased, whether through test questions that assume background knowledge more familiar to white people than to people of color, or through a single definition of sexuality imposed on all people, the resulting conclusions go unquestioned when they should not.
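A small, hypothetical illustration of that point, with invented numbers: when the measuring instrument itself is biased against one group, the statistics can look airtight even though the conclusion is an artifact of the instrument, not of the people measured.

```python
# Two groups with identical underlying ability by construction; the test
# alone subtracts points from one of them, e.g. via culturally loaded
# questions. The resulting "gap" is statistically significant anyway.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

true_ability = 100  # identical in both groups by construction
group_a = rng.normal(true_ability, 15, size=1000)
measurement_bias = 8  # points the biased test takes from group B
group_b = rng.normal(true_ability, 15, size=1000) - measurement_bias

t, p = stats.ttest_ind(group_a, group_b)
print(f"measured gap: {group_a.mean() - group_b.mean():.1f} points, p = {p:.2e}")
```

The eight-point “gap” and its vanishingly small p-value come entirely from the test, not from the people taking it, and the statistics alone give no hint of that.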

Kosinski has responded to the common criticism of his methods by saying that since the technology exists, he felt it necessary to warn the world that it is out there. However, the concerns about his methods matter far less than his unexamined assumption that his data is unbiased.

As time goes on, we will grow ever more confident in our statistical and data collection methods, and it will become harder to remember to think critically about how the data we use can be shaped by our own assumptions. Whenever we see a potentially biased study, we must challenge its conclusions based not only on the methods, but on the data itself.