Checking AI bias is a job for the humans

One of the primary problems with artificial intelligence (AI) is the “artificial” part. The other is the “intelligence.” While we like to pretend that we’re setting robotic intelligences free from our human biases and other shortcomings, in reality we often transfer our failings into the AI, one dataset at a time.

Hannah Davis, a data scientist, calls this out, arguing that “a dataset is a worldview,” filled with subjective meanings. But rather than leave our AI hopes moribund, she also offers some ways we might improve the data that informs our AI.

AI has always been about people

It has become de rigueur to posture about how very “data driven” we are, and nowhere more so than in AI, which is completely dependent on data to be of use. One of the wonders of machine learning algorithms, for example, is how fast they can sift through mountains of data to uncover patterns and respond accordingly. Such models, however, must be trained, which is why data scientists tend to congregate around established, high-quality datasets.
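
That dependence is easy to see in practice. Here is a minimal sketch (assuming scikit-learn is installed) of how little code stands between a data scientist and a canonical shared dataset like Iris; whatever worldview such a dataset encodes ships with every model trained on it:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a canonical, widely shared dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Train and evaluate in two lines; the dataset does most of the work.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.2f}")
```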

Unfortunately, those datasets aren’t neutral, as Davis points out:

[A] dataset is a worldview. It encompasses the worldview of the people who scrape and collect the data, whether they’re researchers, artists, or companies. It encompasses the worldview of the labelers, whether they labeled the data manually, unknowingly, or through a third-party service like Mechanical Turk, which comes with its own demographic biases. It encompasses the worldview of the inherent taxonomies created by the organizers, which in many cases are corporations whose motives are directly incompatible with a high quality of life.

See the problem? Machine learning models are only as smart as the datasets that feed them, and those datasets are limited by the people shaping them. This could lead, as one Guardian editorial laments, to machines making the same mistakes we do, just more quickly: “The promise of AI is that it will imbue machines with the ability to spot patterns from data, and make decisions faster and better than humans do. What happens if they make worse decisions faster?”
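
To see how that limitation propagates, consider a minimal sketch with entirely synthetic data (the scenario, feature names, and bias rate are invented for illustration): if labelers systematically under-rate one group, a model trained on those labels learns the labelers’ worldview rather than the underlying task.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

skill = rng.normal(size=n)           # the signal the task is really about
group = rng.integers(0, 2, size=n)   # 0 = group A, 1 = group B

# Biased labeling: at identical skill, group B is less likely to be
# labeled "qualified". The bias lives in the labels, not the task.
logit = skill - 1.5 * group
qualified = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# The model faithfully learns the labelers' worldview: a large negative
# weight on group membership, despite identical skill distributions.
X = np.column_stack([skill, group])
model = LogisticRegression().fit(X, qualified)
print(dict(zip(["skill", "group"], model.coef_[0].round(2))))
```

Note that the model is “correct” with respect to its labels; nothing in the training loop flags that the labels themselves carry the bias, which is exactly why catching it remains a human job.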

Complicating matters further, our own errors and biases are, in turn, shaped by machine learning models. As Manjunath Bhat has written, “People consume facts in the form of data. However, data can be mutated, transformed, and altered—all in the name of making it easy to consume. We have no option but to live within the confines of a highly contextualized view of the world.” We’re not seeing data clearly, in other words. Our biases shape the data we feed into machine learning models that, in turn, shape the data available for us to consume and interpret.
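
Bhat’s point about mutation is easy to reproduce. In this minimal sketch (again with synthetic data, assuming pandas), one group is more likely to have missing fields, so an innocuous-looking cleanup step quietly changes the population that downstream consumers see:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 10_000

df = pd.DataFrame({
    "group": rng.choice(["A", "B"], size=n),
    "income": rng.normal(50_000, 10_000, size=n),
})

# Group B's income field is missing far more often (say, survey
# non-response), so "cleaning" the data is not a neutral act.
missing = (df["group"] == "B") & (rng.random(n) < 0.6)
df.loc[missing, "income"] = np.nan

print(df["group"].value_counts(normalize=True).round(2))           # ~50/50
print(df.dropna()["group"].value_counts(normalize=True).round(2))  # skewed
```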
