We live in a world of Big Data. Our lives are now being mapped in countless ways: sensors on our phones, our cars, and our Web browsers record the people with whom we interact, the places we go, and even what we’re thinking about.
With Big Data comes the promise of equally big knowledge, or so we’re told by scientists and companies. Excited at the prospects, Massachusetts has even launched a “Big Data Initiative” to ensure that the state isn’t left out of the coming boom. The hope is that these streams of data will unlock all kinds of insights about the world.
In many ways, this promise seems likely to be borne out: The data that are being generated are rich, detailed, and eminently crunchable. But it’s not quite correct to think of these datasets as all-encompassing. The advent of Big Data also carries a paradoxical risk: that by sending us down the narrow paths of the data we have available, it may cause us to mistake those paths for the whole world.
This might sound like a minor concern, but it’s actually a recurring problem with human knowledge, with how science works. Throughout history, in one field after another, science has made huge progress in precisely the areas where we can measure things—and lagged where we can’t.
The result, over time, has been that we know a lot about the things that are closer to our size, our altitude, our spot in the universe, and less about things that are hard to reach, hard to dig up, and hard to quantify. What we know, in other words, is biased in favor of what we can measure.
Of course, there are ways to remedy this: venture into the unknown, or otherwise stretch our ability to measure everything, not just what’s immediately around us. But until we achieve this nearly impossible task, scientists recognize that the huge gaps in our knowledge will remain a problem, or at least something for which we must make allowances. In the study of populations, there is even an established term for this problem: the convenience sample. It’s not necessarily a good or representative sample, but it sure is convenient. And this problem has shaped large swaths of what we know, even on very familiar topics, in ways many people don’t appreciate.
The bias toward measurable information affects some of the most popular arenas of science as well as obscure ones. In paleontology, the big-name dinosaurs are Tyrannosaurus rex, the stegosaurus, and the triceratops. Admittedly, as any 5-year-old can tell you, they’re pretty cool, as dinosaurs go. And it’s tempting to think they’re so well known because they were the most important dinosaurs, or the most successful.
But it turns out that the story is a bit more complex. When paleontologists in the United States began to hunt for dinosaurs in the late 19th and early 20th centuries, they focused their work on two dig sites: Hell Creek in Montana and Como Bluff in Wyoming. These sites were chosen, among other reasons, for their sheer abundance of fossils and the ease with which they could be worked. And which dinosaurs were easiest to find in these two places? Tyrannosaurus rex, the stegosaurus, and the triceratops.
These dinosaurs have since taken on an outsize influence in popular culture, not because they were necessarily the most important creatures of the Mesozoic, but because they were simply the most available. Since then, we’ve found what appear to be larger and, frankly, more frightening dinosaurs, from spinosaurus to giganotosaurus. While we have many complete Tyrannosaurus rex skeletons, which can provide important insights into dinosaurs, Tyrannosaurus rex lived in only a single region and for only about 2 million years, a little more than 1 percent of the roughly 186-million-year Mesozoic Era.
The low-hanging fruit, in other words, is what we know the most about. In biology, the term for this is taxonomic bias: the tendency for certain species or groups of organisms to be studied more than you would expect based on, say, their abundance. For example, insects vastly outnumber vertebrates, yet far more research papers have been published on vertebrates. Vertebrates have garnered more attention, perhaps because they are much easier to spot, and perhaps because they are more like us, or at least more familiar.
In some cases, where this bias shades into something more sinister, what you might call speciesism, scientists have come up with an even harsher term: taxonomic chauvinism. For example, reptiles and amphibians get short shrift in research attention compared with birds and mammals, because they’re seen as slimy, creepy, and generally less popular.