Boston.com

Crunching the metadata

What Google Print really tells us about the future of books

IN RECENT MONTHS, we've heard that Google is digitizing the libraries of several major universities and making the text searchable through its Google Print search engine, bringing cries of copyright infringement from publishers and author groups. Meanwhile, Microsoft says it will provide online access to 100,000 books in the British Library, and Amazon, which already sells digital versions of books, will soon sell individual chapters, too. But despite the present focus on who owns the digitized content of books, the more critical battle for readers will be over how we manage the information about that content, information that's known technically as metadata.

We've been managing book metadata basically the same way since Callimachus cataloged the 400,000 scrolls in the Alexandrian Library at the turn of the third century BC. Callimachus listed the library's contents on scrolls, medieval librarians used ledgers, and we use card catalogs, now mostly electronic. But until information started moving online, the basic strategy was always the same: Arrange the books one way on the shelves, physically separate the metadata from them, and arrange the metadata in convenient ways.

This technique works so well for organizing physical books that we've long overlooked its basic limitation: Because books and their metadata have, until recently, been physical objects, we've had to pick one and only one defined, stable way to order them. When Melvil Dewey introduced the Dewey decimal classification system in 1876, it was an advance because it shelved books by topic, making the library's floor plan into a browsable representation of the order of knowledge itself. But no one classification can represent everyone's way of organizing the world. You may file a field guide to the birds under natural history, while someone else files it under great examples of the illustrative art and I file it under good eating.

The digital world makes it possible for the first time to escape this limitation. Publishers, libraries, even readers can potentially create as many classification schemes as we want. But to do this, we'll need two things.

First, we'll need what are known as unique identifiers, such as the call numbers stamped on the spines of library books. Unfortunately, any system that assigns numbers to books based on what they're about is going to suffer from Dewey's weakness. Both Google Print and Amazon use the International Standard Book Number, or ISBN. Created in the 1960s, the ISBN is a good starting point; still, it may not be the ultimate solution. Only books published since the '60s have them, for one thing, and while a real-world library has to deal with each book as an inviolable whole (just try checking out a single chapter of a book), in the near future we'll need identifiers for individual chapters, paragraphs, even illustrations. ISBNs only identify the book as a whole.
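
To make the granularity problem concrete, here is a minimal Python sketch of an identifier that can point at a whole book, a chapter, or a single paragraph. The `PartId` class and its `isbn/chN/pN` string format are assumptions invented for illustration, not an existing standard, and the ISBN shown is a placeholder rather than a real book.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PartId:
    """Hypothetical identifier for a book or a part of one (not a real standard)."""
    isbn: str                        # identifies the book as a whole
    chapter: Optional[int] = None    # optional: narrows to one chapter
    paragraph: Optional[int] = None  # optional: narrows to one paragraph

    def __post_init__(self):
        # A paragraph only makes sense inside a chapter.
        if self.paragraph is not None and self.chapter is None:
            raise ValueError("paragraph requires a chapter")

    def __str__(self) -> str:
        parts = [self.isbn]
        if self.chapter is not None:
            parts.append(f"ch{self.chapter}")
        if self.paragraph is not None:
            parts.append(f"p{self.paragraph}")
        return "/".join(parts)

    def contains(self, other: "PartId") -> bool:
        """True if `other` is this part or lies inside it."""
        if self.isbn != other.isbn:
            return False
        if self.chapter is None:
            return True  # a whole book contains all of its parts
        if self.chapter != other.chapter:
            return False
        return self.paragraph is None or self.paragraph == other.paragraph


book = PartId("0000000000")  # placeholder ISBN
chapter = PartId("0000000000", chapter=3)
para = PartId("0000000000", chapter=3, paragraph=12)

print(para)                    # 0000000000/ch3/p12
print(book.contains(para))     # True
print(para.contains(chapter))  # False
```

The point of the sketch is only that a finer-grained identifier can still include the book-level one, so anything attached to the whole book remains reachable from its parts.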

Second, we're going to need massive collections of metadata about each book. Some of this metadata will come from the publishers. But much of it will come from users who write reviews, add comments and annotations to the digital text, and draw connections between, for example, chapters in two different books.
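
One way to picture many classification schemes coexisting is to keep the metadata apart from the books and key it by identifier. The following Python sketch is an illustration under stated assumptions: the `Catalog` class, the source names, and the identifier `guide-001` are all invented for the example, and the categories echo the field-guide example earlier in this piece.

```python
from collections import defaultdict


class Catalog:
    """Hypothetical store of classifications contributed by many sources."""

    def __init__(self):
        # category -> set of (book_id, source) pairs
        self._by_category = defaultdict(set)
        # book_id -> set of (category, source) pairs
        self._by_book = defaultdict(set)

    def classify(self, book_id, category, source):
        """Record that `source` files `book_id` under `category`."""
        self._by_category[category].add((book_id, source))
        self._by_book[book_id].add((category, source))

    def shelf(self, category):
        """Book ids filed under `category` by any source, sorted."""
        return sorted({book_id for book_id, _ in self._by_category[category]})

    def categories(self, book_id):
        """Every category any source has assigned to `book_id`, sorted."""
        return sorted({cat for cat, _ in self._by_book[book_id]})


catalog = Catalog()
catalog.classify("guide-001", "natural history", source="library")
catalog.classify("guide-001", "illustrative art", source="reader:a")
catalog.classify("guide-001", "good eating", source="reader:b")

print(catalog.categories("guide-001"))
# ['good eating', 'illustrative art', 'natural history']
print(catalog.shelf("good eating"))
# ['guide-001']
```

Unlike a physical shelf, nothing here forces a single ordering: the same book sits on three "shelves" at once, and each filing remembers who made it.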

As the digital revolution continues, and as we generate more and more ways of organizing and linking books (integrating information from publishers, libraries and, most radically, other readers), all this metadata will not only let us find books, it will provide the context within which we read them.

The real challenge to traditional publishing today comes not from the digitizing of books, then, but from the very nature of the Web itself. Using metadata to assemble ideas and content from multiple sources, online readers become not passive recipients of bound ideas but active librarians, reviewers, anthologists, editors, commentators, even (re)publishers. Perhaps that's what truly scares publishers and authors about Google Print.

David Weinberger is a writer and fellow at the Harvard Berkman Center for Internet and Society. 

© Copyright 2006 The New York Times Company