boston.com Business | The Boston Globe
@LARGE

Saving the world as we know it

SAN FRANCISCO -- One of the great wonders of the modern world is being constructed here, on a former military base called the Presidio, making use of computer-savvy volunteers instead of rock-lugging slaves.

The Internet Archive has the ambitious goal of offering "universal access to human knowledge." In pursuit of that, in a small white wooden building that once served the base as a general store, the archivists are collecting every sort of digital file imaginable, from Web pages to podcasts, software programs to movies, presidential phone conversations to recordings of Cowboy Junkies concerts.

Brewster Kahle is the MIT-educated former entrepreneur who began building the library in 1996, for the simple reason that "nobody else seemed to be doing it," he says. Now he realizes that he has undertaken a task with no obvious stopping point. In 2001, he started recording 20 television channels continuously, and recently he has had volunteers scanning thousands of out-of-print books. Each month, the Internet Archive collects the equivalent of one Library of Congress, says Kahle. The collection, available at www.archive.org, has already surpassed one petabyte. That's a million gigabytes.

Unfortunately, the archive's omnivorous hoarding puts it on "incredibly treacherous legal ground," says John Palfrey, executive director of the Berkman Center for Internet and Society at Harvard. One lawsuit filed against the archive in July accuses it of providing access to an old copy of a company's website that the company no longer wanted to make available. And then there are the copyright issues.

"Clearly, the archive is a public service that is doing a public good," says Palfrey, an attorney. "We want to have a historical record of what happened on the Web. Unfortunately, it is directly at odds with US copyright law, and copyright law in most countries," since the archive collects and stores copyrighted images and texts without obtaining explicit permission from their owners. (There is a way to prevent the Internet Archive from grabbing pages from a website, which the Globe will do with this article so that it can later charge money for access to it.)

Kahle shows up a bit late for an interview in the Internet Archive's book-lined conference room. One of the books on the built-in shelves is "The Vanished Library," a history of the Library of Alexandria in Egypt, which was destroyed by fire. Kahle, a bespectacled blend of entrepreneur and academic, says the book inspired him to start the Internet Archive. On his business card is the title "Digital Librarian." On his feet are wool socks, no shoes.

When Kahle was studying at MIT in the 1970s, he says, there were two big ideas in the air. "One idea was encryption," he says. "The other was to build a digital library so people could have the Library of Congress on their desktops."

After graduating, Kahle chose to follow an entrepreneurial path. He was present at the creation of Thinking Machines, the Cambridge-based supercomputer company, and later started WAIS, a company that helped publishers put information on the Web and make it searchable. WAIS was acquired by America Online, and Kahle's next company, a search and ranking service called Alexa Internet, was bought by Amazon.com. Kahle used the money from those two transactions to start and fund the Internet Archive, which is a nonprofit.

The Internet Archive brings in some revenue by consulting with libraries on how they can build their own digital repositories; last year, it worked with the Library of Congress to collect online information related to the 2004 presidential election. But much of Kahle's energy, and the Internet Archive's $5 million annual budget, is dedicated to preserving bits that might otherwise be lost.

"Libraries can spend money preserving things that publishers and other businesses can't typically afford to save, after their commercial heyday has passed," Kahle says.

For a generation more accustomed to typing a few keywords into a search box than to visiting a bricks-and-mortar library, Kahle thinks it's important to make everything instantly accessible online.

"At this point, kids look to the Net for answers," he says. "If it's not there, it's as if it doesn't exist." But today, many of the great works, he says, aren't available online.

So Kahle is starting an initiative to scan out-of-print books and make them available online. Of course, many books that are out of print are still protected by copyright, so Kahle has also filed a lawsuit against the United States to free those works. (The suit is currently pending appeal.) Google is working on a similar book-scanning initiative in partnership with several large libraries, but Kahle says that Google seems more interested in making the text searchable than in offering the full text online, as the Internet Archive hopes to do.

"They're part of the ecology," Kahle says of Google, "and we're very encouraging of their efforts."

The Internet Archive also sponsors a small fleet of Internet bookmobiles, which operate in San Francisco, Egypt, India, and Uganda and allow people to find full-text books online and print out their own paperback copies. Kahle says the cost of lending a book out can approach $2 for some libraries; printing a black-and-white copy on demand can cost as little as 50 cents.

Kahle is working with other organizations to archive music, spoken word recordings, and videos. In the distant future, he says, "Home movies will be very illustrative of what the culture is now, and what families are." So the Internet Archive supports sites like OurMedia.org, which offer free hosting and bandwidth for big user-produced media files, like home movies. Kahle says OurMedia has already collected about 25,000 items.

When the organization runs up against technical barriers that seem insurmountable, it chisels away at them. It couldn't find a storage device on the market that could hold a petabyte of data inexpensively while consuming little power, so the Internet Archive built one on its own, called the petabox. (You can build your own in the basement, since the archive has made the design available as an open-source document.)

Kahle acknowledges the legal risks in assembling such a massive trove of digital works, but he thinks he can work to change the law.

"People don't go into the library field because they want to make new law," he says. "They do it because they love libraries."

Palfrey says the so-called "fair use" doctrine, which allows for some noncommercial use of copyrighted works, may protect some of what the Internet Archive does.

"If anybody has a great set of fair use defenses, it would be the Internet Archive," he says. "But it's by no means a slam dunk."

Still, Kahle and his colleagues are building the Internet Archive with an eye to the ages, always asking: What seems worth preserving?

On a tour of the offices, Kahle pulls a couple of 1990s-era CD-ROMs out of a closet. One is "Mighty Morphin Power Rangers Create-a-Movie." Another is ESPN Interactive's "Fly-Fishing School."

"You have to think about getting it off its old media, and getting it to run," says Kahle. The Internet Archive has already sought and won an exemption from the Digital Millennium Copyright Act of 1998, which had prevented the group from breaking the software's copy protection. It seemed, to Kahle, a problem worth solving.

Technologists are often accurately depicted as people more interested in the possible than the past. Brewster Kahle and his colleagues defy that depiction, using technology in clever ways to preserve our shared past.

Scott Kirsner is a contributing editor at Fast Company. He can be reached at kirsner@pobox.com.
