Globe Editorial

Data mining for gold

January 3, 2008

IN THE Internet age, it should take only a few clicks to see the grinding of the federal government's gears. But though Web surfers can read old FBI files on Frank Sinatra, for example, they have to dig or pay to find other documents, such as the full body of federal case law. And those who want to see historic documents often have to go to the libraries where the documents are physically located.

Fortunately, these old-fashioned limits are giving way. Nonprofit organizations are pushing for transparency by putting more federal documents, databases, and video on the Web. It's rich information, but it could overwhelm users who aren't familiar with the depths of the Web. So once this data is released, creative individuals and organizations should find ways to make all this information easy to understand and use.

The Boston Public Library is already at work. Its digitized holdings include John Adams's book collection. And the library has a new $500,000 grant for a project to digitize 2.5 million pages of government documents, including its records of the House Committee on Un-American Activities hearings, and 50 years of other congressional hearings from 1936 to 1986.

With this growing Web-based public record, the BPL can better serve the country and the world. The library also plans to create a comment area to encourage a public conversation that could have global dimensions.

To scan all these pages, library staffers are using technology developed by the Internet Archive, a nonprofit that's building a digital library of websites and other cultural artifacts. The funding comes from the Kahle-Austin Foundation, created by Brewster Kahle, the cofounder of the Internet Archive, and from the Omidyar Network, an organization set up by eBay founder Pierre Omidyar to support innovative projects.

The BPL's project is the first part of a grand plan to raise $6 million to build a digital archive of 60 million pages of federal documents. The plan was devised by Public.Resource.Org, a nonprofit that promotes access to federal data. The group also plans to put 1.8 million pages of federal case law on the Web, including all Court of Appeals decisions from 1950 to the present and all Supreme Court decisions since the late 1700s.

As material is released, innovations should follow, says Carl Malamud, Public.Resource.Org's founder. He expects libraries, businesses, and individuals to create new tools such as websites to process the raw data.

Maura Marx agrees. The manager of BPL's digital services, she says that just as the library helps people learn English today, it will help people learn to navigate large digital collections tomorrow.

The Internet has sparked great commercial progress. Now it can bring federal records into the 21st century.
