Carl Malamud has this funny idea that public domain information ought to be… well, public. He has a history of creating public access databases on the net when the provider of the data has failed to do so or has licensed its data only to a private company that provides it only for pay. His technique is to build a high-profile demonstration project with the intent of getting the actual holder of the public domain information (usually a government agency) to take over the job.
Malamud is launching a project to scan ultrafiche of the Federal Reporter, clean up the rather large images (3+ gigabytes each), and then run OCR on them. All of this will be done using an assortment of open source and free tools. I'll be keeping a close eye on this, not only to see how it progresses but also to see what West's reaction will be, since they produced the ultrafiche he is using and are the publishers of the Federal Reporter.
The NY Times has a good article on this.