Here’s some nice news. Kalev Leetaru has been liberating a ton of public domain images from books and putting them all on Flickr. He’s been going through Internet Archive scans of old, public domain books, isolating the illustrations, and saving each one as an individual image. That matters because, while the books and images are all public domain, very few of the images have been separated from the books and released in a digital format.
To achieve his goal, Mr Leetaru wrote his own software to work around the way the books had originally been digitised.
The Internet Archive had used an optical character recognition (OCR) program to analyse each of its 600 million scanned pages in order to convert the image of each word into searchable text.
As part of the process, the software recognised which parts of a page were pictures in order to discard them.
Mr Leetaru’s code used this information to go back to the original scans, extract the regions the OCR program had ignored, and then save each one as a separate file in the Jpeg picture format.
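To make that a bit more concrete, here is a rough sketch of the idea in Python. It is not Leetaru's actual code. It assumes the OCR layout is available as XML in which each page region is a `block` element with a `blockType` attribute and pixel coordinates `l`, `t`, `r`, `b`; that layout, the sample data, and every function name here are assumptions made for illustration. The page is a toy grid of pixels so the example needs nothing beyond the standard library, and a real version would hand each box to an imaging library to crop the scan and write a JPEG.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout XML: one <block> per page region, with pixel
# coordinates l/t/r/b and a blockType of "Text" or "Picture".
SAMPLE_LAYOUT = """
<page width="8" height="6">
  <block blockType="Text" l="0" t="0" r="8" b="2"/>
  <block blockType="Picture" l="2" t="2" r="6" b="5"/>
</page>
"""


def picture_boxes(layout_xml):
    """Return (left, top, right, bottom) for every region marked as a picture."""
    root = ET.fromstring(layout_xml.strip())
    boxes = []
    for block in root.iter("block"):
        if block.get("blockType") == "Picture":
            boxes.append(tuple(int(block.get(k)) for k in ("l", "t", "r", "b")))
    return boxes


def crop(pixels, box):
    """Cut a box out of a page stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


def extract_pictures(pixels, layout_xml):
    """Crop every picture region the layout describes out of the page."""
    return [crop(pixels, box) for box in picture_boxes(layout_xml)]


# Toy 8x6 "scan": each pixel just records its own (x, y) position.
page = [[(x, y) for x in range(8)] for y in range(6)]
pictures = extract_pictures(page, SAMPLE_LAYOUT)
```

The point of the sketch is that the OCR pass has already done the hard part: it knows where the pictures are, so extracting them is mostly a matter of reading those coordinates back and cropping.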
Already over 2.6 million images have been posted to Flickr in this manner — all completely in the public domain. From a historical perspective, the images are fascinating — and the fact that anyone can do anything with them, free of charge, is important culturally as well. Just scrolling through the images is amazing. Here are a few interesting ones that I spotted:
There seem to be lots of images of musical scores, sewing machines, individual portraits, buildings and machinery. The Flickr page for each image gives information about the book, including the text before and after the image, which is pretty cool. The one (only slightly) annoying thing is that on the Flickr pages, rather than saying these are public domain images, it says that there are “no known copyright restrictions.” While that’s accurate, and a potentially reasonable hedge against some miraculous finding that says these images are covered by copyright, it’s really too bad that it’s so problematic to come out and say “this is in the public domain, do whatever the hell you want with it.”