But almost 30 years and 621,000 scanned images later, DjVu has become much less usable as Internet technology has advanced. "It reached a point where people couldn’t open the files on a Mac, or on mobile devices, which people are using more all the time," Curl says, "and most browsers today won’t work with DjVu."
Because the scanned oil and gas records (located at http://goo.gl/YAfnOF) make up the most popular data service on the KGS website, a new way to serve the files online was needed. Curl and his section staff began discussing a way to change the file formats more than a year ago. The successful process they developed taught them valuable lessons in managing large data sets that may be useful for other surveys and organizations with similar collections.
The original paper records were provided by the Kentucky Division of Oil and Gas, and KGS staff and student workers first scanned them into TIF files (tagged image file format) before converting them to DjVu. "But we always kept an archive of those raw TIF’s in case those DjVu’s failed and we needed to redo them," says Liz Adams, who oversees scanning and storing the drilling records. "And the early choice to keep that archive turned out to be an important one."
The challenge was finding a way to mass-convert those original TIF’s into smaller, more compressed contemporary file types. The section staff chose Adobe’s portable document format (PDF) for the oil and gas records, and the common JPG image format for the e-logs.
Carrie Pulliam, who also works in the section, found a simple, free file conversion program called ImageMagick. Using ImageMagick and writing additional code, the staff spent a couple of months developing and testing an automated process to query the image database and convert the images to the appropriate file format. Curl says another decision made when the DjVu system was first used turned out to be crucial to developing scripts for the conversion: a "scanlist" table was created in the database that kept track of metadata about the scanned oil and gas documents.
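The article does not reproduce the staff's scripts, but the general shape of such a batch job can be sketched. The Python example below is only an illustration under stated assumptions: the `ScanRecord` fields, the `doc_type` values, and the output naming are hypothetical stand-ins for whatever the "scanlist" table actually holds, and the `magick` command assumes ImageMagick 7 is installed (older installations use `convert` instead). It builds one ImageMagick command per record and collects failures rather than stopping, so a long unattended run is not halted by a single bad file.

```python
import subprocess
from dataclasses import dataclass
from pathlib import Path


@dataclass
class ScanRecord:
    """Hypothetical row from a 'scanlist'-style metadata table."""
    record_id: str
    doc_type: str          # assumed values: "well_record" or "elog"
    tif_paths: list[str]   # archived TIF pages, in page order


# Target formats named in the article: PDF for records, JPG for e-logs.
TARGET_EXT = {"well_record": ".pdf", "elog": ".jpg"}


def build_command(rec, out_dir, magick="magick"):
    """Return the ImageMagick argument list for one record.

    Several TIF inputs with a .pdf output produce a multi-page PDF.
    E-logs are assumed here to be a single image per record.
    """
    ext = TARGET_EXT[rec.doc_type]  # KeyError for an unknown type
    out_path = Path(out_dir) / f"{rec.record_id}{ext}"
    return [magick, *rec.tif_paths, str(out_path)]


def convert_all(records, out_dir, runner=subprocess.run):
    """Convert every record; return a list of (record_id, error) pairs."""
    failures = []
    for rec in records:
        try:
            cmd = build_command(rec, out_dir)
            # The timeout keeps one hung conversion from stalling the batch.
            result = runner(cmd, capture_output=True, text=True, timeout=600)
            if result.returncode != 0:
                failures.append((rec.record_id, result.stderr))
        except (KeyError, OSError, subprocess.TimeoutExpired) as exc:
            failures.append((rec.record_id, str(exc)))
    return failures
```

In a real run, the records would come from a database query rather than being constructed by hand, and the failure list could be written to a log for later review and reprocessing from the archived TIFs.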
The conversion of the 621,750 scanned images in the database began during the Christmas break last year, taking about three weeks of continuous automated work on a KGS server. Curl monitored it remotely from home, catching a few errors and hang-ups that could have crashed the process. "The newly converted files were stored on a whole separate system, and once the conversion was complete, and we determined that we were happy with the results, Doug changed our web service, so the users were being directed to the new image formats," Adams says.
"The service is much more reliable now," Curl says. "You can open the PDF’s in any browser and in mobile devices, which users really wanted out in the field or wherever they needed them. They do have to go to another page now and open a JPG of the e-logs, but now they can open it in their browser and quickly view it. It works really well."
As an added benefit of the conversion project, the section staff better understand the workings of the huge database and can make file replacements and other maintenance more quickly. They’re also considering developing a presentation to help other surveys and organizations. "We’ve had calls from other surveys in the past about how we’re managing and disseminating our online records," Adams says. "Some of them have huge libraries of paper documents, and they’re wondering what to do with the files if they scan them."
"We’re certainly willing to help someone who has the same issue," Curl says.