 SELECTION
Sometimes, it's not about what you want to digitize, but what you have on microfilm that decides for you. KY-NDNP is in a unique situation since we have some 30,000 reels of master negative microfilm at our disposal. We also own the rights to those reels and their content, which can be pivotal in the selection process (SEE: On The Horizon for one such example). Good fortune aside, we're still bound by a few NDNP rules for selection:
- Be aware of content within a particular date range
- The Preservation Reformatting Center (PRC) has an excellent microfilm collection database. KY-NDNP can easily identify title reels in any date range. Other micropublishers or holders of microfilm likely have a similar database.
- NOTE: We've heard of an instance when a holdings list suggested the micropublisher had X number of titles, yet their internal database listed far fewer. What happened? The micropublisher spliced all of the small, 100' reels together to make 1000' reels then listed only the first title on the 1000' reel.
- Do your homework and be diligent with what you know!
- Choose as complete a run (collection) of a paper as possible (orphan titles notwithstanding)
- This is very important when it comes to meeting the required 100,000 pages. You could choose papers with few issues but you'll have to choose more titles in order to meet that page count goal. More to the point: each unique title requires an essay, and that can be a full-time job in itself. Plan accordingly.
- New NDNP Guidelines allow for a single essay to cover titles that change in name only. Nevertheless, title essays can multiply easily.
- Find the paper of record for the community/region it serves
- You may have to decide which community/region gets represented first. For instance, in Kentucky, the majority of newspapers come from small county communities though we have a fair amount of urban dailies, too. Your state's disposition could decide the route you take to start your digitization program.
- Some pre-standards microfilm, i.e. pre-USNP, cannot sustain quality digital imaging
- KY-NDNP is at a real advantage with this because we have an in-house microfilm and duplication facility with the expertise to manipulate poor quality master negatives into print master negatives perfectly suitable for scanning.
- For those without in-house facilities, there's reason for hope. Using KY-NDNP "pre-standards" film as a guide:
- 99% of our "pre-standards" microfilm is more than adequate for digitization
- The film might lack technical targets to measure resolution, but it's focused and perfectly legible
- Reduction ratios for IA and IIB film are usually 20x or less which means you can achieve at least 300dpi scans
- Lighting is often varied throughout a reel, but it is usually well within acceptable density ranges
- Though densities may vary from reel to reel, or even within a reel, imperfect densities can still produce legible digital scans. A good duplicating technician can usually improve under-exposed or over-exposed master negative quality.
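The reduction-ratio point in the list above comes down to simple arithmetic. The sketch below assumes the usual geometric relationship: resolution relative to the original page equals the resolution captured at the film plane divided by the reduction ratio. The function names and the 6000 dpi film-plane figure are illustrative assumptions, not specifications of any particular scanner.

```python
def effective_dpi(film_dpi: float, reduction_ratio: float) -> float:
    """Resolution relative to the original page, given film-plane resolution."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return film_dpi / reduction_ratio


def required_film_dpi(target_dpi: float, reduction_ratio: float) -> float:
    """Film-plane resolution needed to reach target_dpi on the original page."""
    if reduction_ratio <= 0:
        raise ValueError("reduction ratio must be positive")
    return target_dpi * reduction_ratio


if __name__ == "__main__":
    # Hypothetical film-plane capture of 6000 dpi (an assumption for illustration).
    for ratio in (14, 16, 20):
        print(f"{ratio}x reduction -> {effective_dpi(6000, ratio):.0f} dpi on the page")
```

The lower the reduction ratio, the less film-plane resolution is needed to reach a given target, which is why film shot at 20x or less is comparatively easy to scan at 300 dpi.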
 INFRASTRUCTURE
Whether you do the work in-house or outsource it, you need ample infrastructure. Our most important advice is this: calculate what you need, then double it! Whatever the resource, rest assured you'll need twice as much of it when it's all said and done.
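As a rough illustration of the "calculate, then double" rule applied to disk space, here is a minimal sketch. The average file size per page is a hypothetical placeholder, not a KY-NDNP measurement. Measure your own master images before relying on any figure.

```python
def estimate_storage_tb(pages: int, mb_per_page: float, safety_factor: float = 2.0) -> float:
    """Estimate storage in decimal terabytes, padded by a safety factor.

    mb_per_page is an assumed average size for one page's files (hypothetical).
    safety_factor=2.0 encodes the "double it" rule of thumb.
    """
    if pages < 0 or mb_per_page < 0 or safety_factor <= 0:
        raise ValueError("inputs must be non-negative, safety factor positive")
    calculated_tb = pages * mb_per_page / 1_000_000  # 1 TB = 1,000,000 MB
    return calculated_tb * safety_factor


if __name__ == "__main__":
    # 100,000 pages is the NDNP page goal; 30 MB per page is an assumption.
    print(f"Plan for about {estimate_storage_tb(100_000, 30.0):.1f} TB")
```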
The Preservation Reformatting Center's (PRC) facilities provide KY-NDNP access to microfilm readers, densitometers, and microscopes necessary for the inspection of master negatives. PRC also provides the duplication and processing equipment, as well as the expertise, necessary to create the highest quality second-generation negatives for digitization.
If you're in the market for microfilm evaluation gear, PRC recommends an x-rite 361T densitometer and Peak Shop Micro (1DIV1)= 0.0005 microscope.
- Did you know that a technician must take density readings from master negatives to properly duplicate a print master negative? How many readings depends on the technician, but some duplicating companies charge extraordinary amounts to provide density readings to you even though it's a natural part of the process.
If you don't want to invest in microfilm equipment and the expertise to use it, you can ask the company that duplicated your negatives to also provide density readings from the print master negative. Be warned that they will charge handsomely for the service, and it might actually be cheaper to do it yourself if you are inclined to learn.
KY-NDNP uses a NextScan Eclipse 300 microfilm scanner. It is capable of creating TIFF 6.0 master images as required by NDNP specifications as well as uninterpolated images up to 400dpi.
Images from the NextScan are transferred to a dedicated Dell Linux server and a Stonefly storage area network. Today, this system acts as our production server but, at one time, it served as both our production and storage server. Back then, it was only six terabytes (6TB) and filled very quickly, often bringing production to a halt. Capacity on the production server has increased to twenty terabytes (20TB) while a new sixteen terabyte (16TB) Dell Linux server with a Dell storage area network, located in another building on campus, has become home to data storage.
The production server is connected via a one-gigabit network to twenty-six (26) Dell Intel Dual Core desktop machines in the Digital Lab. These desktop machines are dedicated to both manual and automated processing. Their processors are 6600@2.4GHz with 2 GB of RAM, and they run Windows XP Pro with Service Pack 3. This is the same configuration as the machines that run the Library of Congress' Digital Viewer and Validator (DVV).
Two Dell Intel Dual Core desktop machines are dedicated to optical character recognition (OCR) generation, with several licensed OCR engines installed on each machine, ABBYY FineReader being one of them. Processing of the raw scanned images into valid deliverables is managed with the iArchives applications framework and database manager.
To be clear, it is by chance that KY-NDNP operates largely on Dell machines. We do not endorse Dell nor are Dell machines required for any aspect of the newspaper digitization program.
KY-NDNP uses two storage/back-up solutions. As previously mentioned, a sixteen terabyte (16TB) Dell Linux server with a Dell storage area network is located in a building apart from the Digital Lab. This server does no production work and is purely for storage. As a back-up, the raw, unprocessed images plus their deliverables and iArchives processing data are copied to an on-campus, free robotic tape storage facility. The University offers this service for the needs of campus-wide projects, like KY-NDNP, that require sophisticated, secure mass storage.
University of Kentucky Libraries, under contract with the Kentucky Virtual Library (KYVL) and the Kentucky Council on Post-Secondary Education (CPE), has developed, managed and coordinated the Kentuckiana Digital Library (KDL) program since 1999. This program provides digital access to resources that document the history and heritage of Kentucky. To date, the digital library program has representation from 16 Kentucky archives and provides access to over 5,000 EAD finding aids, 80,000 photographs, hundreds of oral history transcripts and streaming audio files and some 450,000 book, journal, newspaper and manuscript page images. The KDL provides KY-NDNP the opportunity to make Kentucky's historic newspapers from NDNP available online as well as those titles that are currently barred from Chronicling America for one reason or another. (SEE: Kentucky Edition Newspapers) KDL runs on the DLXS Platform developed by the University of Michigan.
 MICROFILM EVALUATION
Microfilm evaluation is one of the most critical steps for the historic newspaper microfilm-to-digital process. It can disqualify a reel for digitization or it can prevent countless mistakes in later processing. There are two key components in evaluating microfilm:
- technical inspection of the physical reel (after title/reel selections have been made)
- intellectual analysis of what is on the reel (after the print master duplicates have been processed)
The technical film inspection starts with the master negative reel by taking resolution and density readings, inspecting the film for defects, and replacing inferior splices.
Resolution readings are only possible when a resolution technical target is included on the film. But much of the older "pre-standards" film does not include this technical target and, without it, resolution can only be judged by eye; it's either in focus or not, with varying degrees in between. Nothing can be done to improve the resolution of a piece of microfilm. In fact, by nature the film will lose resolution with each successive generation of film made from it. In other words, if the resolution is bad on the master negative, it will be slightly worse on the print master negative or positive.
Certainly, a reel that doesn't resolve well enough can be eliminated from digitization. The same can be said for a reel with imperfect density readings. Why does the density matter? It matters because a reel with good density will produce better, crisper, more readable digital images. (SEE: preceding section on Infrastructure for more.) Should you throw out a reel, or a whole title, because of bad density readings? Consider first the ability to manipulate density during the duplication process. There are three ways to do it:
- The duplicating technician can change the lamp and speed settings of a duplicating machine to increase or decrease exposure from the master negative onto the print master
- The type of microfilm can play an important role in improving the print master densities by choosing high-contrast or low-contrast microfilm stock
- Where you take the density readings throughout the reel can have a great effect on the "average". The average decides the duplicator settings and that can make all the difference in good print masters for scanning. It has been said that a technician can find any density reading on any piece of film if they're willing to look for it. It's true.
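To see why the sampling location matters so much, consider a minimal sketch that averages density readings. All reading values below are invented for illustration, and the function name is hypothetical. No acceptable density range is encoded here because it depends on your guidelines and film.

```python
from statistics import mean


def density_summary(readings):
    """Return the low, high, and rounded average of a set of density readings."""
    values = list(readings)
    if not values:
        raise ValueError("at least one density reading is required")
    return {
        "low": min(values),
        "high": max(values),
        "average": round(mean(values), 2),
    }


if __name__ == "__main__":
    # Hypothetical readings taken along one reel (not real KY-NDNP data).
    whole_reel = [0.85, 0.95, 1.10, 1.25, 1.40]
    first_frames_only = whole_reel[:2]

    print("Sampled across the reel:", density_summary(whole_reel))
    print("Sampled near the start: ", density_summary(first_frames_only))
```

The two samples come from the same reel yet produce different averages, so they would lead to different duplicator settings. That is the practical meaning of "a technician can find any density reading on any piece of film."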
The Preservation Reformatting Center (PRC) replaces inferior splices because they can fail during the duplication process and rip the film, and because they create a sort of "speed bump" on the master negative. The image on the print master just after this "speed bump" will be noticeably blurred and, many times, illegible. Replacing older glue, weld, or ultrasonic splices with flatter, more secure, less caustic tape splices allows problem-free duplication and, ultimately, improves the preservation of the master negative, not to mention saving the legibility of countless digital page images.
The basics of intellectual evaluation of a reel of microfilm are rather simple. While reviewing a reel of film through a microfilm reader, the evaluator records the newspaper pages and their order. Any anomalies, like severely mutilated pages, are noted too. This intellectual evaluation step is important for two reasons:
- to allow the scanning technician to know that the microfilm scanner has captured everything present on the microfilm reel
- to allow the metadata technician to confirm that missing issues, pages, duplicate issues or pages, and other anomalies are consistent with what is on the microfilm
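Both checks above amount to comparing what the evaluation recorded against what the scanner produced. The following is a minimal sketch of that comparison. The page identifiers and the function name are hypothetical, and this is not how the KY-NDNP evaluation form itself is implemented.

```python
from collections import Counter


def compare_to_evaluation(expected_pages, scanned_pages):
    """Compare pages recorded during evaluation with pages found in the scan.

    Returns (missing, unexpected). Counter arithmetic preserves duplicates,
    so a page filmed twice must also appear twice in the scan.
    """
    expected = Counter(expected_pages)
    scanned = Counter(scanned_pages)
    missing = sorted((expected - scanned).elements())
    unexpected = sorted((scanned - expected).elements())
    return missing, unexpected


if __name__ == "__main__":
    # Hypothetical identifiers: issue date plus page number.
    evaluated = ["1905-03-01/p1", "1905-03-01/p2", "1905-03-01/p2", "1905-03-08/p1"]
    captured = ["1905-03-01/p1", "1905-03-01/p2", "1905-03-08/p1"]

    missing, unexpected = compare_to_evaluation(evaluated, captured)
    print("Missing from scan:", missing)
    print("Not in evaluation:", unexpected)
```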
Perhaps the best way to tell you what KY-NDNP tracks is to show you our Microfilm Evaluation Form. This form is a MySQL database using an XHTML interface powered by PHP. It was first developed during the NDNP Test Bed (Phase One) but has since been expanded to include other facets of our work in Preservation and Digital Programs, like conservation treatments and microfilm target generation. Through secure, password protected accounts, the evaluation form will soon be released to other NDNP awardees. Stay tuned for that release date!
In the meantime, what KY-NDNP harvests on our evaluation form may be somewhat different from what you want. Some of the information we collect is in anticipation of future needs, for instance tagging mutilated pages so they can be easily identified if other copies turn up. But every NDNP operation is different and you may not have the time to spend on such minutiae. By the same token, you may have more time to spend on the evaluation process and may feel more information needs to be harvested. It's a choice you have to make. That said, we've found that there is a point of diminishing returns.
The final point to be made about evaluating microfilm is understanding how a newspaper is filmed. Microfilmers' work is as individual as they are. Though the modern guidelines have been "under construction" for nearly 50 years, there are patterns - in the paper itself, the filmers, and the binders - along the way and they can be used to one's advantage for evaluation.
 METADATA
Metadata can be the scariest part of the NDNP process but it doesn't have to be. It's just a matter of getting used to the rules and looking at what can feel like endless lines of text and tags. "Don't Panic!" It all boils down to data about data.
NDNP follows the Library of Congress METS (Metadata Encoding and Transmission Standard) schema. The METS schema is a standard for encoding descriptive, administrative, and structural metadata regarding objects within a digital library, expressed using the XML schema language of the World Wide Web Consortium... The following are examples of KY-NDNP data a la METS. Samples from vendors may look slightly different but the basic premise is the same.
For KY-NDNP, metadata collection starts with the Microfilm Evaluation. It is here that we gather, in one place, all of the information we will need about the title, the reel, and the newspaper (descriptive, administrative, and some structural). This metadata is then exported into our workflow system. During post process the metadata is distributed to the necessary image headers and XML components. If you're outsourcing, your vendor will ask that you enter this same information into a spreadsheet (probably Excel) that they can then import into their system to produce the same deliverables.
Metadata can be found in a variety of places, not just from the evaluation of the newspaper. It can come from the MARC record, the microfilm box, targets on the film, or the film itself (such as the film stock and manufacturer). We've even managed to find metadata in decades-old filming log books. For example, "date filmed" 6/65 may be transposed into the reel metadata as:
<ndnp:dateMicrofilmCreated>June 1965</ndnp:dateMicrofilmCreated>
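A conversion like the one above is easy to script. The sketch below is illustrative only: the function name is hypothetical, and it assumes log-book entries are written as month/two-digit-year and belong to the 1900s unless told otherwise. The element name is taken from the example above.

```python
MONTH_NAMES = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def date_filmed_element(log_entry: str, century: int = 1900) -> str:
    """Turn a log-book entry such as "6/65" into a dateMicrofilmCreated element.

    Assumption: two-digit years fall in the given century (default 1900s).
    """
    month_text, year_text = log_entry.strip().split("/")
    month = int(month_text)
    year = int(year_text)
    if not 1 <= month <= 12:
        raise ValueError(f"month out of range in {log_entry!r}")
    if year < 100:
        year += century
    value = f"{MONTH_NAMES[month - 1]} {year}"
    return f"<ndnp:dateMicrofilmCreated>{value}</ndnp:dateMicrofilmCreated>"


if __name__ == "__main__":
    print(date_filmed_element("6/65"))
```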
Not all metadata we collect is used in our NDNP deliverables. We collect some information purely for our own purposes, such as mutilated pages. Other information is used for our KDL deliverables, like "publisher". The bottom line is that collecting too much can stop production, while collecting too little may cost you important provenance information.
Perhaps the best way to understand what XML means in the NDNP context is to look at it by category (links below). The advantages of XML are that it's usually very intuitive - <title>Kentucke Gazette</title>; validators catch many errors (such is the case with the Library of Congress DVV); and there are a lot of tools built with it, such as EAD, TEI, or XHTML.
 WORKFLOW
The KY-NDNP in-house production workflow looks slightly different than most other NDNP awardee workflows. There are four positions in the KY-NDNP workflow (not including the Principal Investigator): Program Manager, Office Manager, Image Management Specialist (IMS), and Student Workers (temporary/part time). Many ask specifically about the Program Manager position and all that entails. We've assembled a more extensive look at KY-NDNP staffing for those interested.
- Choosing the Titles and Microfilm
The KY-NDNP program team along with the Advisory Board chooses the titles to be digitized. (SEE: Selection) Then, we pull from our vault of nearly 30,000 reels the microfilm to be duplicated for scanning. (SEE Infrastructure) The technical evaluation of the microfilm takes place prior to the duplication. The print master (2N) densities are, of course, assessed once the 2N is made. (SEE: Microfilm Evaluation)
- Evaluating and Scanning Microfilm
The intellectual evaluation of the newspaper on the microfilm takes considerably more time than the technical inspection, but it's important to do before scanning. The Image Management Specialist (IMS) doesn't have the luxury of working with film made under modern standard guidelines. For instance, evenly spaced images, pages straight on the camera beds, consistent targets, and "Second Intentional Exposure" targets aren't normal for historic newspaper microfilm. Plus, there are all manner of foreign objects in some microfilms which can trip up a scanner's detection ability. Because of these issues, scanning historic newspapers can be slow going. For those outsourcing this step in the workflow, be aware that the quality of the filming can impact scanning costs.
- Ingest, Crop, Deskew, and Split
After a reel is scanned, the images are ingested into the workflow management system. Many of the steps within this system are automated though they may need human intervention to initiate. NOTE: How the images are processed in our system is not radically different from other vendor systems. The names may change, the parameters may change, but systems for newspaper digitization are overall very similar. During the ingest process, each digital page image is cropped and deskewed according to a set of automation parameters. These parameters allow cropping to the text block or outside the page, for example. Once the system has generated its best "guess", the KY-NDNP student assistants adjust the crop and deskew of each page image as necessary. 2-up images (two pages filmed side by side in the same exposure) are also split so each page becomes a digital object.
When the images have passed the Quality Assurance step, they're ready to have the metadata assigned. This is done in a "two-key" action and one reconcile step. Here, page numbers, editions and sections are identified and issue dates applied. This metadata is imparted to all corresponding files during post and image post processes. Student assistants handle the first and second key steps while the Program Manager reconciles any differences to ensure the final product matches the data that was identified in the Microfilm Evaluation.
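The "two-key" idea is simple: two people enter the same metadata independently, and only the disagreements need a third look. A minimal sketch of the reconcile input might look like the following. The field layout and identifiers are hypothetical and do not reflect the iArchives system.

```python
def find_key_conflicts(first_key, second_key):
    """Return entries where two independent keying passes disagree.

    Each argument maps an image identifier to the metadata keyed for it.
    An image keyed in only one pass is reported with None for the other.
    """
    conflicts = {}
    for image_id in sorted(set(first_key) | set(second_key)):
        first_value = first_key.get(image_id)
        second_value = second_key.get(image_id)
        if first_value != second_value:
            conflicts[image_id] = (first_value, second_value)
    return conflicts


if __name__ == "__main__":
    # Hypothetical entries: (issue date, page number) per image.
    pass_one = {"0001": ("1905-03-01", 1), "0002": ("1905-03-01", 2), "0003": ("1905-03-08", 1)}
    pass_two = {"0001": ("1905-03-01", 1), "0002": ("1905-03-01", 3), "0003": ("1905-03-08", 1)}

    for image_id, (first, second) in find_key_conflicts(pass_one, pass_two).items():
        print(f"Reconcile {image_id}: first key {first}, second key {second}")
```

Only the conflicting entries reach the reconciler, who resolves them against the Microfilm Evaluation record.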
The next step is to build the columnar boxes, or "zones" for read order. This step, too, is performed by the students. In the not-too-distant future this step will become fully automated with incredible accuracy. When that happens, the amount of labor needed will drop considerably. Until then, students manually "draw" the zoning coordinates for each column. Once this step is done the OCR can be generated, which is itself an automated feature.
- Final QA and Metadata Import
The final Quality Assurance step looks at all facets of the digital page images: crop, deskew, split, metadata, and column zone boxes. Once any outstanding issues have been resolved, it's simply a matter of inserting the metadata gathered in the microfilm evaluation form and pressing "go".
The images go through back-to-back automated processes that turn them into data ready to be assembled into a batch for delivery. The derivative files are generated (PDF, JP2 - JPEG2000, XML) with the metadata properly assigned to each, and sorted into a neat issue and reel directory structure.
Every step of the KY-NDNP program is documented using our internal wiki. Not enough can be said for the organizational relief our wiki offers. We provide training modules for student workers, list project milestones, and track a reel or title through every step of the microfilm-to-digital process.
 DELIVERABLES
All NDNP batches are delivered according to a predefined order.
- Each batch is assigned a uniform name (of sorts) such as batch_ky_20060803_nirvana; where ky signifies the awardee; 20060803 is the date of batch validation; and nirvana (N) is the fourteenth batch of the respective program phase
- Each LCCN has its own directory and...
- Inside each LCCN directory are the reels - or titles from a reel - associated with the LCCN. The reels are identified by the bar code that is provided by the Library of Congress and assigned by the KY-NDNP Program Manager
- The titles are then broken down into issue containers/directories with the reel's targets placed within the reel container itself
- a typical directory tree
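The batch naming convention in the list above can be checked mechanically. The sketch below parses a name of the form shown in the example. The regular expression and function name are assumptions made for illustration, and deriving the batch ordinal from the first letter reflects only the alphabetical scheme described here, not an NDNP rule.

```python
import re
from datetime import datetime

# Assumed shape: batch_<awardee>_<YYYYMMDD>_<name>, all lowercase.
BATCH_PATTERN = re.compile(r"^batch_([a-z]+)_(\d{8})_([a-z][a-z0-9]*)$")


def parse_batch_name(name: str) -> dict:
    """Split a batch name into awardee, validation date, label, and ordinal."""
    match = BATCH_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognized batch name: {name!r}")
    awardee, stamp, label = match.groups()
    validated = datetime.strptime(stamp, "%Y%m%d").date()  # rejects impossible dates
    ordinal = ord(label[0]) - ord("a") + 1  # a=1 ... n=14, per the alphabetical scheme
    return {"awardee": awardee, "validated": validated, "label": label, "ordinal": ordinal}


if __name__ == "__main__":
    info = parse_batch_name("batch_ky_20060803_nirvana")
    print(info["awardee"], info["validated"].isoformat(), info["label"], info["ordinal"])
```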
We send our batch data to the Library of Congress on external hard drives. They keep those drives for up to three months. In the meantime, we continue to produce data for delivery, so we need more drives. We have accumulated a lot of hard drives over the years: 73 and counting! To keep up with them, and to have a bit of fun, they're each named after a race horse (this is Kentucky, after all, Horse Capital of the World). We learned very early that some brands couldn't withstand the rigors of travel or constant access. Today, we use only Western Digital drives - 500GB, 1TB, and 2TB - from the MyBook Studio and Mirror series. Some drives are configured as RAID and some are not. We've had very good success with these drives.
It's not just the drives we have to keep up with but the batches, too. Just as with the external hard drives, we come up with a naming scheme for each batch and a theme for those names with each Phase of the program. The name we give a batch stays with it for its lifetime. The data, then, can be traced back to a batch and the drive it was delivered on, so it's very simple to stay organized. The KY-NDNP Phase I batches were named for musicians we liked (or found...interesting). We delivered 110,000+ page images in 14 batches, A-N or Abba - Nirvana. Phase II batches are named for one-word movies; final count forthcoming. Have a look at our funny lists if you dare!
Again, we keep good documentation of every aspect of the KY-NDNP program with our internal wiki.
Wondering how to calculate line pairs, capture resolution, or True DPI? How about determining reel generation or estimating pages per reel? The KY-NDNP Quick Guide can help!
Want to know more about KY-NDNP extracurricular work? Interested in more about NDNP?
But wait, there's more...from Microform Imaging and Review's Spring 2009 release comes The Digitization of Historic Newspapers: The Kentucky Experience.
HEY - we've produced a short film that describes the University of Kentucky Libraries' work with newspapers through the years including our in-house digitization process. You can view the film at YouTube or download a hi res version.