This posting outlines a possible workflow for getting digitized versions of Notre Dame’s Catholic pamphlets into the “Catholic Portal”.
The problem
The University of Notre Dame owns a significant number of Catholic pamphlets. These materials have been cataloged, and their MARC records flag them as destined for the “Portal” with the letters “CRRA” in field 590$u.
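As a rough illustration of how that flag could be used to pick out the pamphlet records, here is a minimal Python sketch. It assumes the catalog records have been exported as MARCXML, which is an assumption and not something this posting specifies. The function names and the sample records (including the identifiers "demo-1" and "demo-2") are hypothetical and made up for the example.

```python
import xml.etree.ElementTree as ET

# Standard MARCXML namespace; the export format itself is an assumption.
MARC_URI = "http://www.loc.gov/MARC21/slim"
MARC_NS = {"marc": MARC_URI}

# Hypothetical sample data: one record flagged for the Portal, one not.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">demo-1</controlfield>
    <datafield tag="590" ind1=" " ind2=" ">
      <subfield code="u">CRRA</subfield>
    </datafield>
  </record>
  <record>
    <controlfield tag="001">demo-2</controlfield>
    <datafield tag="590" ind1=" " ind2=" ">
      <subfield code="a">Local note</subfield>
    </datafield>
  </record>
</collection>"""


def is_portal_record(record):
    """Return True if any 590$u subfield of the record contains 'CRRA'."""
    path = "marc:datafield[@tag='590']/marc:subfield[@code='u']"
    for subfield in record.findall(path, MARC_NS):
        if "CRRA" in (subfield.text or ""):
            return True
    return False


def portal_records(marcxml):
    """Parse a MARCXML string and return the records flagged for the Portal."""
    root = ET.fromstring(marcxml)
    # iter() also matches the root, so a lone <record> document works too.
    records = root.iter("{%s}record" % MARC_URI)
    return [record for record in records if is_portal_record(record)]


def control_number(record):
    """Return the record's 001 control number, or None if it has none."""
    return record.findtext("marc:controlfield[@tag='001']", namespaces=MARC_NS)


if __name__ == "__main__":
    for record in portal_records(SAMPLE):
        print(control_number(record))
```

Matching on a substring instead of an exact value is a deliberate choice in this sketch, since local notes fields sometimes carry extra whitespace or text around a code. An exact comparison may be more appropriate if the field is known to be clean.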
The University’s library wants to digitize these materials, make the resulting PDF files freely available on the Web, run optical character recognition (OCR) against the PDF files, and support a text mining interface against the result. Bits and pieces of this work have already been done. The problem is gluing them together into a functional workflow.