Catholic pamphlets and the “Catholic Portal”

This posting outlines a possible workflow for getting digitized versions of Notre Dame’s Catholic pamphlets into the “Catholic Portal”.

The problem

The University of Notre Dame owns a significant number of Catholic pamphlets. These materials have been cataloged and denoted as destined for the “Portal” in their MARC records with the letters “CRRA” in field 590$u.
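As a rough illustration of that selection rule, here is a minimal sketch in Python. The in-memory record shape (a dictionary keyed by MARC tag) is a hypothetical simplification for this example, not the format of the actual catalog export, and the record identifiers are made up.

```python
# Hypothetical record shape: control fields map to a string, data fields
# map to a list of {subfield_code: value} dictionaries.
def is_destined_for_portal(record):
    """Return True when any 590 field carries "CRRA" in subfield u."""
    for field in record.get("590", []):
        if "CRRA" in field.get("u", ""):
            return True
    return False


# Made-up sample records, only to exercise the function.
records = [
    {"001": "000123", "590": [{"u": "CRRA"}]},
    {"001": "000456", "590": [{"a": "Local note"}]},
    {"001": "000789"},
]

# Collect the 001 identifiers of the records flagged for the "Portal".
portal_ids = [r["001"] for r in records if is_destined_for_portal(r)]
print(portal_ids)
```

A real implementation would read the records with a MARC library rather than hand-built dictionaries; the test on 590$u would stay the same.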

The University’s library wants to digitize these materials, make the resulting PDF files freely available on the Web, apply optical character recognition against the PDF files, and support a text mining interface against the result. Bits and pieces of this work have already been done. The problem is gluing them together into a functional workflow.


Text mining Catholic pamphlets

This is the quickest of blog postings outlining how I am initially providing a text mining interface to digitized Catholic pamphlets.

Jean McManus used a scanner to create PDF versions of a few Catholic pamphlets. Along the way, she also had the software do a bit of OCR. She then gave the PDF documents to me with filenames matching MARC 001 fields.

I saved these files to our local file system and used the venerable pdftotext application to extract the plain text. I then hacked my locally harvested MARC records describing the given pamphlets, adding two URLs to each: one pointing to the local PDF file and another pointing to a rudimentary text mining interface. Finally, I reindexed the MARC records, making the URLs visible. There were only three edited records, and you can see the fruits of these labors here:

There are many things wrong with the implementation. The text mining interface points to invalid catalog records because they are hard-coded for University of Toronto content. The titles of the content include MARC field 245$c, but the older text mining interface did not expect this. Consequently, the title information for these newly added records is invalid. The PDF documents were scanned two pages at a time. This probably causes the extracted text to span both pages and thus invalidate every sentence. We will need to scan only one page per image to circumvent this problem.
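The extraction and linking steps described above might be sketched as follows. This is an illustration under stated assumptions, not the script actually used: the two base URLs and the query-string form of the text mining link are hypothetical, and the sketch only assumes what the posting says, namely that each PDF is named after its MARC 001 value and that pdftotext is installed.

```python
import subprocess
from pathlib import Path

# Hypothetical base URLs; the real locations are not given in the posting.
PDF_BASE = "http://example.org/pamphlets/pdf"
MINING_BASE = "http://example.org/pamphlets/mine"


def pdftotext_command(pdf_path):
    """Build the pdftotext command that writes plain text next to the PDF."""
    pdf = Path(pdf_path)
    return ["pdftotext", str(pdf), str(pdf.with_suffix(".txt"))]


def extract_text(pdf_path):
    """Run pdftotext; requires the tool to be installed and on the PATH."""
    subprocess.run(pdftotext_command(pdf_path), check=True)


def urls_for(pdf_path):
    """Derive the two URLs from the MARC 001 value used as the file name."""
    record_id = Path(pdf_path).stem
    return {
        "pdf": f"{PDF_BASE}/{record_id}.pdf",
        "text_mining": f"{MINING_BASE}?id={record_id}",
    }
```

Calling `extract_text("000123.pdf")` would write `000123.txt` beside the PDF, and `urls_for("000123.pdf")` would return the two links to add to the matching MARC record before reindexing. Which MARC field should hold those links is left open here, since the posting does not say.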

Despite these difficulties, it is now possible to do a bit of analysis against the pamphlets, but there are many avenues for improvement. “Software is never done.”

VUFind record drivers and templates

This posting documents how I wrote and edited a couple of VUFind record drivers and Smarty templates for the “Portal” of the Catholic Research Resources Alliance. In writing this posting I hope to support any developer coming behind me as well as inform the wider open source community on how VUFind works.


CRRA Update October 2010


  • CRRA Welcomes Dominican University and University of San Francisco
  • Spotlight on Portal Development: Progress on EAD; VuFind Announces 2.0 Roadmap
  • CRRA January 6, 2011 in San Diego – Make your plans to join us!
  • Call for Proposals: IMLS National Leadership Grants

New Member Highlights

We are pleased to announce the addition of two new members. Following is brief information about our newest members, selected collections, and leadership. A warm welcome to Dominican University and University of San Francisco.

Rebecca Crown Library, Dominican University (River Forest, IL)

Archives and Special Collections, Crown Library

Dominican University’s collections include materials related to the following: University publications, institutional records, student theses, faculty papers, architectural plans, foreign programs, student activities, student organizations, graduate programs, graduation ceremonies, marketing campaigns, lectures, and even a collection of Civil War materials. There are approximately 12,000 photographs, 1,500 rare and fine books as well as manuscripts and various letters, ephemera and works of art. Finding aids and appropriate digital files from the collections will be added to the Catholic portal.

McGreal Center at Dominican University

In March, 2006, the DLC and Dominican University announced a new collaborative foundation: the Sr. Mary Nona McGreal Center for Dominican Historical Studies. Since 1989 Sister Nona has directed the work of Project OPUS. Sister Nona’s collaborative style of leadership and scholarship facilitated the acquisition of over 5,000 documents, resources and publications germane to the history of the Dominican Family in the United States. Finding aids from the McGreal Center will be added to the Catholic portal.

Bella Karr Gerlich, PhD, is University Librarian at Dominican University, River Forest, IL. Her prior administrative appointments include Associate University Librarian at Georgia College & State University and Head, Arts & Special Collections at Carnegie Mellon. Dr. Gerlich has a Bachelor of Fine Arts degree from Virginia Commonwealth University, a Master’s in Public Management from Carnegie Mellon and a PhD in Library and Information Science from the University of Pittsburgh.

Gleeson Library, University of San Francisco

The Albert Sperisen Collection of Eric Gill consists of comprehensive holdings of Gill’s published works, in addition to over 400 wood engravings, two dozen wood-engraved blocks, original artwork, and manuscript holdings including a substantial series of correspondence between Gill and his student, Desmond Chute. The collection is processed and a finding aid is available.

The Hans and Phoebe Barkan Collection of Robinson Jeffers includes a complete collection of Jeffers’ published works in addition to a significant manuscripts collection. The core of the manuscript holdings is the outgoing correspondence of Robinson and Una Jeffers, numbering several hundred letters. The collection is processed and a finding aid is available.

The Thomas More Collection includes over 180 works by More, of which 95 are pre-1801 imprints. The Rare Book Room holds 68 of the titles identified in Gibson (New Haven: Yale University Press, 1961) including a first printing of Utopia (1516). Important related materials are the Rare Book Room’s holdings of works by Erasmus and St. John Fisher.

The Recusant Literature Collection includes over 600 works by and about Catholics in England during the period of Penal Laws, beginning with the accession of Elizabeth I in 1558 and continuing until the Catholic Relief Act of 1791, with a special emphasis on the Jesuit presence throughout these two centuries of religious and political conflict.

Tyrone H. Cannon has been Dean, University Libraries at the University of San Francisco since August 1995. He was Senior Associate University Librarian at Boston College prior to joining USF. He has held positions at Columbia University, Oklahoma State University, and the University of Texas at Arlington. Prior to becoming a librarian, Cannon was a clinical social worker.

Cannon has been an active member of the American Library Association and the Association of College and Research Libraries, where he served as president in 2003-2004. He is currently a member of the Library Board of California, the Executive Board of the Statewide California Electronic Library Consortium, and the Friends’ Board of the San Francisco Public Library. In June he was appointed to a three-year term on the Western Association of Schools and Colleges’ Substantive Change Committee.

Spotlight on Portal Development

EAD (Encoded Archival Description)

We are pleased to report progress on the development of the CRRA’s EAD indexer/viewer. EAD files will be indexed at a more granular level and displayed in a way that retains the hierarchical structure of EAD, while providing the user with necessary context. Congratulations and thanks to Eric Morgan for such fine work. Thanks, too, to Digital Access Committee members and participants in the CLIR meeting at Marquette for helping us to think through the issues.

You can read more about Eric’s progress on the CRRA Blog. Following is a sampling of Eric’s EAD-related posts:

Indexing MARC and EAD in VUFind with Solr
DAC Meeting Notes: Improving the Index/Display
Harvesting EAD files
EAD Discussion at Marquette
Preparing EAD Files for Indexing

VuFind 2.0 Roadmap

The portal uses the open source application VuFind to index and search the metadata records describing members’ rare, unique, and uncommon materials.

In mid-September, Villanova University hosted a VuFind 2.0 Summit to facilitate developer and implementer community involvement in establishing the vision and setting the agenda for the continued growth and enhancement of this open source library discovery tool.

The Roadmap for Governance, Community Development, & Project Management outlines the series of technical goals and functional enhancements for version 2.0 of the software. In each case, there is a defined list of objectives. Action plans for project organization and software enhancement will be refined and finalized by the community during the remainder of 2010.

January 6, 2011: CRRA in San Diego – Make your plans to join us!

We invite you to attend the CRRA reunion and discussions in San Diego on Thursday afternoon, January 6, 2011. This will be an opportunity to talk about CRRA activities taking place at your library, to discuss progress to date on the 2010/11 goals, and to explore our readiness to promote the Catholic portal to librarians and scholars. We want to hear from everyone – new and continuing members – how things are going at your library. Very importantly, this is an occasion to network and socialize with your CRRA colleagues. We look forward to seeing you there.

The draft agenda is as follows:

Thursday, January 6, 2011, Copley Library, University of San Diego

• Noon to 2 p.m. Board of Directors meeting
• Noon to 2 p.m. Campus and library tours to be arranged
• 2:30 – 5 p.m. Open forum for all participants with refreshments provided by the Copley Library
• 5:30 p.m. Dinner for all participants at La Gran Terraza, which offers a fine dining experience on campus (your own treat)

Theresa Byrd, University Librarian, has graciously volunteered to host our group on campus at the University of San Diego. The University campus is situated on a mesa overlooking San Diego Bay. The Spanish Renaissance architecture and breathtaking views of Mission Bay, the Pacific Ocean, the community of Linda Vista and Tecolote Canyon make the campus a not-to-be-missed destination in San Diego.

See the full invitation on the CRRA blog:

CRRA in San Diego, January 2011

Call for Proposals

IMLS National Leadership Grants Due February 1, 2011

National Leadership Grants support projects that have the potential to elevate museum, archival, and library practice within the context of national strategic initiatives. The Institute seeks to advance the ability of museums, archives, and libraries to preserve culture, heritage, and knowledge, contribute to building technology infrastructures and information technology services, and provide 21st century knowledge and skills to current and future generations in support of a world-class workforce.

Full details are available at: http://www.imls.gov/applicants/grants/nationalleadership.shtm.

All CRRA events and events of possible interest to members are posted to the CRRA calendar, available at http://tiny.cc/Calendar798 and also accessible from the Admin area of the CRRA website.