The Catholic Research Resources Alliance (CRRA) has won a $49,764 Catholic Communications Campaign grant from the United States Conference of Catholic Bishops (USCCB) for Preservation and Online Access to Catholic History — NCWC/CNS 1920- . Funds will support the digitization and preservation of newsfeeds from the National Catholic Welfare Council (NCWC), currently known as the Catholic News Service (CNS) newsfeeds (the Catholic equivalent of Reuters).
100,000 pages of the NCWC/CNS newsfeeds from 1920- (approximately 30 years) will be digitized and made freely available through the CRRA-developed Catholic News Archive, a digital collection of Catholic diocesan and national newspapers. The newsfeeds will be digitized at the highest quality standards and the resulting digital images will be preserved in perpetuity.
Special thanks go to Katherine Nuss, Archivist for Catholic News Service. Katherine is an active member of the CRRA Digitizing Partners and has been a source of inspiration and guidance for CRRA’s digitization program. Katherine provided the source material for the newsfeeds currently in the Archive and will continue to provide source material for the grant’s 100,000 images. Todd Jensen, CRRA Newspapers Digitization Project Manager, will coordinate efforts with vendors to ensure that the images and metadata are of the highest quality and are available in a timely fashion. The content will be accessible within the Catholic News Archive, along with the nearly 8,000 pages of newsfeeds currently available from the Vatican II years. Please see http://thecatholicnewsarchive.org.
In addition to adding years of the Catholic News Service newsfeeds to the Archive, the grant signals a vote of confidence from the U.S. Bishops for our work.
About the Catholic Communications Campaign Grants: The CCC funds media projects – print, television, radio and Internet – that further the U.S. Conference of Catholic Bishops’ religious, charitable and educational purposes. Proposals must be unique and/or timely. There is a preference not to provide ongoing funding of projects. Proposals that have funding from other sources, indicating broad support, are more likely to receive priority consideration.
This posting outlines how a “limit to full text” functionality was implemented in the “Catholic Portal’s” version of VuFind.
While there are many dimensions of the Catholic Portal, one of its primary components is a sort of union catalog of rare and infrequently held materials of a Catholic nature. This union catalog comprises metadata from MARC records, EAD files, and OAI-PMH data repositories. Some of the MARC records include URLs in 856$u fields. These URLs point to PDF files that have been processed with OCR. The Portal’s indexer has been configured to harvest the PDF documents when it comes across them. Once a PDF file is harvested, its OCR text is extracted and added to the underlying Solr index. The values of the URLs are saved to the Solr index as well. Almost by definition, all of the OAI-PMH content indexed by the Portal is full text; almost all of the OAI-PMH content includes pointers to images or PDF documents.
Consequently, if a reader wanted to find only full text content, then it would be nice to: 1) do a search, and 2) limit to full text. And this is exactly what was implemented. The first step was to edit Solr’s definition of the url field. Specifically, its “indexed” attribute was changed from false to true. Trivial. Solr was then restarted.
The second step was to re-index the MARC content. Once this was complete, the reader was able to search the index for URL content: “url:*”. In other words, find all records whose url field has any value at all.
The third step was to understand that all of the local VuFind OAI-PMH identifiers have the same shape. Specifically, they all include the string “oai”. Consequently, the very astute reader could find all OAI-PMH content with the following query: “id:*oai*”.
The fourth step was to turn on a VuFind checkbox option found in facets.ini. Specifically, the “[CheckboxFacets]” section was augmented to include the following line:
id:*oai* OR url:* = "Limit to full text"
When this was done a new facet appeared in the VuFind interface.
Finally, the whole thing comes to fruition when a person does an initial search. The results are displayed, and the facets include a limit option. Upon selection, VuFind searches again, but limits the query by “id:*oai* OR url:*” — only items that have URLs or come from OAI-PMH repositories. Pretty cool.
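The limit can also be tried against Solr directly, outside of VuFind. The following is only a sketch: the Solr address and core name are assumptions about a typical local installation and not something documented above, and the urlencode and fulltext_url helpers are hypothetical names written for this illustration. It builds the kind of request that results when the filter is applied to a reader's query.

```shell
#!/bin/sh
# Assumed Solr select handler; host, port, and core name will differ per install.
SOLR_SELECT="http://localhost:8080/solr/biblio/select"

# The filter query behind the "Limit to full text" checkbox.
FULLTEXT_FILTER='id:*oai* OR url:*'

# Percent-encode only the characters that appear in simple queries like this one.
# The percent sign is handled first so later substitutions are not re-encoded.
urlencode() {
    printf '%s' "$1" | sed -e 's/%/%25/g' -e 's/ /%20/g' -e 's/:/%3A/g' -e 's/\*/%2A/g'
}

# Build a search URL for the reader's query ($1), limited to full text.
fulltext_url() {
    printf '%s?q=%s&fq=%s\n' "$SOLR_SELECT" "$(urlencode "$1")" "$(urlencode "$FULLTEXT_FILTER")"
}

# Print the URL for a sample query.
fulltext_url 'papal infallibility'
```

Passing the printed URL to a tool such as curl should return only records that have a URL or that come from an OAI-PMH repository, assuming Solr is reachable at that address.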
Kudos go to Demian Katz for outlining this process. Very nice. Thank you!
This blog posting outlines, describes, and demonstrates how a set of Catholic pamphlets were digitized, indexed, and made accessible through the Catholic Portal. In the end it advocates an evolution in librarianship.
A few years ago, a fledgling Catholic pamphlets digitization process was embarked upon. In summary, a number of different library departments were brought together, a workflow was discussed, timelines were constructed, and in the end approximately one third of the collection was digitized. The MARC records pointing to the physical manifestations of the pamphlets were enhanced with URLs pointing to their digital surrogates and made accessible through the library catalog. These records were also denoted as being destined for the Catholic Portal by adding a value of CRRA to a local note. Consequently, each of the Catholic Pamphlet records also made its way to the Portal.
Because the pamphlets have been digitized, and because the digitized versions of the pamphlets can be transformed into plain text files using optical character recognition, it is possible to provide enhanced services against this collection, namely, text mining services. Text mining is a digital humanities application rooted in the counting and tabulation of words. By counting and tabulating the words (and phrases) in one or more texts, it is possible to “read” the texts and gain a quick & dirty understanding of their content. Probably the oldest form of text mining is the concordance, and each of the digitized pamphlets in the Portal is associated with a concordance interface.
For example, the reader can search the Portal for something like “is the pope always right”, and the result ought to return a pointer to a pamphlet named Is the Pope always right? of papal infallibility.  Upon closer examination, the reader can download a PDF version of the pamphlet as well as use a concordance against it. [5, 6] Through the use of the concordance the reader can see that the words church, bill, charlie, father, and catholic are the most frequently used, and by searching the concordance for the phrase “pope is”, the reader gets a single sentence fragment in the result, “…ctrine does not declare that the Pope is the subject of divine inspiration by wh…” And upon further investigation, the reader can see this phrase is used about 80% of the way through the pamphlet.
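The counting and tabulating behind such a concordance can be approximated with ordinary command-line tools. What follows is a rough sketch and not the Portal's actual concordance software; the function names word_counts and concordance are made up for this illustration, and the sample sentence is only a stand-in for the OCRed text of a pamphlet.

```shell
#!/bin/sh
# Tabulate word frequencies in plain text read from standard input,
# most frequent first, as "count<TAB>word".
word_counts() {
    tr '[:upper:]' '[:lower:]' |
    tr -cs '[:alpha:]' '\n' |
    grep -v '^$' |
    sort | uniq -c | sort -k1,1nr -k2,2 |
    awk '{ print $1 "\t" $2 }'
}

# Show a phrase ($1) with up to $2 characters of context on either side.
# The phrase is used as a regular expression, so keep it to plain words.
# The -o option is not POSIX but is available in GNU and BSD grep.
concordance() {
    grep -o -i ".\{0,$2\}$1.\{0,$2\}"
}

# A stand-in for the plain text of a pamphlet.
SAMPLE='The doctrine does not declare that the Pope is the subject of divine inspiration.'

printf '%s\n' "$SAMPLE" | word_counts | head -n 5
printf '%s\n' "$SAMPLE" | concordance 'pope is' 30
```

Run against the full text of a pamphlet instead of the sample sentence, the first command gives the quick & dirty list of most frequent words, and the second gives sentence fragments like the one quoted above.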
The process of digitizing library materials is very much like the workflows of medieval scriptoriums, and the process is well understood. Description and access to digital versions of original materials is well-accommodated by the exploitation of MARC records. The next step for the profession is to move beyond find & get and towards use & understand. Many people can find many things, with relative ease. The next step for librarianship is to provide services against the things readers find so they can more easily learn & comprehend. Save the time of the reader. The integration of the University of Notre Dame’s Hesburgh Libraries’ Catholic Pamphlets Collection into the Catholic Portal is one possible example of how this evolutionary process can be implemented.
The Jesuit Libraries Provenance Project (JLPP) was launched in March 2014 to create a visual archive of provenance marks from historic Jesuit college, seminary, and university library collections and to foster a participatory community interested in the history of these books.
Founded by students, faculty, and library professionals at Loyola University Chicago, the Provenance Project is an outgrowth of an earlier project [http://blogs.lib.luc.edu/archives/] to reconstruct the holdings listed in Loyola’s original (c.1878) library catalog in an innovative virtual library system. That project, which was the subject of a graduate seminar at Loyola in Fall 2013 and will launch later this year, brought together graduate students in Digital Humanities, History, and Public History to recreate the nineteenth-century library catalog in a twenty-first century open source Integrated Library System (ILS). In the course of researching the approximately 5100 titles listed in the original catalog, students discovered that upwards of 1750 might still be held in the collections of Loyola’s Cudahy Library, the Library Storage Facility, and University Archives and Special Collections. A handful of undergraduate and graduate students formed the Provenance Project the following semester to see how many of these books actually survived. As they pulled books off the shelves and opened them up, they discovered a range of provenance marks – bookplates, inscriptions, stamps, shelf-marks, and other notations – littering the inside covers, flyleaves, and title pages of these books. Students soon realized that if the original library catalog could tell them what books the Jesuits collected, provenance marks could reveal from where the books came.
By utilizing the freely accessible online social media image-sharing platform Flickr, the Provenance Project seeks to create a participatory community of students, bibliographers, academics, private collectors, alumni, and others interested in the origin and history of Jesuit-collected books. A photostream within the Provenance Project Flickr site allows visitors to scroll through all of the pictures that have been uploaded while commenting and tagging functions provide the opportunity to share their own knowledge about specific images. For example, visitors can contribute transcriptions of inscriptions (especially ones written in messy or illegible hands), translations of words and passages in foreign languages, and identifications of former individual and institution owners. Not only does the Flickr site provide a visual index of the rich variety of works held by a late nineteenth-century Jesuit college library, but it also inspires reflection and scholarship on the importance of print to Catholic intellectual, literary, and spiritual life.
The Provenance Project also encourages undergraduate and graduate students to undertake mentored primary-source research on the history of individual books as well as broader themes in Catholic and book history. Their findings are shared with the public in a variety of ways. One of the rooms in the Summer 2014 exhibition, Crossings and Dwellings: Restored Jesuits, Women Religious, American Experience 1814-2014 at the Loyola University Museum of Art (LUMA) featured original library books selected by graduate students and accompanied by interpretative labels they wrote. Student interns regularly contribute original scholarship to the Provenance Project’s website as well as to the June 2015 issue of the Catholic Library World on the “Digital Future of Jesuit Studies.” [Citation: “The Digital Future of Jesuit Studies,” Catholic Library World 85:4 (June 2015): 240-259.] They have also given talks on their research at conferences, such as the annual meeting of the American Catholic Historical Association. The 2014 commemoration of the bicentennial of the restoration of the Society of Jesus has brought renewed scholarly attention to nineteenth-century Jesuits. The work of Provenance Project interns is actively contributing to that resurgence of interest.
As of February 2016, students have tracked down all of the surviving books from the list of 1750 titles and are in the process of discerning how many of these titles are actual matches for those in the original catalog. (The answer appears to be the vast majority, making for a much higher survival rate than initially expected.) The team recently posted its 5000th image to the Flickr archive and still has many more images to upload over the coming months. Images on Flickr have also been usefully organized into albums either by nature of provenance mark (stamp, bookplate), part of book (illustrations, endpapers, binding), or division of the catalog (Pantology, Theology, Legislation, Philosophy, History, Literature). For those who would like to contribute to the Project, there are still many passages in need of translation and ownership marks in need of identification (helpfully gathered into the albums “Unidentified Inscriptions”, “Unidentified Stamps”, “Unidentified Embossed Stamps”, and “Unidentified Bookplates”).
Please follow the JLPP on Flickr (@JLPProject), Facebook, and Twitter (@JesuitProject). We try to post new books every day and scholarship on the blog every week or so during the semester, so check back often!
A final note: the Provenance Project is beginning conversations about expanding the site to include provenance images from the collections of other historic Jesuit college, seminary, and university libraries. If you are interested in learning more about participating, or want information about how to start a project for your own institution, don’t hesitate to contact Kyle Roberts.
The primary purpose of this posting is to document some of my experiences with OAI and VuFind. Specifically it outlines a sort of “recipe” I use to import OAI content into the “Catholic Portal”. The recipe includes a set of “ingredients”, site-specific commands. Towards the end, I ruminate on the use of OAI and Dublin Core for the sharing of metadata.
When I learn of a new OAI repository containing metadata destined for the Portal, I use the following recipe to complete the harvesting/indexing process:
1. Use the OAI protocol directly to browse the remote data repository – This requires a slightly in-depth understanding of how OAI-PMH functions, and describing it in any additional detail is beyond the scope of this posting. Please consider perusing the OAI specification itself.
2. Create a list of sets to harvest – This is like making a roux and is used to configure the oai.ini file, next.
3. Edit/configure harvesting via oai.ini and properties files – The VuFind oai.ini file denotes the repositories to harvest from as well as some pretty cool configuration directives governing the harvesting process. Whoever wrote the harvester for VuFind did a very good job. Kudos!
4. Harvest a set – The command for this step is in the list of ingredients, below. Again, the harvester is very well written.
5. Edit/configure indexing via an XSL file – This is the most difficult part of the process. It requires me to write XSL, which is not too difficult in and of itself, but since each set of OAI content is often different from every other set, the XSL is set specific. Moreover, the metadata of the set is often incomplete, inconsistent, or ambiguous, making the indexing process a challenge. In another post, it would behoove me to include a list of XSL routines I seem to use from repository to repository, but again, each repository is different.
6. Test XSL output for completeness – The command for this step is below.
7. Go to Step #5 until done – In this case “done” is usually defined as “good enough”.
8. Index set – Our raison d’être, and the command is given below.
9. Go to Step #4 for all sets – Each repository may include many sets, which is a cool OAI feature.
10. Harvest and index all sets – Enhance the Portal.
11. Go to Step #10 on a regular basis – OAI content is expected to evolve over time.
12. Go to Step #1 on a less regular basis – Not only does content change, but the way it is described evolves as well. Harvesting and indexing is a never-ending process.
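Browsing a repository directly, the first step of the recipe, comes down to sending a handful of OAI-PMH requests and reading the XML that comes back. Here is a minimal sketch. The base URL and the set name are placeholders and not a real repository, and the oai_url helper is a hypothetical name of my own; the verbs and the oai_dc metadata prefix come from the OAI-PMH specification.

```shell
#!/bin/sh
# Build an OAI-PMH request URL.
# $1 is the repository's base URL, $2 is the verb, and any remaining
# arguments are key=value pairs appended to the query string.
oai_url() {
    base=$1
    verb=$2
    shift 2
    url="$base?verb=$verb"
    for arg in "$@"; do
        url="$url&$arg"
    done
    printf '%s\n' "$url"
}

# Placeholder base URL and set name; substitute the repository's real values.
BASE="http://example.org/oai"

oai_url "$BASE" Identify
oai_url "$BASE" ListMetadataFormats
oai_url "$BASE" ListSets
oai_url "$BASE" ListRecords metadataPrefix=oai_dc set=pamphlets
```

Each printed URL can be opened in a Web browser or fetched with a tool such as curl. The ListSets response is where the list of sets for the second step comes from.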
I use the following Linux “ingredients” to help me through the process of harvesting and indexing. I initialize things with a couple of environment variables. I use full path names whenever possible because I don’t know where I will be in the file system, and the VUFIND_HOME environment variable sometimes gets in the way. Ironic.
# configure; first the name of the repository and then a sample metadata file
rm -rf /usr/local/vufind2/local/harvest/$NAME/*.delete
rm -rf /usr/local/vufind2/local/harvest/$NAME/*
# delete; an unfinished homemade Perl script to remove content from Solr
# harvest; do the first part of the work
cd /usr/local/vufind2/harvest/; php harvest_oai.php $NAME
# test XSL output
cd /usr/local/vufind2/import; \
php ./import-xsl.php --test-only \
# index; do the second part of the work
/usr/local/vufind2/harvest/batch-import-xsl.sh $NAME $NAME.properties
Using the recipe and these ingredients, I am usually able to harvest and index content from a new repository in a few hours. Of course, it all depends on the number of sets in the repository, the number of items in each set, as well as the integrity of the metadata itself.
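One caution about the ingredients: the rm commands depend on $NAME being set, and an empty value would point them at the harvest directory itself. A small guard along the following lines might help. This is only a sketch; the harvest_dir function is a hypothetical helper and not part of VuFind, and "example" stands in for a real repository name.

```shell
#!/bin/sh
# Root of the harvest directories, as used in the commands above.
HARVEST_ROOT=/usr/local/vufind2/local/harvest

# Print the harvest directory for a repository name ($1). An empty name is
# refused so that a later "rm -rf" can never expand to the harvest root itself.
harvest_dir() {
    if [ -z "$1" ]; then
        echo "harvest_dir: repository name required" >&2
        return 1
    fi
    printf '%s/%s\n' "$HARVEST_ROOT" "$1"
}

# Example with a placeholder repository name.
harvest_dir example
```

The directory printed by harvest_dir can then be used in the clean-up step, and a non-zero return status signals that nothing should be deleted.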
As I have alluded to in a previous blog posting, the harvesting and indexing of OAI content is not straight-forward. In my particular case, the software is not to blame. No, the software is very well-written. I don’t take advantage of all of the software’s features though, but that is only because I do not desire to introduce any “-isms” into my local implementation. Specifically, I do not desire to mix PHP code with my XSL routines. Doing so seems too much like Fusion cuisine.
The challenge in this process is both the way Dublin Core is used, as well as the data itself. For example, is a PDF document a type of text? Sometimes it is denoted that way. There are dates in the metadata, but the dates are not qualified. Date published? Date created? Date updated? Moreover, the dates are syntactically different: 1995, 1995-01-12, January 1995. My software is stupid and/or I don’t have the time to normalize everything for each and every set. Then there are subjects. Sometimes they are Library of Congress headings. Sometimes they are just keywords. Sometimes there are multiple subjects in the metadata and they are enumerated in one field delimited by various characters. Sometimes these multiple subject “headings” are manifested as multiple dc.subject elements. Authors (creators) present a problem. First name last? Last name first? Complete with birth and death dates? Identifiers? Ack! Sometimes they include unique codes, things akin to URIs. Cool! Sometimes identifiers are URLs, but most of the time, these URLs point to splash pages of content management systems. Rarely do the identifiers point to the item actually described by the metadata. And then there are out & out errors. For example, description elements containing URLs pointing to image files.
Actually, none of this is new. Diane Hillmann & friends encountered all of these problems on a much grander scale through the National Science Foundation’s desire to create a “digital library”. Diane’s entire blog, Metadata Matters, is a cookbook for resolving these issues, but in my way of boiling everything down to its essentials, the solution is two-fold: 1) mutual agreements on how to manifest metadata, and 2) the writing of more intelligent software on my part.
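As a very modest illustration of what slightly more intelligent software might look like, here is a sketch that handles two of the problems described above: dates in different syntaxes, and multiple subjects delimited in one field. The function names year_of and split_subjects are hypothetical, and the choice of semicolons and vertical bars as delimiters is an assumption; real sets will need their own rules.

```shell
#!/bin/sh
# Reduce a free-form date ($1) to its first four-digit year; print nothing if
# there is none. The -o option is not POSIX but is available in GNU and BSD grep.
year_of() {
    printf '%s\n' "$1" | grep -oE '[0-9]{4}' | head -n 1
}

# Split a delimited subject string ($1) into one trimmed heading per line.
# Commas are deliberately left alone because they occur inside many headings.
split_subjects() {
    printf '%s\n' "$1" |
    tr ';|' '\n\n' |
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}

# The three date syntaxes mentioned above.
for d in '1995' '1995-01-12' 'January 1995'; do
    year_of "$d"
done

# A made-up subject field with mixed delimiters.
split_subjects 'Prayer; Catholic Church | Sermons'
```

A year is a lowest common denominator, of course, and it says nothing about whether the date is a date of publication, creation, or update. That part still requires mutual agreement.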
Jim McCartin, Associate Professor of Theology and Director of the Center on Religion and Culture at Fordham University, joins us for a discussion of his research and the role CRRA has played in shaping and abetting his scholarly work. His book, Prayers of the Faithful: The Shifting Spiritual Life of American Catholics, came out in 2010 and explores prayer in the lives of American Catholics from the 1860s to the 1980s. His current project is the book: American Catholics and Sex from the 1830s to the 1980s.
What is your current area of research?
I’m currently working on a book project on the history of US Catholics and sex from the 1830s to the 1980s. The study begins with early nineteenth-century European Catholic immigrants and the anxieties they provoked among non-Catholics concerned that Catholics were sexual deviants because of their practice of vowed celibacy, and it ends with the emerging story of clerical sex abuse in the late twentieth century. In between, it turns out that the story of US Catholics and sex is a great deal more interesting and complicated than historians and others have normally assumed, which makes this project especially exciting.
How did you get interested in your research area?
Well, after the clerical sex abuse scandal exploded in 2002, it occurred to me that, while there is a lot of published work out there on the history of US sexuality, that work has not dealt at all adequately with how religion fits into the story of sex, and in particular, it hasn’t given very serious attention to Catholicism’s place in that story. I was looking for ways to think about how we get to the clerical sex abuse scandal of the early 2000s, and I found nothing that could provide an adequate, sensible narrative grounded in deep archival research. So, while my goal isn’t specifically to write a history of Catholicism and sex abuse, this project emerged out of a desire to offer a narrative that is sufficiently textured and grounded and one that can help to place sex abuse into a larger narrative frame.
How do you use the CRRA’s resources for your research? Which resources have been the most helpful, and why? How has Catholic Newspapers Online been useful?
CRRA has been extremely useful in helping me identify a whole array of published and archival sources for this project. There’s no better way to be able to survey the published materials on Catholicism available in the United States, and I’ve made probably a dozen archival trips based on materials I’ve identified through the Portal. I have to say that I’ve been especially grateful for the digitized newspapers, though, which have been a tremendous source as I try to get a sense for how family life and related questions of sexuality played out on the ground in various local settings.
What’s the most exciting/surprising source you’ve been able to get access to for your research?
There’s a lot out there that has fascinated me. Among the most interesting sources I’ve come across are the trial records for an 1843 clerical rape trial that figures into the narrative I’m constructing. But there’s also just a wealth of interesting documentation on the practice of clerical celibacy in the 1880s and 1890s, on sex education in the 1920s, on Catholic arguments over the Rhythm Method in the 1930s, and on same-sex attraction in the 1940s and 1950s. It turns out that US Catholics had quite complicated and pretty well-informed conversations around these and other themes, conversations that are much more nuanced and interesting than they are normally given credit for.
What do you wish you could get access to but is currently unavailable?
Good question. I’m not exactly sure there are resources out there on this, but I’d love to have access to documents that provide a clearer sense of how sexuality was framed in the formation of male and female religious in the first half of the twentieth century. I’d also love to see archival materials related to the work of the Servants of the Paraclete, a religious order that, already in the early post-1945 era, began to care for priests involved in sexual relationships of one kind or another.
CRRA’s Annual All Members Meeting will take place Tuesday, July 2 during ALA Chicago. There are several opportunities to get together with you, our CRRA friends and colleagues. Please review the events below (in order by date) and RSVP to Felice Maciejewski at email@example.com.
Click here for a PDF of CRRA Events in Chicago, or just read on…
We hope to see you in Chicago!
Digitizing Catholic Newspapers
Monday July 1: 2:00 – 5:00 p.m.
DePaul Loop Library Instruction Room, 1 East Jackson Blvd. (in downtown Chicago)
Come and share your interests and/or experience in digitizing newspapers. We will learn more about newspaper digitization and other digital services offered by Lyrasis, exploring a possible collaborative initiative for funding strategies and support, digitization of collections, and access. Your input is central to identifying what will help you and other CRRA members do what you want to do, and how your digitizing projects might be part of and benefit from a more comprehensive effort.
A taste of Italy: Dinner at Quartino’s
Monday July 1: 6:30 p.m.
Enjoy the distinctive Italian small-plates menus and vintage ambience at Quartino Ristorante & Wine Bar in Chicago’s Near North side. Meet us at 6:30 and treat yourself to an enjoyable dinner with friends.
CRRA All Member Meeting
Tuesday July 2: 8:30 a.m. to Noon (breakfast at 8:30 and meeting at 9:00)
University Club, Northwestern Room (A&B), 76 East Monroe (corner of Michigan Ave. and Monroe St.)
Dress is business casual
Start your day with breakfast (hosted by Dominican University) and stay to contribute your ideas to CRRA. Our agenda is a window into mission-support for the next year. We want your input on the proposed top priorities: expanding access to Catholic newspapers, engaging in more outreach about the portal and member mentoring, and harvesting new content. We are setting up small group and plenary discussions on ways in which we can get the word out about the portal and CRRA collections and especially what works at your institution. We will explore ways in which individuals and institutions can increase our capacity to carry out our mission of providing global enduring access to Catholic research resources in the Americas.
Treasures of Faith: Twenty Years of Acquisitions
Newberry Library, 60 West Walton Street
In 1991, Newberry Trustee Sister Ann Ida Gannon, President Emerita of Mundelein College, arranged for the transfer of Mundelein’s rare book collection to the Newberry. More donations followed, a selection of which is in this exhibit of books used by American seminarians. Hours posted on the website.
Stay the night at Catholic Theological Union (CTU) Guest Housing
5401 S. Cornell Avenue (an easy bus ride to downtown)
If you are joining us for CRRA events and need a place to stay, consider the reasonably priced guest rooms at Catholic Theological Union in the heart of the Hyde Park neighborhood. Contact CTU directly or write to Melody McMahon, Library Director, CTU for more information on staying on campus.
We hope to see you in Chicago!
Please share this invitation with others at your institution. Our meetings are open to others interested in our mission and activities and who may not yet be members. All are welcome!