CRRA in San Diego January 6, 2011

From left to right: Eric Morgan (ND), Eric Frierson (St. Ed’s), Marta Deyrup (Seton Hall), Clay Stalls (Loyola Marymount), Kris Brancolini (Loyola Marymount), Jennifer Younger (CRRA), Tyrone Cannon (Univ of San Francisco), Janice Welburn (Marquette), Jean Zanoni (Marquette), Pat Lawton (CRRA), Alma Ortega (Univ of San Diego), Theresa Byrd (Univ of San Diego), Susan Ohmer (Notre Dame), Laverna Saunders (Duquesne), Diane Maher (U San Diego), Ed Starkey (U San Diego)

The San Diego meeting gave new and continuing CRRA members and friends an opportunity to look at the enhanced portal, discuss future directions for the CRRA, and, last but not least, get to know one another.

CRRA in San Diego

This is a simple annotated list of links used as an outline for a presentation to the CRRA in San Diego:

  1. CRRA website – The good ol’ look & feel but wrapped around new content and functionality. (“Thank you, Eric Frierson!”)
  2. Web 2.0 – All the Web 2.0 links (cite this, email this, favorite this) that did not work previously now function correctly.
  3. EAD viewer – It is now possible to view EAD files locally or from the originating institution.
  4. Item-level indexing – The content of EAD files is indexed at the item level making for finer-grained searching.
  5. PDF display – Records linking to digitized versions of books now enable a person to get the full text. Examples include content from St. Michael’s and the University of Notre Dame.
  6. Text mining – After extracting the full text from the PDF documents, it is possible to apply concordancing techniques to the full text for analysis.
  7. Automated updating – The “Portal” can be updated automatically by harvesting metadata from member institutions, massaging it for the Portal, and re-indexing it on a regular basis.
  8. Use statistics – Rudimentary Web server log file analysis as well as Google Analytics reports illustrate how the Portal is being used.
  9. Blog – A running commentary on what’s happening with Portal development.
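
The concordancing mentioned in item 6 can be illustrated with a minimal keyword-in-context sketch. It assumes the full text has already been extracted from a PDF into a plain string; the function name, the sample text, and the `width` parameter are illustrative and are not taken from the Portal's code.

```python
import re

def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per occurrence of keyword in text."""
    # Collapse whitespace so line breaks left over from PDF extraction
    # do not split the context window.
    flat = " ".join(text.split())
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical sample text standing in for extracted PDF content.
sample = (
    "The parish was founded in 1850.\n"
    "Records of the parish survive in the diocesan archive."
)

for line in concordance(sample, "parish", width=20):
    print(line)
```

Each output line shows one occurrence of the keyword with a fixed window of text on either side, which makes it easy to scan how a word is used across a document.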

Catholic Portal look & feel

Thanks to the good work done by Eric Frierson of St. Edward’s University, the “sandbox” of the “Catholic Portal” now sports the look & feel of our public view:

screen shot

Moreover, since the “sandbox” runs version 1.0 of VUFind, many of the Web 2.0 links work correctly. In other words, things like emailing, tagging, citing, reviewing, etc. now function as expected.

While we could move this whole thing into production, it may behoove us to associate hyperlinks with each found item that point back to the hosting institutions to facilitate access. What do you think?

Digital Access Committee (DAC) Meeting

Today we had a CRRA Digital Access Committee (DAC) meeting via the telephone. Attendees included:

  • Ann Hanlon
  • Demian Katz
  • Eric Frierson
  • Eric Morgan
  • Kevin Cawley
  • Pat Lawton
  • Susan Leister
  • Thomas Leonhardt

I did a bit of “Portal” show & tell demonstrating the work done to date on indexing EAD files. (See the previous blog posting.) We then discussed ways the indexing/display could be improved. Suggestions included:

  • putting the words “Archival material” into the format field of the Solr index thus allowing better faceting
  • reading the value of the langmaterial element and using it as the value for Solr’s language fields, again allowing for better faceting
  • reading all of the fields associated with a given container-level element and putting them into Solr’s allfields field to improve indexing
  • extracting the last value of our current “title”, using it as our title, and using the remaining values as some sort of supplemental description or alternatively, simply reversing the “title” string
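
The suggestions above can be sketched as a mapping from one EAD file to a list of Solr-style documents. This is only a rough illustration under several assumptions: the EAD is un-namespaced, items are `c01` elements, the "title" hierarchy has just two levels (collection and item), and the `description` field name is hypothetical. The sample record is invented for the example.

```python
import xml.etree.ElementTree as ET

# Invented, heavily simplified EAD record used only for illustration.
SAMPLE_EAD = """<ead>
  <archdesc>
    <did>
      <unittitle>Parish Records</unittitle>
      <langmaterial><language>English</language></langmaterial>
    </did>
    <dsc>
      <c01>
        <did>
          <unittitle>Correspondence</unittitle>
          <unitid>box-1</unitid>
          <unitdate>1901-1910</unitdate>
        </did>
        <scopecontent><p>Letters to the bishop.</p></scopecontent>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

def build_solr_docs(ead_xml):
    """Build one dictionary per c01 item, following the DAC suggestions."""
    root = ET.fromstring(ead_xml)
    collection_title = root.findtext("./archdesc/did/unittitle", default="").strip()
    language = root.findtext("./archdesc/did/langmaterial/language", default="").strip()
    docs = []
    for item in root.iter("c01"):
        # Gather every piece of text under the item, normalizing whitespace.
        allfields = " ".join(" ".join(item.itertext()).split())
        docs.append({
            "id": item.findtext("./did/unitid", default="").strip(),
            "format": "Archival material",   # fixed value for faceting
            "language": language,            # taken from langmaterial
            # The last (most specific) title becomes the title ...
            "title": item.findtext("./did/unittitle", default="").strip(),
            # ... and the remaining, broader title becomes a description.
            "description": collection_title,
            "allfields": allfields,
        })
    return docs

for doc in build_solr_docs(SAMPLE_EAD):
    print(doc)
```

Real finding aids nest components more deeply and often carry more than one language, so an actual indexer would need to walk every component level rather than `c01` alone.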

We then brainstormed ways to resolve character encoding issues, the feasibility of making our metadata available via Web servers, and the status of the metadata guidelines.

We felt we had discussed it all, so the meeting was over.

Very satisfying!

I have made significant progress in the process of harvesting EAD files and preparing them for ingestion into the “Catholic Portal”. This posting outlines the successes.

Assuming Catholic Research Resources Alliance members place their EAD files in an HTTP-accessible directory, and those files have a .xml extension, the following Perl scripts enable me to harvest and prepare them for indexing:

  • harvest-ead.pl – reads remote HTTP-accessible directories and copies all of the .xml files found there to a local cache
  • validate.pl – makes sure the cached XML files are well-formed and conform to the EAD DTD, and if not, moves the files to a different directory
  • transform.pl – reads the validated XML files, adds id attributes to all unitid elements through the use of a stylesheet (addunitid.xsl), transforms the resulting XML into HTML using another stylesheet (ead2html.xsl), and saves the result to an HTTP-accessible directory
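
The three steps above can be approximated in a short sketch. The actual scripts are written in Perl and are not reproduced here; what follows is a Python illustration using only the standard library, and every function name and the `unitid-N` id scheme are assumptions. It covers finding .xml links in a directory listing, checking well-formedness, and adding id attributes to unitid elements. It does not validate against the EAD DTD or apply the XSLT stylesheets, both of which would need a library beyond the standard one.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin

class XmlLinkParser(HTMLParser):
    """Collect absolute URLs of links ending in .xml from a directory listing."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.lower().endswith(".xml"):
            self.links.append(urljoin(self.base_url, href))

def find_xml_links(listing_html, base_url):
    """Harvest step: list the .xml files a remote directory page points to."""
    parser = XmlLinkParser(base_url)
    parser.feed(listing_html)
    return parser.links

def is_well_formed(xml_text):
    """Validation step (partial): well-formedness only, no DTD check."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

def add_unitid_ids(xml_text):
    """Transform step (partial): give every unitid element an id attribute."""
    root = ET.fromstring(xml_text)
    for number, unitid in enumerate(root.iter("unitid"), start=1):
        if "id" not in unitid.attrib:
            unitid.set("id", f"unitid-{number}")
    return ET.tostring(root, encoding="unicode")
```

In a full pipeline, each URL returned by `find_xml_links` would be downloaded to a local cache, files failing `is_well_formed` would be set aside, and the output of `add_unitid_ids` would be passed to a stylesheet processor to produce the HTML.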

What was really cool and a huge time-saver was the use of ead2html.xsl. Originally named AAAv2002-HTML.xsl, found on a page called User Contributed Stylesheets, and submitted by Stephanie Ashley, this stylesheet took my id attributes and automatically made named anchors for me. Boy, did I get lucky. “Thank you, Stephanie!”

My next step is to revisit my indexing routines.

Collection Policy Statement for the Catholic Portal

(The following is the current collection policy for the Catholic Portal.)

The purpose of the Catholic Research Portal is to provide global access to the wealth of research resources relating to the Catholic experience. Of primary interest are rare, unique and uncommon Catholic research materials. Because these resources are often uncataloged and little known outside their institutional repositories, the Portal seeks to encourage broad participation and to provide support to libraries, archives, and other institutions that wish to participate in this project but lack the resources to do so. The Portal will ultimately facilitate and assist researchers and students in identifying Catholic research resources and make Catholic scholarship more productive. In doing so, the Catholic Research Portal will contribute substantially to the generation of new knowledge.

Web 2.0 features

After tweaking VUFind’s configuration files, our “sandbox” implementation of the “Catholic Portal” now supports many (if not all) of VUFind’s Web 2.0 features: faceted browse, favorites, cover art, reviews, author blurbs, etc. Please give them a whirl. Create an account for yourself and add some items to your Favorites.

NTS (“note to self”): the account creation process did not work until I changed the value of RewriteBase in my httpd.conf file from /vufind to /.