How to make MARC and EAD metadata available in the “Catholic Portal”

This is a set of (draft) prescriptive instructions describing how to make MARC and EAD metadata available in the “Catholic Portal”.

Introduction

At its core, the “Portal” is an index — a list of pointers to content items. Access to this index is implemented through a form-based interface. Readers enter queries into the form, and items are returned. Readers are then expected to select items of interest from the returned list, and use them for the purposes of research and scholarship. In order to implement this functionality, each content item in the index requires, at the very least, three elements: 1) a unique identifier, 2) a human-readable description of the item, and 3) a location code where the item can be acquired.

The MARC and EAD metadata schemes are well-suited for indexing. After making sets of MARC records and/or EAD files transparently accessible on a Web server, it is easy to harvest the metadata, integrate it into the Portal’s index, and provide access to the content items.

The balance of this posting describes how to make MARC and EAD files available for harvesting.

MARC

Here’s the short version. Export all the MARC records from your integrated library system that you think are apropos to the “Catholic Portal”, making sure they are encoded using the UTF-8 character set. Save the resulting file on a Web server, and tell Eric Morgan the URL of the file. Eric will do the rest.

Here’s the long version. Remember, every record in the Portal needs a unique identifier, a human-readable description, and a location code. For MARC records, this means every record first needs a value in the 001 field. Any value will do as long as it is unique to your set of records. Second, each MARC record needs something in the 245 field. At the very least this will be the human-readable description. All the other descriptive and analytic fields will supplement this description. Third, each MARC record needs to have a location code, and this is the item’s call number. This value will most likely be extracted from the 090 field.
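As a rough illustration of the three requirements above, here is a minimal Python sketch that reports which records in an exported file lack a 001, 245, or 090 field. It assumes the export is in binary MARC (ISO 2709) format and that your call numbers live in field 090; the function names are made up for this example, and this is not the check the Portal itself runs.

```python
from typing import Dict, Iterator, List

# Identifier, title, and call number. Field 090 is an assumption;
# change it if your call numbers live in another field.
REQUIRED_TAGS = ("001", "245", "090")


def iter_records(data: bytes) -> Iterator[bytes]:
    """Yield each raw record, using the length stored in leader bytes 0-4."""
    offset = 0
    while offset + 24 <= len(data):
        length = int(data[offset:offset + 5])
        if length < 24:
            raise ValueError(f"implausible record length at byte {offset}")
        yield data[offset:offset + length]
        offset += length


def tags_in_record(record: bytes) -> List[str]:
    """Return the field tags listed in the record's directory."""
    base_address = int(record[12:17])
    # The directory sits between the 24-byte leader and the field
    # terminator that precedes the first field's data.
    directory = record[24:base_address - 1]
    return [directory[i:i + 3].decode("ascii")
            for i in range(0, len(directory), 12)]


def missing_tags(data: bytes) -> Dict[int, List[str]]:
    """Map each record's position (0-based) to the required tags it lacks."""
    problems = {}
    for position, record in enumerate(iter_records(data)):
        present = set(tags_in_record(record))
        missing = [tag for tag in REQUIRED_TAGS if tag not in present]
        if missing:
            problems[position] = missing
    return problems
```

To try it, read your export in binary mode and pass the bytes to missing_tags(); an empty result means every record has all three fields. The sketch only looks at which tags are present, not at whether the 001 values are unique.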

Helping you decide which MARC records to extract from your integrated library system is beyond the scope of this document. But once you have figured that out, we recommend you mark the items to be extracted by updating them with a local note. Here at the University of Notre Dame, we put the letters CRRA in field 590 subfield a. Once this is done, it is relatively easy for the systems librarian to search for CRRA in field 590 subfield a and dump the resulting records to a file. Alternatively, the systems librarian might search for all items whose call numbers begin with BX and dump the resulting set. The process you use to mark and export your MARC records depends on your local environment.

When exporting your MARC records from your integrated library system, it is imperative the records be encoded using the UTF-8 character set and not something else. The Portal’s underlying indexer does not deal very well with encodings of another kind. If your system does not export records as UTF-8, and it exports things in MARC-8 instead, then use an open source application called yaz-marcdump from Index Data to transform your records from one encoding into another. Once yaz-marcdump is installed you can execute a command like the following to do the transformation:

yaz-marcdump -f MARC-8 -t UTF-8 -o marc -l 9=97 input.mrc > output.mrc

The command translates MARC records from (-f) MARC-8 encoding to (-t) UTF-8 encoding. It outputs (-o) the result as MARC records, and it writes the letter a (ASCII character 97) into the leader (-l) at position 9, which is the MARC 21 flag denoting that a record is encoded as Unicode. It reads the file named input.mrc, and it saves the result to a file named output.mrc.
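If you want to double-check the result of a conversion, something like the following Python sketch may help. It assumes the file is in binary MARC format, and it tests two things for each record: that leader position 9 contains the letter a, and that the bytes of the record decode as UTF-8. The function names are invented for this example; it is not part of yaz-marcdump or of the Portal.

```python
from typing import Iterator, List, Tuple


def iter_records(data: bytes) -> Iterator[bytes]:
    """Yield each raw record, using the length stored in leader bytes 0-4."""
    offset = 0
    while offset + 24 <= len(data):
        length = int(data[offset:offset + 5])
        if length < 24:
            raise ValueError(f"implausible record length at byte {offset}")
        yield data[offset:offset + length]
        offset += length


def encoding_problems(data: bytes) -> List[Tuple[int, str]]:
    """Return (record position, complaint) pairs; an empty list means no complaints."""
    problems = []
    for position, record in enumerate(iter_records(data)):
        # Leader position 9 should be "a" for Unicode records.
        if record[9:10] != b"a":
            problems.append((position, "leader position 9 is not 'a'"))
        # MARC-8 diacritics usually show up as byte sequences that
        # are not valid UTF-8.
        try:
            record.decode("utf-8")
        except UnicodeDecodeError:
            problems.append((position, "bytes are not valid UTF-8"))
    return problems
```

Keep in mind that a MARC-8 record containing nothing but plain ASCII characters also decodes as UTF-8, so the second test can only catch records that actually contain diacritics or other special characters.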

Every time you export your records, export everything you feel is relevant to the Portal. Do not worry about additions, changes, or deletions. We here at Portal Central handle this issue by deleting our local copies of all of your records and re-indexing the whole lot.

After the records have been exported, save them on a Web server, and finally, tell Eric Morgan the URL of the resulting file. Please don’t change the URL once you have shared it. Eric will harvest the records and incorporate them into the index. As of this writing it is a good idea to tell Eric when new records are available, but at some point this won’t be necessary.

EAD

Here’s the short version. Use validated EAD files to encode the content you deem apropos to the Portal. Save all the EAD files in a single directory on a Web server, making sure each file is given a .xml extension. Tell Eric Morgan the URL of the directory, and he will take care of the rest.

Here’s the longer version. Use whatever tool you desire to create EAD files describing the archival content you deem appropriate for the Portal. There are any number of available editors and applications facilitating this process. Make sure the resulting EAD files validate against the EAD DTD or schema. It doesn’t really matter which one, but right now validation against the DTD is easier to handle here at Portal Central.

Each did-level element in your EAD files will eventually become a record in the Portal’s index. During pre-processing here at Portal Central, unique unitid attributes will be added to each did-level element, if no unitid attributes exist in the first place. This pre-processing satisfies the need for unique identifiers. You do not need to do anything with regard to unique identifiers.

Each did-level unittitle element will recursively be combined with its parent did/unittitle element to form a human-readable description of each content item. Consequently, there is nothing you need to do with regard to human-readable descriptions.
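To give a feel for what combining titles means, here is a small Python sketch that walks an EAD file and joins each unittitle with the titles of the levels above it. It is only an illustration and not the Portal’s actual pre-processing code: the separator, the function names, and the decision to ignore XML namespaces (so that both DTD-based and schema-based files work) are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SEPARATOR = " / "  # assumed; the Portal may join titles differently


def local_name(tag: str) -> str:
    """Strip any namespace, so DTD- and schema-based EAD look alike."""
    return tag.rsplit("}", 1)[-1]


def child(element: ET.Element, name: str) -> Optional[ET.Element]:
    """Return the first direct child with the given local name, if any."""
    for candidate in element:
        if local_name(candidate.tag) == name:
            return candidate
    return None


def descriptions(element: ET.Element,
                 inherited: Tuple[str, ...] = ()) -> List[str]:
    """Collect one description per did/unittitle, prefixed by parent titles."""
    titles = inherited
    results = []
    did = child(element, "did")
    if did is not None:
        unittitle = child(did, "unittitle")
        if unittitle is not None:
            # Flatten nested markup such as unitdate and tidy whitespace.
            text = " ".join("".join(unittitle.itertext()).split())
            titles = inherited + (text,)
            results.append(SEPARATOR.join(titles))
    for sub in element:
        if local_name(sub.tag) != "did":
            results.extend(descriptions(sub, titles))
    return results
```

Given a collection titled "Smith Papers" containing a series titled "Correspondence", the series would come out as "Smith Papers / Correspondence". Running something like this over your own files is one way to preview whether your titles read sensibly once they are combined.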

The location of items found in EAD files is conveyed in three ways. First, the name of your hosting institution and library/archive will be associated with each search result, thus the need for location information will be satisfied, but only in a rudimentary way. Second, location information is reinforced through the url attribute of the eadid element. Specifically, you are expected to include a value in the url attribute of the eadid element, and this value is expected to point to a human-readable version of your EAD file on your Web server. Portal search results include hot links with a label similar to “View finding aid at owning institution”, and each of these hot links will be the same as the value in the url attribute. Your human-readable version of the EAD file is then expected to include instructions and contact information describing how to acquire items of interest. Finally, search results will include a second hot link with a label similar to “View finding aid in Portal display”. These hot links will point to a local HTML file transformed from the original EAD. Again, location and contact information should be a part of the HTML because it was a part of the original EAD.

In summary, create complete and valid EAD files, making sure you include values in the url attributes of the eadid elements.
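A quick way to confirm the url attribute is in place is sketched below in Python. It assumes nothing beyond the standard library, the function name is hypothetical, and it only tests that the attribute exists and is not empty; it does not try to fetch the URL.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def eadid_url(ead_xml: bytes) -> Optional[str]:
    """Return the url attribute of the first eadid element, or None."""
    root = ET.fromstring(ead_xml)
    for element in root.iter():
        # Compare local names so namespaced (schema-based) EAD works too.
        if element.tag.rsplit("}", 1)[-1] == "eadid":
            url = (element.get("url") or "").strip()
            return url or None
    return None
```

Read each of your EAD files in binary mode and pass the contents to eadid_url(); a result of None means the file has no eadid element or the url attribute is missing or empty.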

Once you have created your EAD files, save them in a single directory on a Web server, and tell Eric Morgan the URL of the directory. Make sure each EAD file ends with a .xml extension. Eric will then regularly harvest all the .xml files from your directory, re-validate them, make sure they include url attributes, add unique identifiers to each did-level element, and index each did-level element.
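Before telling Eric about your directory, you might run a quick local check along the lines of the Python sketch below. It looks at a copy of the directory on your own machine, flags files without a .xml extension, and flags files that are not well-formed XML. Please note that well-formedness is a weaker test than validation against the EAD DTD or schema, which this sketch does not attempt, and that the function name and report wording are invented for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Dict


def check_directory(directory: str) -> Dict[str, str]:
    """Map each file name in the directory to a short status message."""
    report = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".xml":
            report[path.name] = "no .xml extension"
            continue
        try:
            # Parsing only proves the file is well-formed XML,
            # not that it is valid EAD.
            ET.parse(str(path))
        except ET.ParseError as error:
            report[path.name] = f"not well-formed: {error}"
            continue
        report[path.name] = "ok"
    return report
```

Anything not reported as ok is worth fixing before the files are harvested.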

Author: Eric Lease Morgan

I am a librarian first and a computer user second. My professional goal is to discover new ways to use computers to provide better library services. I use much of my time here at the University of Notre Dame developing and providing technical support for the Catholic Research Resources Alliance -- the "Catholic Portal".