This posting outlines the possibilities for ingesting PastPerfect content into the “Catholic Portal”.
As membership in the Catholic Research Resources Alliance (CRRA) grows, so does the number of metadata formats the “Catholic Portal” is expected to support. When the CRRA was just beginning, MARC was the predominant metadata format. After the content of university archives was recognized as significant, EAD became very important. Some institutions use neither MARC nor EAD to describe their special collections but instead use systems like ContentDM. These systems are often accessible via OAI-PMH, and thus, at the very least, harvestable Dublin Core is available. In order to support discovery, all of these types of metadata need to be parsed, mapped to VuFind’s underlying Solr schema, and indexed.
It has come to my attention that some of the CRRA’s membership may be using an application called PastPerfect by PastPerfect Software, Inc. to describe their collections. After a bit of investigation, I learned that PastPerfect supports a number of exportable metadata formats. One of those formats is an XML file complete with Dublin Core elements. Here is a sample record:
<?xml version="1.0" encoding="windows-1252" standalone="yes"?>
<metadata>
  <dc-record>
    <type>text</type>
    <type>original</type>
    <type>cultural</type>
    <format>23 cm. 292. p. Includes index.</format>
    <title>Guide to the Use of Books and Libraries</title>
    <title>Book</title>
    <description>Guide to the Use of Books and Libraries....</description>
    <subject>Book</subject>
    <subject>1. Reference books.</subject>
    <subject>2. Libraries--Handbooks, manuals, etc.</subject>
    <subject>3. Library.</subject>
    <subject>Gates, Jean Key</subject>
    <creator>Gates, Jean Key</creator>
    <contributor>Wright, Richard R. and Susan Gamer, editors.</contributor>
    <publisher>McGraw-Hill Book Company</publisher>
    <date>1979</date>
    <identifier>2000.4.3</identifier>
    <language>English</language>
    <coverage>1979 - 1979</coverage>
    <coverage>New York, NY</coverage>
  </dc-record>
</metadata>
The XML is straightforward enough and seems to be well-formed, but there does not seem to be a DTD or schema against which to validate it. The content of each element comes straight from data entry in the PastPerfect system, so things like “1. ” or “2. ” in the subject elements are apparently included because that is what someone typed in. Similarly, subheadings delimited by “--” and the multiple values in the format element are reminiscent of MARC records and their ISBD (International Standard Bibliographic Description) punctuation. Repeating elements like coverage or type, which mix different kinds of values under one name (a date range and a place, for example), make things challenging, but not insurmountable. In short, the issues surrounding the mark-up are relatively minor. It is not ideal, but it is functional.
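To make the parsing and clean-up concrete, here is a minimal sketch in Python. It assumes the export looks like the sample record above. It collects every element into a list so repeated elements are preserved, and it strips hand-typed enumerators such as “1. ” from subjects. The function names and the target field names (id, title, author, topic, format, publishDate) are hypothetical placeholders, not the Portal's actual Solr schema, and the real mapping would need to be decided by the people maintaining the index.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of the sample record (XML declaration omitted for brevity).
# A real export declares windows-1252, so read the file as bytes and let
# the parser honor that declaration.
SAMPLE = """<metadata>
  <dc-record>
    <type>text</type>
    <title>Guide to the Use of Books and Libraries</title>
    <title>Book</title>
    <subject>1. Reference books.</subject>
    <subject>2. Libraries--Handbooks, manuals, etc.</subject>
    <creator>Gates, Jean Key</creator>
    <date>1979</date>
    <identifier>2000.4.3</identifier>
    <coverage>1979 - 1979</coverage>
    <coverage>New York, NY</coverage>
  </dc-record>
</metadata>"""

# Matches a hand-typed enumerator such as "1. " or "12. " at the start of a value.
ENUMERATOR = re.compile(r"^\d+\.\s+")


def parse_records(xml_source):
    """Return one dict per dc-record, mapping element names to lists of values."""
    root = ET.fromstring(xml_source)
    records = []
    for node in root.iter("dc-record"):
        record = {}
        for child in node:
            value = (child.text or "").strip()
            if not value:
                continue  # skip empty elements
            if child.tag == "subject":
                value = ENUMERATOR.sub("", value)
            # Every element may repeat, so always store a list.
            record.setdefault(child.tag, []).append(value)
        records.append(record)
    return records


def to_index_document(record, institution_code):
    """Map a parsed record to a flat dict. All field names are hypothetical."""
    return {
        # Assumes every record carries an identifier, as the sample does.
        "id": f"{institution_code}-{record['identifier'][0]}",
        # Assumption: the first title is the real one ("Book" is an object name).
        "title": record.get("title", [""])[0],
        "author": record.get("creator", []),
        "topic": record.get("subject", []),
        "format": record.get("type", []),
        "publishDate": record.get("date", []),
    }


records = parse_records(SAMPLE)
document = to_index_document(records[0], "example")
```

Prefixing the identifier with an institution code is one way to keep identifiers such as 2000.4.3 from colliding across members; whether PastPerfect identifiers are unique beyond a single installation is not something the sample record settles.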
The bigger issue surrounds linking to the original item. While each metadata record includes a unique identifier, there is seemingly no way to enable the reader to see either a full-record display at the hosting institution or the item being described; the PastPerfect records are not associated with an actionable URI (Uniform Resource Identifier). This means that each record, before it is indexed, will need to be associated with the postal address of the library or archive using PastPerfect, and readers will need to get in touch with the librarian or archivist if they want to use the item.
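One way to picture that association step is the small Python sketch below. It copies an institution's contact details onto each record before indexing. The Institution class, the add_contact_fields function, the field names, and the example institution are all invented for illustration; none of them comes from PastPerfect or VuFind.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Institution:
    """Contact details for a member library or archive (hypothetical structure)."""
    code: str
    name: str
    postal_address: str


def add_contact_fields(document, institution):
    """Return a copy of the document with contact details attached.

    The field names are placeholders, not fields of any real schema.
    """
    enriched = dict(document)  # leave the caller's dict untouched
    enriched["institution"] = institution.name
    enriched["contact_address"] = institution.postal_address
    enriched["access_note"] = (
        f"No online display is available; contact {institution.name} "
        "to use this item."
    )
    return enriched


# Made-up institution, used only to show the shape of the data.
example_archive = Institution(
    code="example",
    name="Example Archive",
    postal_address="1 Example Street, Anytown",
)
enriched = add_contact_fields({"id": "example-2000.4.3"}, example_archive)
```

Because the contact details are stored once per institution and copied at indexing time, a change of address means re-indexing that institution's records rather than editing each exported file.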
We don’t live in an ideal metadata world, and increasingly it seems the best we can hope for is well-formed and valid metadata. Whether our metadata is complete or accurate is entirely in the hands of people, not computers.