This is an outline – a recipe – for getting your metadata records into the “Catholic Portal.”
- Identify specialists – It takes many people with many skills to get content into the Portal. It requires bibliographers (subject specialists) who know which materials in their local library fit the scope of the project. It requires catalogers (metadata specialists) who know how the local materials are described. It requires systems librarians (database administrators) who can extract metadata records from the underlying system(s).
- Have a meeting – Bring all the specialists together to discuss Steps #3-7.
- Understand the scope of the Portal – This is akin to understanding the purpose of the Portal, its intended audience, and its collection policy. In short, the Portal is intended to contain rare, unique, and/or infrequently held materials useful for scholarly Catholic research.
- Identify resources/collections – List the resources/collections in the library which fall into the scope of the Portal. Examples might include rare books & manuscripts, digitized images, sound recordings, the papers of famous individuals, the archives of leading organizations, pamphlets, newspapers, etc.
- Articulate how identified resources/collections are described – For each of the resources/collections identified in Step #4, determine which ones have metadata and which ones don't.
For those items which do have metadata, list how items in the collection are denoted in your various computer systems. Are they all in a particular call number range? Are they the totality of items in your “special collections” department and/or encoded as EAD files? Have they all been cataloged with a local note in your integrated library system (ILS)? Are they all or a subset of items saved in a local spreadsheet or database? Etc.
For “extra credit,” discuss ways the items which don’t have metadata can get some in the future.
- Extract metadata records – Given the things discussed in Step #5, collect the metadata records from your system(s). For example, some sort of search might be done to extract all identified MARC records from an ILS. All EAD files describing materials apropos to the Portal might be saved to a directory. A report might be written against a database to create a tab-delimited text file. Etc.
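As a concrete illustration of the "report written against a database" approach, the sketch below filters a tab-delimited export down to in-scope rows. This is only an example, not a Portal-supplied tool; the column names (`id`, `collection`, `title`) and the collection name "Rare Books" are hypothetical.

```python
import csv
import io

# Hypothetical tab-delimited report exported from a local database.
REPORT = (
    "id\tcollection\ttitle\n"
    "1\tRare Books\tA Catholic Primer\n"
    "2\tGeneral Stacks\tIntroduction to Chemistry\n"
    "3\tRare Books\tEarly American Pamphlets\n"
)

# Collections judged (in Step #4) to fall within the scope of the Portal.
IN_SCOPE = {"Rare Books"}

def extract(report_text):
    """Return only the rows describing in-scope materials."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return [row for row in reader if row["collection"] in IN_SCOPE]

records = extract(REPORT)
```

The same filtering idea applies to an ILS search (by call-number range or local note) or to selecting EAD files from a directory.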
There are three things to remember when extracting the metadata. The first is something we are calling "MARC-ability". For better or for worse, VuFind only accepts MARC records as input, and consequently, all metadata received for ingestion must be translated into MARC. Thus, "real" MARC records are easily accepted, but "tagged" MARC records are not. EAD files can be cross-walked to MARC and thus easily accepted. Some sort of delimited (CSV, tab, etc.) file works well because it is easily parsed. HTML files are poorly structured, making any mapping process very difficult. The same goes for any word-processed file (Word, WordPerfect, etc.). MARC, any flavor of XML, and delimited files work best.
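To make "MARC-ability" concrete, here is a minimal sketch of a format sniffer that guesses whether a file is binary MARC, XML (MARCXML, EAD, etc.), or delimited text. The heuristics are assumptions for illustration, not an official Portal tool: a binary MARC record begins with a 24-byte leader whose first five characters are the record length as digits, XML begins with a declaration or tag, and a tab or comma on the first line suggests a spreadsheet export.

```python
def guess_format(raw: bytes) -> str:
    """Roughly guess a metadata file's format ("marc", "xml",
    "delimited", or "unknown")."""
    sample = raw.lstrip()
    # Binary MARC: the leader's first five characters are the
    # zero-padded record length.
    if sample[:5].isdigit():
        return "marc"
    # XML (MARCXML, EAD, etc.) starts with a declaration or a tag.
    if sample.startswith(b"<"):
        return "xml"
    # Delimited text: tabs or commas on the first line suggest an
    # easily parsed spreadsheet or database export.
    first_line = sample.split(b"\n", 1)[0]
    if b"\t" in first_line or b"," in first_line:
        return "delimited"
    # Anything else (HTML prose, word-processed text) needs a
    # closer look before it can be mapped to MARC.
    return "unknown"
```

Files landing in the "unknown" bucket are exactly the ones (HTML, word-processed documents) that make the mapping process difficult.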
Second, each metadata record requires a number of specific fields. Each record requires a unique identifier. For MARC records this is a value in the 001 field. For EAD files, this is denoted by the identifier attribute in the eadid element. Next, each record requires a pointer to where the described item can be found. Generally speaking, these are either call numbers or URLs saved in the appropriate fields. The last requirement is not really a field but formatting. All data must be saved using the UTF-8 character encoding. Any other encoding (such as MARC-8) is not readable; if the data is not saved as plain ASCII or UTF-8, diacritics display incorrectly and confuse the VuFind indexer.
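These requirements lend themselves to a quick pre-flight check before records are emailed off. The sketch below is an assumption-laden illustration: records are modeled as simple dictionaries, and the field names (`"001"`, `"call_number"`, `"url"`) are hypothetical stand-ins for wherever your system actually stores those values.

```python
def check_record(record: dict) -> list:
    """Return a list of problems found in one metadata record."""
    problems = []
    # Every record needs a unique identifier (MARC 001, EAD eadid).
    if not record.get("001"):
        problems.append("missing unique identifier")
    # Every record needs a pointer to the item itself.
    if not (record.get("call_number") or record.get("url")):
        problems.append("no call number or URL")
    return problems

def check_encoding(raw: bytes) -> bool:
    """True if the exported bytes are valid UTF-8 (plain ASCII is a
    subset of UTF-8, so it passes too). MARC-8 and Latin-1 data
    with diacritics will fail."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False
```

Running checks like these locally catches the most common ingestion problems before the records ever leave your library.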
The third and final thing to remember concerns four levels of data integrity. The first level speaks to the way your data is structured. For XML files this means they are well-formed. For MARC records, it means the leader is 24 bytes long, the first five characters of the leader state the length of the record, fields are delimited with the appropriate ASCII characters, etc. The second level speaks to validity. For XML files it means the data conforms to a DTD or schema. For MARC records it means authors are in 1xx fields, the title is in 245, notes are in 5xx, etc. The third level of integrity is correctness. "To what degree is the value in 245 the title of the item? To what degree are the URLs not broken? Etc." The last level of integrity is completeness. A metadata record's completeness is directly proportional to its findability. The first two levels of integrity can be validated through computer technology. The last two levels are the domain of librarianship.
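The first (structural) level is the easiest to automate. As a sketch, well-formedness of XML can be tested with a bare parse, and the MARC leader's length claim can be verified against the actual record. These are illustrative checks only; the validity, correctness, and completeness levels require schemas and human judgment and are not shown.

```python
import xml.etree.ElementTree as ET

def xml_well_formed(text: str) -> bool:
    """Level 1 for XML: does the file parse at all?"""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def marc_length_ok(raw: bytes) -> bool:
    """Level 1 (partial) for MARC: the leader is at least 24 bytes
    and its first five characters state the record's total length."""
    if len(raw) < 24:
        return False
    length_field = raw[:5]
    return length_field.isdigit() and int(length_field) == len(raw)
```

Validity (level 2) would add a DTD or schema check; correctness and completeness remain questions for bibliographers and catalogers, not programs.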
- Send records to Notre Dame – After the records have been exported, email them to email@example.com, and they will be ingested into VuFind.
You’re done! We will notify you via email when your records are available for viewing, giving you the opportunity to validate the process and examine the fruits of your labors.
Finally, this “recipe,” like any good recipe, is only an outline of what needs to be done. There will surely be variations along the way, but based on our experience, this outline represents a good way to get started.
If you have questions along the way, don’t hesitate to contact Eric or Pat:
Eric Lease Morgan firstname.lastname@example.org 574.631.8604
Pat Lawton email@example.com 574.631.1324