This posting outlines how I plan to add unitid elements to the did elements of EAD files.
As the CRRA matures, I expect more of the metadata ingested into the “portal” will come from EAD files. In order to index EAD files meaningfully, I need to extract a unique identifier from each container-level element, a human-readable description of the container, and a location code. The identifier and human-readable description can easily come from the unitid and unittitle elements of did elements.
Unfortunately, unitid (and maybe unittitle) are not required children of did elements. While the CRRA could mandate the creation of such elements, it turns out to be almost as easy to create them on the fly.
The good folks who are a part of XML4Lib provided me with my solution, an XSLT stylesheet, below:
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>

  <!-- match everything and copy it -->
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="@*|node()" /></xsl:copy>
  </xsl:template>

  <!-- special case; match dids with no unitid -->
  <xsl:template match="//did[not(unitid)]">
    <xsl:copy>
      <!-- copy attributes first; they cannot be added once a child element exists -->
      <xsl:apply-templates select="@*" />
      <!-- add a unit id -->
      <unitid><xsl:value-of select="generate-id()"/></unitid>
      <!-- continue copying -->
      <xsl:apply-templates select="node()" />
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
While not perfect, it is certainly a step in the right direction: short and elegant. Because generate-id only guarantees identifiers that are unique within a single document during a single transformation, the next step will be to include some sort of parameter as input, or to generate some EAD-specific identifier, so each unitid value is unique across the corpus. (Actually, that is another issue I need to address.)
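As a rough sketch of what that next step might look like, the stylesheet could accept a parameter and prepend it to each generated identifier. The parameter name (prefix), its default value, and the hyphen separator are my own assumptions for illustration, not anything the CRRA has settled on:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

  <!-- hypothetical parameter; a string identifying the EAD file being processed -->
  <xsl:param name="prefix" select="'ead'"/>

  <!-- match everything and copy it -->
  <xsl:template match="node()|@*">
    <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
  </xsl:template>

  <!-- special case; match dids with no unitid -->
  <xsl:template match="//did[not(unitid)]">
    <xsl:copy>
      <!-- copy any attributes of the did before adding child elements -->
      <xsl:apply-templates select="@*"/>
      <!-- add a unit id made of the prefix, a hyphen, and a generated id -->
      <unitid><xsl:value-of select="concat($prefix, '-', generate-id())"/></unitid>
      <!-- continue copying -->
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

With xsltproc, the parameter could be supplied on the command line, for example: xsltproc --stringparam prefix abc123 add-unitid.xsl finding-aid.xml (both file names are made up). Note that an XSLT processor is under no obligation to generate the same identifiers each time a document is transformed, so the transformed files would need to be saved if the identifiers are to remain stable.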
Thanks go to MJ Suhonos for the cool //did[not(unitid)] expression, Tod Olson for the idea of identity transformation (copying), and Stefan Krause for the use of generate-id.