This is the briefest of travelogues — a description of what went on at the Digital Humanities Forum, February 24, 2011.
On Thursday, February 24, the Hesburgh Libraries and the Catholic Research Resources Alliance (CRRA) sponsored the Digital Humanities Forum. The purpose of the event was to raise awareness of the digital humanities across campus, if just a little bit. To that end we hosted two speakers and a couple of hands-on workshops.
The first speaker was Art Crivella (Crivella West). Crivella has recently become passionate about digital libraries and was instrumental in getting digital access to the Cardinal Newman content off the ground. “Books are very hard to let go of,” he said, and “We are developing ways to read 90 million pages [of text].” He then described some of the work his company has been doing in this regard: digitize the “best” text possible; make it perfect; divide the corpus into parts (published works, oratory, tracts, and personal writings); compile lists of words and phrases denoting broad subject areas, emotional connotations, and philosophical concepts; and implement a system, which looked a lot like a concordance, allowing scholars to search the corpus and identify relevant passages of text. A scholar can then select paragraphs from the search results, click a button, and find similar paragraphs. He summarized by saying, “Algorithms of the 22nd Century are just as important as the works, sets, and commentary of collections. The sum of these things constitute the library of the near future.”
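The “find similar paragraphs” feature Crivella described is the sort of thing a TF-IDF bag-of-words model makes possible. The following is only an illustrative sketch, not a description of Crivella West’s actual implementation; the function names and toy paragraphs are my own:

```python
import math
from collections import Counter

def tf_idf_vectors(paragraphs):
    """Build a sparse TF-IDF vector (word -> weight) for each paragraph."""
    docs = [Counter(p.lower().split()) for p in paragraphs]
    n = len(docs)
    df = Counter()                      # document frequency of each word
    for doc in docs:
        df.update(doc.keys())
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            w: (count / total) * math.log(n / df[w])
            for w, count in doc.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse word-weight vectors."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(paragraphs, index):
    """Return the index of the paragraph most similar to paragraphs[index]."""
    vectors = tf_idf_vectors(paragraphs)
    best, best_score = None, -1.0
    for i, v in enumerate(vectors):
        if i == index:
            continue
        score = cosine(vectors[index], v)
        if score > best_score:
            best, best_score = i, score
    return best

paragraphs = [
    "the idea of a university rests on liberal education",
    "liberal education forms the idea of a university",
    "the weather in birmingham was cold and wet",
]
print(most_similar(paragraphs, 0))  # the second paragraph is the closest match
```

A production system would add stemming, stop words, and curated vocabularies like the subject, emotion, and concept word lists Crivella mentioned, but the underlying mechanics are the same.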
The second speaker was Ron Snyder, Director of Advanced Technology at ITHAKA, the parent organization hosting JSTOR. Snyder spent the majority of his time describing the functionality of a JSTOR site called Data For Research (DFR). Built with open source software (most notably Solr) and accessible via a RESTful application programming interface (API), Data For Research provides a way to search the JSTOR content and apply data mining techniques to the results. The site outputs bibliographies, word frequencies, n-grams, keywords based on TF-IDF, and references. It supports a bit of visualization, and data sets can be delivered to programmers’ desktops in the form of XML or CSV files. Snyder compared DFR’s Web interface to a form of sculpture: “The DFR tool is like an ice sculpture where you whittle down your results.” This is apt, since it is entirely possible to begin with the sum of the JSTOR content, enter a couple of keywords, and use the resulting facets to reduce the set to a few items. The following day, Friday, Snyder facilitated two workshops: one akin to a traditional bibliographic instruction session, and the second a brief tutorial on how to use the API.
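DFR’s exact output formats were not covered in detail, but the kind of n-gram counting behind its data sets can be sketched in a few lines of Python. The sample text and function name below are my own, not part of the DFR API:

```python
from collections import Counter

def ngram_frequencies(text, n=2):
    """Count n-grams (word bigrams by default) in a text."""
    words = text.lower().split()
    # Slide n parallel word lists past each other to form the n-grams.
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

text = ("knowledge is capable of being its own end "
        "such is the constitution of the human mind")
freqs = ngram_frequencies(text, n=2)
for gram, count in freqs.most_common(3):
    print(gram, count)
```

A researcher who downloaded one of DFR’s CSV data sets instead of raw text would of course skip this step and work with the pre-computed counts directly.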
The first day’s session was attended by approximately 50 people. Just over half were from the University, and the balance were members of the Catholic Research Resources Alliance. The workshops were attended by fewer people but a similar mix. The feedback I received from the event was more or less positive; words used to describe it included “interesting”, “intense”, and “thought provoking”. I believe the Digital Humanities Forum accomplished its goal.