This month’s update includes:
- A Focus on Members, from Janice Welburn, Chair, CRRA Board of Directors
To guide us in developing effective strategies for successful member engagement, the Board has set up a Membership committee and I’m delighted to welcome a current Board member, Evelyn Minick, University Librarian, Saint Joseph’s University, as the chair … The Committee’s major objectives are to grow the membership and ensure retention of current members …
- CRRA Collections Spotlight: The Philadelphia Archdiocesan Historical Research Center Catholic Newspaper Collection, by Shawn Weldon
The Philadelphia Archdiocesan Historical Research Center (PAHRC) holds one of the largest collections of Catholic newspapers in the United States …
- Update on the Digital Access Committee (DAC), from Demian Katz, DAC Chair
In spite of changes, DAC has pressed forward with several initiatives. The Catholic Portal, still the centerpiece of CRRA’s website, is under continuous improvement, both in response to member feedback gathered during usability testing and due to new features in the underlying VuFind software …
- Mark Your Calendars: All-Members Meeting, Anaheim, CA, June 25-26, 2012, all are invited;
Archival Networks and EAD Consortia at SAA in August (San Diego); Fall Symposium at DePaul University, Oct. 15-16, 2012
- Position Announcement: Duquesne University
Continue reading “April 2012 Update”
This text describes the beginnings of a set of statistical reports describing the use of the “Catholic Portal”.
More specifically, the Portal’s Web server log files are read on a daily basis, normalized, and saved to an underlying database. A number of queries are then applied to the database to create rudimentary lists and tabulations. Each of the reports is described below:
- Hosts – This report lists the Internet address or name of the top 100 computers using the Portal. To the best of our ability, the list excludes Internet robots and spiders, but the list needs to be updated. As of this writing, it is quite likely that many of the top computers are still robots, and the host named university.archives.nd.edu is probably the most frequent user of the Portal with shunat236-189.shu.edu coming in at a close second.
- Page count – This is a list of the number of hits the Portal received on any given day. Obviously the script creating this report needs to be updated in order to output data for the current year.
- Query strings – This is a tabulation of the most frequently used search terms applied against the Portal. The “null” query is probably a simple hit against the “browse” link at the bottom of the Portal’s home page and/or a click on the search box’s Find button with nothing entered. The queries in quotes are probably from clicks on hot-linked search results.
- Referrers – This is a list of the websites where people came from before they visited the Portal. A whole lot of these websites are places where blog postings about the Portal appear. Many are spam. Some are HTML versions of the EAD finding aids. Further down the list one can begin to see Google searches.
- Referrers engines – This report is just exactly like the Referrers report except it only includes search engines (Google, Yahoo, and Bing).
- Tabs – This is a list of the most frequently used links across the top of the Portal’s home page.
- Top records – This is a tabulation of the most frequently viewed records in the Portal. The first item on the list is an error, but as of this writing the most frequently viewed record is something from Catholic University of America.
- Types of searches – From this report it is all but obvious that the overwhelming majority of the searches applied against the Portal are free text searches. Nobody uses the advanced search form.
- Whose records – This is a list of the names of the libraries/institutions whose records are viewed most frequently.
For a more technical description of how these reports are generated, see the blog posting entitled “Data warehousing Web server log files” as well as a follow-up posting called “Progress with statistics reporting”.
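The read, normalize, save, and query steps described above might be sketched roughly as follows. This is only an illustration and not the Portal's actual code: it assumes the logs are in Apache's "combined" format, uses SQLite as a stand-in database, and the table name `hits`, the `ROBOT_HINTS` list, and all function names are made up for the example.

```python
import re
import sqlite3

# Assumption: the Web server writes Apache "combined" format log lines.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<day>\d{2})/(?P<mon>\w{3})/(?P<year>\d{4}):[^\]]+\] '
    r'"\S+ (?P<request>\S+)[^"]*" \d{3} \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
MONTHS = {m: i for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Hypothetical, incomplete list of substrings that mark robots and spiders.
ROBOT_HINTS = ("bot", "spider", "crawler")


def normalize(line):
    """Turn one raw log line into a (host, date, request, referrer) tuple,
    or None if the line is unparseable or looks like a robot."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    if any(hint in m["agent"].lower() for hint in ROBOT_HINTS):
        return None
    date = "%s-%02d-%s" % (m["year"], MONTHS[m["mon"]], m["day"])
    return (m["host"], date, m["request"], m["referrer"])


def load(conn, lines):
    """Save the normalized lines to the database."""
    conn.execute("CREATE TABLE IF NOT EXISTS hits "
                 "(host TEXT, date TEXT, request TEXT, referrer TEXT)")
    rows = [row for row in map(normalize, lines) if row is not None]
    conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def top_hosts(conn, limit=100):
    """The 'Hosts' report: the most frequent client addresses."""
    return conn.execute(
        "SELECT host, COUNT(*) AS n FROM hits "
        "GROUP BY host ORDER BY n DESC, host LIMIT ?", (limit,)).fetchall()


def page_counts(conn):
    """The 'Page count' report: number of hits per day."""
    return conn.execute(
        "SELECT date, COUNT(*) FROM hits GROUP BY date ORDER BY date").fetchall()
```

Storing the date in ISO form (`2012-04-10`) is a design choice made here so that sorting and grouping by day, month, or year can be done with plain string operations.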
These reports can be improved in any number of ways. First, they could be represented graphically (pie charts, histograms, etc.). Second, they could be re-generated on a month-by-month basis to look for trends over time. Luckily just about all the necessary data has been preserved. Alternatively, a peek at the Portal’s Google Analytics site may illuminate additional trends.
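As a rough idea of what the month-by-month regeneration could look like, the sketch below rolls daily hit counts up into monthly totals. It assumes the daily data is available as pairs of an ISO-formatted date and a count; the function name `monthly_totals` is hypothetical and is not part of the existing reports.

```python
from collections import Counter


def monthly_totals(daily_counts):
    """Sum (date, hits) pairs into (month, hits) pairs.

    Assumes each date is an ISO string such as "2012-04-10", so the
    first seven characters identify the month.
    """
    totals = Counter()
    for day, hits in daily_counts:
        totals[day[:7]] += hits
    return sorted(totals.items())
```

The resulting list of month and total pairs could then be handed to whatever charting tool is used for the graphical versions of the reports.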
This posting describes my solution for transforming schema-based EAD files for the “Catholic Portal”. In a sentence, the solution boils down to removing all the namespaces from the input.
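To give a feel for what "removing all the namespaces" means, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It is not the implementation described in the full posting, which is not shown in this excerpt; the function name `strip_namespaces` is invented for the example.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(xml_text):
    """Return the XML with namespace qualifiers removed from every
    element name and attribute name."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # ElementTree stores qualified names as "{namespace-uri}local".
        if isinstance(element.tag, str) and element.tag.startswith("{"):
            element.tag = element.tag.split("}", 1)[1]
        for name in list(element.attrib):
            if name.startswith("{"):
                element.attrib[name.split("}", 1)[1]] = element.attrib.pop(name)
    return ET.tostring(root, encoding="unicode")
```

Once the namespaces are gone, a stylesheet or script written for the older DTD-based EAD files, which match plain element names such as `ead` and `eadheader`, has a chance of working against the schema-based files as well. Note that stripping namespaces this way can make two attributes collide if they share a local name, so the result is worth spot-checking.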
Continue reading “Transforming schema-based EAD files”