This text describes the beginnings of a set of statistical reports describing the use of the “Catholic Portal”.
More specifically, the Portal’s Web server log files are read on a daily basis, normalized, and saved to an underlying database. A number of queries are then applied to the database to create rudimentary lists and tabulations. Each of the reports is described below:
- Hosts – This report lists the Internet address or name of the top 100 computers using the Portal. To the best of our ability, the list excludes Internet robots and spiders, but the list needs to be updated. As of this writing, it is quite likely that many of the top computers are still robots, and the host named university.archives.nd.edu is probably the most frequent user of the Portal with shunat236-189.shu.edu coming in at a close second.
- Page count – This is a list of the number of hits the Portal received on any given day. Obviously the script creating this report needs to be updated in order to output data for the current year.
- Query strings – This is a tabulation of the most frequently used search terms applied against the Portal. The “null” query is probably a simple hit against the “browse” link at the bottom of the Portal’s home page and/or a click on the search box’s Find button with nothing entered. The queries in quotes are probably from clicks on hot-linked search results.
- Referrers – This is a list of the websites where people came from before they visited the Portal. A whole lot of these websites are places where blog postings about the Portal appear. Many are spam. Some are HTML versions of the EAD finding aids. Further down the list one can begin to see Google searches.
- Referrers engines – This report is exactly like the Referrers report except it only includes search engines (Google, Yahoo, and Bing).
- Tabs – This is a list of the most frequently used links across the top of the Portal’s home page.
- Top records – This is a tabulation of the most frequently viewed records in the Portal. The first item on the list is an error, but as of this writing the most frequently viewed record is something from Catholic University of America.
- Types of searches – From this report it is all but obvious that the overwhelming majority of the searches applied against the Portal are free text searches. Almost nobody uses the advanced search form.
- Whose records – This is a list of the names of the libraries/institutions whose records are viewed most frequently.
For a more technical description of how these reports are generated, see the blog posting entitled “Data warehousing Web server log files” as well as a follow-up posting called “Progress with statistics reporting”.
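To make the read, normalize, save, and query steps a bit more concrete, here is a minimal sketch in Python. It is not the code behind the reports. It assumes the log files are in Apache's "combined" format, and the table name (hits), the function names, and the short list of robot hints are all hypothetical choices made for illustration.

```python
import re
import sqlite3
from datetime import datetime

# Assumption: the Web server writes Apache "combined" log format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical and deliberately incomplete list of user-agent substrings
# used to guess which hits come from robots and spiders.
ROBOT_HINTS = ("bot", "spider", "crawler", "slurp")


def parse_line(line):
    """Normalize one log line to (host, day, path, referrer, agent), or None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None  # skip lines that do not look like combined-format entries
    when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
    return (
        match["host"].lower(),
        when.date().isoformat(),
        match["path"],
        match["referrer"],
        match["agent"],
    )


def load(connection, lines):
    """Save every parsable line to the (hypothetical) hits table; return the count."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(host TEXT, day TEXT, path TEXT, referrer TEXT, agent TEXT)"
    )
    rows = [row for row in map(parse_line, lines) if row is not None]
    connection.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    connection.commit()
    return len(rows)


def top_hosts(connection, limit=100):
    """List the busiest hosts, leaving out hits whose user agent looks like a robot."""
    clauses = " AND ".join("lower(agent) NOT LIKE ?" for _ in ROBOT_HINTS)
    params = [f"%{hint}%" for hint in ROBOT_HINTS] + [limit]
    query = (
        f"SELECT host, COUNT(*) AS n FROM hits WHERE {clauses} "
        "GROUP BY host ORDER BY n DESC, host ASC LIMIT ?"
    )
    return connection.execute(query, params).fetchall()
```

Filtering on the user agent is only one way to spot robots, and as the Hosts report shows, any such list goes stale and needs regular updating.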
These reports can be improved in any number of ways. First, they could be represented graphically (pie charts, histograms, etc.). Second, they could be re-generated on a month-by-month basis to look for trends over time. Luckily, just about all the necessary data has been preserved. Alternatively, a peek at the Portal’s Google Analytics site may illuminate additional trends.
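As a rough illustration of the first two improvements, the sketch below groups hits by month and draws a plain-text histogram. It assumes a hypothetical table named hits with a day column holding ISO dates (YYYY-MM-DD); the real database schema may well differ, and the function names are made up for this example.

```python
import sqlite3


def monthly_page_counts(connection):
    """Return (month, hits) pairs from the hypothetical hits table, oldest first."""
    return connection.execute(
        "SELECT substr(day, 1, 7) AS month, COUNT(*) AS hits "
        "FROM hits GROUP BY month ORDER BY month"
    ).fetchall()


def text_histogram(pairs, width=40):
    """Render (label, count) pairs as rows of '#' scaled to the largest count."""
    largest = max((count for _, count in pairs), default=0)
    lines = []
    for label, count in pairs:
        bar = "#" * (round(width * count / largest) if largest else 0)
        lines.append(f"{label} {bar} {count}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up rows, only to show the two functions working together.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE hits (host TEXT, day TEXT)")
    connection.executemany(
        "INSERT INTO hits VALUES (?, ?)",
        [
            ("a.example.edu", "2012-01-05"),
            ("b.example.edu", "2012-01-20"),
            ("a.example.edu", "2012-02-01"),
        ],
    )
    print(text_histogram(monthly_page_counts(connection)))
```

The same grouping could feed a real charting tool; the text histogram is just the smallest thing that makes a month-to-month trend visible.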