How to make CRRA metadata available via the FTP “dropbox”

When CRRA members are not able or do not want to make their metadata available on their own website, we here at Catholic Portal Central will create one for them. To begin the process, a CRRA member simply needs to express this desire to me, Eric Lease Morgan (574/631-8604; emorgan@nd.edu), and we will go from there.

FTP site as dropbox

In an effort to make things easier for us here at the “Catholic Portal Home Planet”, we have implemented an FTP site designed to be used as a dropbox.

For the longest time Catholic Research Resources Alliance (CRRA) members sent me their metadata via email. I was then expected to parse it, index it, and make it available for searching. A hidden task in this scenario was archiving the metadata, a task that is not really very scalable. Consequently, I advocated that CRRA members make their metadata available via a website from which I could then harvest the data. Unfortunately, and to my surprise, not every CRRA member was able to do this, mostly because of local infrastructure policies.

To overcome the limitations of some CRRA members, I created an FTP site allowing them to deposit their metadata. This same FTP site is also accessible via the Web, and therefore I can have my cake and eat it too. No CRRA members need to send me their metadata, and I can harvest it from a Web server.

If you are a CRRA member who is unable or not allowed to make your metadata available via the Web, then get in touch with me, Eric Lease Morgan (574/631-8604; emorgan@nd.edu), and I will give you instructions for making your metadata available via the “dropbox”.

Statistical reports against the “Catholic Portal”

This text describes the beginnings of a set of statistical reports describing the use of the “Catholic Portal”.

More specifically, the Portal’s Web server log files are read on a daily basis, normalized, and saved to an underlying database. A number of queries are then applied to the database to create rudimentary lists of tabulations. Each of the reports is described below:

  • Hosts – This report lists the Internet address or name of the top 100 computers using the Portal. To the best of our ability, the list excludes Internet robots and spiders, but the list needs to be updated. As of this writing, it is quite likely that many of the top computers are still robots, and the host named university.archives.nd.edu is probably the most frequent user of the Portal with shunat236-189.shu.edu coming in at a close second.
  • Page count – This is a list of the number of hits the Portal received on any given day. Obviously the script creating this report needs to be updated in order to output data for the current year.
  • Query strings – This is a tabulation of the most frequently used search terms applied against the Portal. The “null” query is probably a simple hit against the “browse” link at the bottom of the Portal’s home page and/or simply clicking the search box’s Find button. The queries in quotes are probably from clicks on hot linked search results.
  • Referrers – This is a list of the websites where people came from before they visited the Portal. A whole lot of these websites are places where blog postings about the Portal appear. Many are spam. Some are HTML versions of the EAD finding aids. Further down the list one can begin to see Google searches.
  • Referrers engines – This report is exactly like the Referrers report except that it includes only search engines (Google, Yahoo, and Bing).
  • Tabs – This is a list of the most frequently used links across the top of the Portal’s home page.
  • Top records – This is a tabulation of the most frequently viewed records in the Portal. The first item on the list is an error, but as of this writing the most frequently viewed record is something from Catholic University of America.
  • Types of searches – From this report it is all but obvious that the overwhelming majority of the searches applied against the Portal are free text searches. Nobody uses the advanced search form.
  • Whose records – This is a list of the names of the libraries/institutions whose records are viewed most frequently.

For a more technical description of how these reports are generated, see the blog posting entitled “Data warehousing Web server log files” as well as a follow-up posting called “Progress with statistics reporting”.
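To give a feel for the read, normalize, save, and query steps described above, here is a minimal sketch in Python. It is not the Portal's actual code. It assumes the log files are in Apache's "combined" format, that search terms arrive in a URL parameter named lookfor, and that robots can be spotted by a few words in the user-agent string. The table name (hits), the column names, and the function names are all hypothetical.

```python
import re
import sqlite3
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Apache "combined" log format (an assumption; the post does not name the format).
LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<when>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
ROBOTS = ("bot", "spider", "crawl")  # hypothetical and certainly incomplete
COLUMNS = ("host", "day", "path", "query", "referrer")

def normalize(line):
    """Turn one raw log line into a row, or None for junk and robots."""
    m = LINE.match(line)
    if m is None or any(word in m["agent"].lower() for word in ROBOTS):
        return None
    day = datetime.strptime(m["when"], "%d/%b/%Y:%H:%M:%S %z").date().isoformat()
    url = urlsplit(m["path"])
    # "lookfor" is an assumed parameter name; an empty search becomes "".
    query = parse_qs(url.query).get("lookfor", [""])[0]
    return (m["host"], day, url.path, query, m["referrer"])

def load(lines, db):
    """Save the normalized rows to the database and return how many were kept."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS hits "
        "(host TEXT, day TEXT, path TEXT, query TEXT, referrer TEXT)"
    )
    rows = [row for row in map(normalize, lines) if row is not None]
    db.executemany("INSERT INTO hits VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return len(rows)

def tabulate(db, column, limit=100):
    """Count hits per distinct value of one column, most frequent first."""
    if column not in COLUMNS:
        raise ValueError(f"unknown column: {column}")
    sql = (
        f"SELECT {column}, COUNT(*) AS n FROM hits "
        f"GROUP BY {column} ORDER BY n DESC, {column} LIMIT ?"
    )
    return db.execute(sql, (limit,)).fetchall()
```

With something like this in place, tabulate(db, "host") would approximate the Hosts report, tabulate(db, "query") the Query strings report, and tabulate(db, "day") the Page count report.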

These reports can be improved in any number of ways. First, they could be represented graphically (pie charts, histograms, etc.). Second, they could be re-generated on a month-by-month basis to look for trends over time. Luckily just about all the necessary data has been preserved. Alternatively, a peek at the Portal’s Google Analytics site may illuminate additional trends.
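As a rough illustration of the month-by-month idea, the sketch below groups hits by month and draws a crude text histogram. It assumes the normalized log data lives in a SQLite table named hits with a day column holding ISO dates such as 2011-10-10. Both names are hypothetical, and the real database may look quite different.

```python
import sqlite3

def monthly_counts(db):
    """Return (month, hits) pairs, oldest month first."""
    return db.execute(
        "SELECT substr(day, 1, 7) AS month, COUNT(*) "
        "FROM hits GROUP BY month ORDER BY month"
    ).fetchall()

def text_histogram(pairs, width=40):
    """Render (label, count) pairs as lines of '#' bars scaled to the largest count."""
    peak = max((n for _, n in pairs), default=0)
    lines = []
    for label, n in pairs:
        bar = "#" * (round(width * n / peak) if peak else 0)
        lines.append(f"{label} {bar} {n}")
    return lines
```

Printing the result of text_histogram(monthly_counts(db)) one line at a time would give a quick sense of any trend before investing in real charts.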

Prioritized list of fixes/enhancements for the “Portal”

Based on our usability studies and the conference call from the other day, I have created a (more or less) prioritized list of fixes/enhancements to be applied to the “Portal”:

  • add a note to the email dialog box denoting that the from field is mandatory and requires an email address
  • create a directory of institutions, and from search results hyperlink institutions’ names to the directory
  • update the “Portal” look & feel (theme) so it is based on the “blueprint” theme
  • turn off the “Suggested Topics” feature
  • fix the author searches so when author names are clicked the content displays correctly
  • make the login links float to the right instead of the left
  • change the red text — such as the text in the search box — to black
  • change the login label to read “Login / Create account”

On my mark. Get set. Go.