Portal surgery

I was recently told to delete thousands upon thousands of records from the “Catholic Portal”, and through the magic of Solr’s Web-based API and a full-featured HTTP client I was able to do this surgery with laser-beam accuracy.

Specifically, I needed to delete all of the records in the Portal from the University of Notre Dame Archives because the Archives wanted to completely replace the finding aids it had made available. This meant deleting more than 100,000 records from the underlying index. After a bit of investigation, I learned that the following one-liner from the command line would do the trick:

curl "http://localhost:8080/solr/biblio/update?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:unaead_*</query></delete>'

In short, curl is a command-line HTTP client. Here it sends (POSTs) an XML message to the update handler of the Solr index named biblio, which is listening on port 8080 of the local host. The message tells Solr to find every record whose id matches the query “id:unaead_*” (the identifiers of the Archives’ records all begin with unaead_) and delete it, and the commit=true parameter tells Solr to commit the changes once the deletion is done. Deleting these records took about ten minutes. I was then able to use my previously created scripts to harvest, validate, transform, and index the Archives’ content painlessly.
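The same request could be wrapped in a few reusable bash functions, roughly like the sketch below. It is not the script used for this job. It assumes bash, curl, and the Solr URL from the one-liner (override SOLR_URL if yours differs), and the function names xml_escape, delete_xml, count_matches, and delete_by_query are hypothetical. It also adds one optional precaution that was not part of the original procedure: asking Solr's select handler how many records match a query (rows=0 returns only the count, numFound) before deleting them.

```shell
#!/usr/bin/env bash
# Sketch: helpers for deleting Solr records by query.
# ASSUMPTIONS: bash and curl are available, and the Solr core lives at the
# URL below (taken from the one-liner); set SOLR_URL if yours differs.
# The function names are hypothetical, not part of Solr or curl.
SOLR_URL="${SOLR_URL:-http://localhost:8080/solr/biblio}"

# Escape characters that would break the XML message (& must go first).
xml_escape() {
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Build the <delete> message understood by Solr's XML update handler.
delete_xml() {
  printf '<delete><query>%s</query></delete>' "$(xml_escape "$1")"
}

# Ask the select handler how many records match; rows=0 makes Solr return
# only the count (numFound) and no documents.
count_matches() {
  curl --fail --silent --show-error --get "$SOLR_URL/select" \
    --data-urlencode "q=$1" \
    --data-urlencode "rows=0" \
    --data-urlencode "wt=json"
}

# Delete every record matching the query, then commit (commit=true).
delete_by_query() {
  curl --fail --silent --show-error "$SOLR_URL/update?commit=true" \
    -H "Content-Type: text/xml" \
    --data-binary "$(delete_xml "$1")"
}
```

To use it, one might source the file, run count_matches 'id:unaead_*' first, confirm that numFound is in the expected range (more than 100,000 in this case), and only then run delete_by_query 'id:unaead_*'. Putting the query in single quotes keeps the shell from expanding the asterisk.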

It is a pleasure when things work in the way they were designed! Now if I could only get my local indexing process to work faster.

Author: Eric Lease Morgan

I am a librarian first and a computer user second. My professional goal is to discover new ways to use computers to provide better library services. I spend much of my time here at the University of Notre Dame developing and providing technical support for the Catholic Research Resources Alliance (the "Catholic Portal").