Syracuse University Special Collections Research Center


URL: http://library.syr.edu/information/spcollections/findingaids/index.html

Description:

In 2004 Syracuse University had approximately 12,000 pages of finding aids covering 800+ collections, most existing only in hard copy. Immediate mass conversion of all finding aids was not an option, so we began by selectively identifying finding aids for large or in-demand collections, or sets of related collections, and converting them. Over time we moved on to lesser-known collections, and to finding aids that were suspect, complicated or insufficient (e.g., those requiring incorporation of annotations, merging of two or more separate accessions, or other manual intervention as part of the conversion process).

We began with a test set of 20 finding aids and hammered out many of the bugs both with encoding and XSLT conversion using that data set, though we continue to occasionally encounter oddities. This gave us two very useful results: the first was detailed, documented tagging specifications, most of which are embodied in the second, a skeleton EAD template that is pre-loaded with much of the standard coding such as institution name, encodinganalog attributes, etc.

The last step was to create collection-level EAD records for collections that had no finding aid whatsoever. We decided that something was better than nothing, and that we would go ahead and place on-line ALL finding aids, whether the collection was processed or not. Finding aids for unprocessed collections included an access restriction note, "This collection is unprocessed and accessible by special permission only. Please contact the repository listed above for more information."
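As an illustration of what such a collection-level record might contain, the following is a minimal sketch in Python using only the standard library. The function name and the sample title are hypothetical, and the output is a bare archdesc fragment rather than a complete, valid EAD file (it has no eadheader, for example); only the wording of the access note comes from the description above.

```python
import xml.etree.ElementTree as ET

# Wording of the note is quoted from the description above.
UNPROCESSED_NOTE = (
    "This collection is unprocessed and accessible by special permission only. "
    "Please contact the repository listed above for more information."
)


def collection_level_archdesc(title, processed):
    """Build a bare collection-level <archdesc> (hypothetical helper)."""
    archdesc = ET.Element("archdesc", {"level": "collection"})
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = title
    if not processed:
        # Unprocessed collections carry the access restriction note.
        restrict = ET.SubElement(archdesc, "accessrestrict")
        ET.SubElement(restrict, "p").text = UNPROCESSED_NOTE
    return archdesc


# "Example Family Papers" is an invented title used only for illustration.
record = collection_level_archdesc("Example Family Papers", processed=False)
print(ET.tostring(record, encoding="unicode"))
```

A processed collection built with the same helper simply omits the accessrestrict element.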

As of December 2010, with the assistance of numerous student workers and interns eager to get hands-on EAD experience, we have collection-level or better finding aids for every one of our 2000+ collections, large or small, processed or unprocessed. A side benefit of this undertaking is that we now have a far more accurate and detailed sense of what we have, how much we have, and the processing status of each collection.

Delivery Mechanism:

We had many discussions over whether to deliver XML with a style sheet, convert XML to HTML on the fly, or provide static HTML. (We also explored the capabilities of CONTENTdm and EAD, with somewhat disappointing results; Archivists' Toolkit and Archon were not available when we embarked on this effort.) In the end, for a variety of reasons, we chose to maintain EAD as our source document and for search purposes, but to generate and deliver static HTML.

Using Saxon in conjunction with an XSLT style sheet (originally downloaded from the EAD Cookbook but heavily revised in-house), we produce from the EAD both a full HTML version and a stripped-down printer-friendly HTML version. Our home page provides links to the HTML files, offering two browse options, two search options, and a listing by subject area. The search uses SWISH-E (http://www.swish-e.org/), a free open-source XML indexing tool, to index the EAD, in conjunction with HTML forms and PHP scripting.
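The two-output transformation described above could be scripted roughly as follows. This is only a sketch under stated assumptions: the jar name, the two style sheet file names, and the output naming pattern are all hypothetical, and the -s:, -xsl: and -o: options are those of current Saxon releases, which may differ from the version in use at the time. The script only assembles and prints the commands; it does not run them.

```python
from pathlib import Path

SAXON_JAR = "saxon-he.jar"          # assumed location of the Saxon jar
FULL_STYLESHEET = "ead-full.xsl"    # hypothetical style sheet names
PRINT_STYLESHEET = "ead-print.xsl"


def saxon_command(ead_file, stylesheet, html_file):
    """Return the argument list for one Saxon transformation."""
    return [
        "java", "-jar", SAXON_JAR,
        f"-s:{ead_file}",       # source EAD document
        f"-xsl:{stylesheet}",   # XSLT style sheet to apply
        f"-o:{html_file}",      # static HTML output file
    ]


def commands_for(ead_file):
    """Return the commands for the full and printer-friendly versions."""
    stem = Path(ead_file).stem
    return [
        saxon_command(ead_file, FULL_STYLESHEET, f"{stem}.htm"),
        saxon_command(ead_file, PRINT_STYLESHEET, f"{stem}_prt.htm"),
    ]


for command in commands_for("example_coll.xml"):
    print(" ".join(command))
```

Each argument list could be passed to subprocess.run to perform the actual conversion once the paths are adjusted to a real installation.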

While this search has proven surprisingly robust and effective, there are some limitations such as problems with diacritics and the effort required to add features such as highlighting of search terms in context. We are currently exploring the idea of moving our finding aids to an XTF platform.

Included with our EAD process is the creation or update of MARC records in our OPAC; thus, as of December 2010, all 2000+ of our collections also had MARC records, so that researchers who begin with the catalog have a good chance of discovering our material. We use MarcEdit to produce MARC records from the EAD, which are then imported into our OPAC. We have had excellent results with this, though a small amount of manual editing (2-10 minutes) is still required for many files, particularly in adding or correcting subfields. This gives us a fully detailed MARC record complete with subject headings. The 856 field is populated with a link to the finding aid, so our collections and inventories are easily and immediately accessible from the OPAC.

In addition, our finding aids are harvested, indexed, and searchable via RLG's ArchiveGrid. Within the next twelve months we hope to have full search capability on our own site as well.

Encoding:

Legacy finding aids (electronic): If a MARC record existed for the collection, a skeleton EAD template was generated from it using MarcEdit, a free product developed by Terry Reese at Oregon State; if not, we used XMetaL to create a skeleton EAD beginning with the pre-filled template and adding the necessary minimum information, including appropriate controlaccess elements. Jason Casden's excellent tri-XMLdate-normalizer.pl script saved us a great deal of time in inserting the normal attribute in date elements. Tagging of the inventory sections was done through a combination of search/replace in Word, strategic use of Excel, cut-and-paste, etc.

Legacy finding aids (paper): Text was OCR'd in-house, then the same process followed as for electronic.

Outsourcing: To jump-start our effort we contracted out approximately 1,500 pages to a local company, Amcon Research, for OCR and EAD encoding to our specs, with very good results.

New finding aids: Created directly in EAD beginning with our template, using oXygen or XMetaL. We also had one tech-savvy student who used the EAD Cookbook Tools for NoteTab Pro. Any new collection immediately receives a collection-level EAD file, and we use MarcEdit to generate a MARC record for it. Inventory sections are added as and when the collection is processed. Adhering to "more product, less process" means that if the collection is reasonably well organized (i.e., foldered and labeled) we do a simple box list and encode that as the inventory section, with a note that the collection has received minimal processing and that the inventory is a box list only.

Entities: We use entities for a few items such as the name of the institution, the name of the library, the link to our OPAC, the link to our subject listing of collections, and a few other items, so when/if those change we do not need to revise the EAD files.
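To show how entity substitution works in general, here is a small Python sketch using the standard library parser. The entity names and the OPAC URL are invented for the example, and the fragment is not a complete EAD file. For the sake of a self-contained example the entities are declared in the internal DTD subset of the document itself; a setup in which the files never need revising would presumably keep the declarations in a shared external file instead.

```python
import xml.etree.ElementTree as ET

# Entity names and the OPAC URL are hypothetical placeholders.
EAD_FRAGMENT = """<!DOCTYPE ead [
  <!ENTITY institution "Syracuse University">
  <!ENTITY opac "http://example.org/opac">
]>
<ead>
  <publisher>&institution;</publisher>
  <extref href="&opac;">Search the catalog</extref>
</ead>
"""

# The parser expands the internal entities while reading the document,
# so the element tree contains the replacement text, not the references.
root = ET.fromstring(EAD_FRAGMENT)
print(root.find("publisher").text)
print(root.find("extref").get("href"))
```

Changing the replacement text in the single declaration changes every place the entity is referenced, which is the maintenance benefit described above.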

All finding aids: XMetaL or oXygen validates the document against the EAD 2002 DTD, and we run each file through RLG's EAD Report Card to ensure compliance with RLG Best Practices. We do a final visual quality check on the HTML output. Where applicable, we identify related collections and use extref elements within relatedmaterial to add links between them. For large inventories, we regularly use Jason Casden's tri-XMLdate-normalizer.pl script to insert the @normal attribute for unitdate elements.
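The idea behind inserting @normal can be sketched in a few lines of Python. This is not Jason Casden's Perl script and makes no claim to reproduce its behavior; it is a simplified illustration that handles only a single four-digit year or a simple year range, and leaves anything else (such as "undated") untouched. The function names and the sample inventory are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SINGLE_YEAR = re.compile(r"^\s*(\d{4})\s*$")
YEAR_RANGE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})\s*$")


def normal_value(text):
    """Return an ISO 8601 style value for a date string, or None."""
    match = YEAR_RANGE.match(text)
    if match:
        return f"{match.group(1)}/{match.group(2)}"  # range: start/end
    match = SINGLE_YEAR.match(text)
    if match:
        return match.group(1)
    return None  # unrecognized pattern: leave for manual review


def add_normal(root):
    """Set @normal on unitdate elements that lack it; return the count."""
    changed = 0
    for unitdate in root.iter("unitdate"):
        if unitdate.get("normal"):
            continue  # never overwrite an existing value
        value = normal_value(unitdate.text or "")
        if value:
            unitdate.set("normal", value)
            changed += 1
    return changed


# Invented inventory fragment used only to exercise the functions.
sample = ET.fromstring(
    "<c01><did><unittitle>Correspondence</unittitle>"
    "<unitdate>1920-1935</unitdate></did>"
    "<c02><did><unitdate>1941</unitdate></did></c02>"
    "<c02><did><unitdate>undated</unitdate></did></c02></c01>"
)
count = add_normal(sample)
print(count, ET.tostring(sample, encoding="unicode"))
```

Skipping unrecognized dates rather than guessing keeps the automated pass safe for large inventories, at the cost of a manual follow-up on the remainder.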

Contact:

Michele Combs, mrrothen@syr.edu, Syracuse University, Syracuse NY.

RLG Member:

Yes

Last updated: 2011-07-14

Update information:
If any information concerning the above EAD implementation is incorrect or out of date, download the XML source file for this entry, make the required changes, and mail it back to Mark Matienzo. Updated entries may only be submitted by the contact listed above.