Syracuse University Special Collections Research Center


URL: http://library.syr.edu/information/spcollections/findingaids/index.html

Description:  

We have 12,000+ pages of finding aids covering 800+ collections, most existing only in hard copy. Though we would like to undertake a comprehensive legacy conversion effort and are exploring funding avenues, at present we are selectively identifying collections or sets of related collections and converting them in small batches. Some of the finding aids require incorporation of annotations, merging of two or more separate accessions, or other manual intervention as part of the conversion process.

We began with a test set of 20 finding aids and hammered out many of the bugs both with encoding and XSLT conversion using that data set, though we continue to occasionally encounter oddities. This gave us two very useful results: the first was detailed, documented tagging specifications, most of which are embodied in the second, a skeleton EAD template that is pre-loaded with much of the standard coding such as institution name, encodinganalog attributes, etc.

To date we have nearly 300 inventories completed.

Encoding Procedure:  

Legacy finding aids (electronic): If a MARC record exists for the collection, a skeleton EAD template is generated from it using MarcEdit, a free product developed by Terry Reese at Oregon State University; if not, we use XMetaL to create a skeleton EAD beginning with the pre-filled template and supply the necessary information, including appropriate controlaccess elements. Tagging of the inventory is done through a combination of search/replace in Word, strategic use of Excel, cut-and-paste, etc. Where applicable, we also identify related collections and use extref elements within relatedmaterial to add links between them. XMetaL validates the document against the EAD 2002 DTD; we also run it through RLG's EAD Report Card. A final QA is done on the HTML output.
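As an illustration of the related-collection links mentioned above, here is a minimal Python sketch that adds an extref inside relatedmaterial. It is not our production workflow (that is done in XMetaL); the helper name add_related_link, the sample titles, and the file name example_family.htm are all hypothetical, and the EAD fragment is heavily simplified.

```python
import xml.etree.ElementTree as ET


def add_related_link(archdesc, title, href):
    """Append <p><extref href="...">title</extref></p> to <relatedmaterial>.

    Creates <relatedmaterial> under <archdesc> if it is not there yet.
    """
    related = archdesc.find("relatedmaterial")
    if related is None:
        related = ET.SubElement(archdesc, "relatedmaterial")
    p = ET.SubElement(related, "p")
    extref = ET.SubElement(p, "extref", {"href": href})
    extref.text = title
    return extref


# Simplified, hypothetical EAD fragment (not a complete valid finding aid).
ead = ET.fromstring(
    "<ead><archdesc level='collection'>"
    "<did><unittitle>Example Papers</unittitle></did>"
    "</archdesc></ead>"
)
archdesc = ead.find("archdesc")
add_related_link(archdesc, "Example Family Papers", "example_family.htm")
print(ET.tostring(ead, encoding="unicode"))
```

The result still has to be validated against the EAD 2002 DTD afterwards, since this sketch does not check element order or content models.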

Legacy finding aids (paper): Text is OCR'd in-house, and then the same process is followed as for electronic finding aids. See also Outsourcing below.

New finding aids: Created directly in EAD using our template.

Entities: We use entities for values that may change, such as the name of the institution, the name of the library, the link to our OPAC, and the link to our subject listing of collections, so that when/if those change we will not need to revise the individual EAD files.
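To show how such entities behave, the following Python sketch parses a small document whose institution name and OPAC link come from entity declarations. The entity names instname and opacurl, their values, and the simplified element structure are assumptions made for illustration, not our actual declarations; the declarations are placed inline here only so that the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity declarations; changing a value here changes every
# place in the document where the entity is referenced.
DOC = """<!DOCTYPE ead [
  <!ENTITY instname "Example University Library">
  <!ENTITY opacurl "http://opac.example.edu/">
]>
<ead>
  <publisher>&instname;</publisher>
  <extref href="&opacurl;">Library catalog</extref>
</ead>"""

# The parser expands internal entities in both text and attribute values.
root = ET.fromstring(DOC)
publisher = root.find("publisher").text
opac_link = root.find("extref").get("href")

print(publisher)
print(opac_link)
```

Because the references are resolved at parse time, updating a declaration once is enough to update the output of every file that uses it the next time the HTML is regenerated.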

Outsourcing: We have contracted out one set of approximately 850 pages to a local company, Amcon Research, for OCR and tagging to our specs, with very good results, and have identified another 550 pages for them to do.

Delivery Mechanism:  

We had many discussions over whether to deliver XML with a style sheet, convert XML to HTML on the fly, or provide static HTML. (We also explored the capabilities of CONTENTdm and EAD, with somewhat disappointing results.) In the end, for a variety of reasons, we chose to maintain EAD as our source document but generate and deliver static HTML.

Using Saxon in conjunction with an XSLT style sheet (originally downloaded from the EAD Cookbook but heavily revised in-house), we produce from the EAD both a full HTML version and a stripped-down printer-friendly HTML version. As the HTML files become available, our home page links to them in a list subdivided by subject area.
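A batch run of this kind could be scripted roughly as follows. This is a sketch under several assumptions: the jar name saxon-he.jar, the style sheet names ead_full.xsl and ead_print.xsl, the output naming scheme, and the use of two separate style sheets are all hypothetical, and the -s:, -xsl:, and -o: options follow the command-line syntax of recent Saxon releases rather than necessarily the version we run.

```python
from pathlib import Path


def saxon_commands(ead_file, saxon_jar="saxon-he.jar",
                   full_xsl="ead_full.xsl", print_xsl="ead_print.xsl",
                   out_dir="html"):
    """Build one Saxon command line per HTML output for a single EAD file.

    All default file names are placeholders; adjust them to local paths.
    """
    source = Path(ead_file)
    out = Path(out_dir)
    # Map each output file to the style sheet that produces it.
    jobs = {
        out / f"{source.stem}.htm": full_xsl,       # full version
        out / f"{source.stem}_prt.htm": print_xsl,  # printer-friendly version
    }
    return [
        ["java", "-jar", saxon_jar,
         f"-s:{source.as_posix()}", f"-xsl:{xsl}", f"-o:{target.as_posix()}"]
        for target, xsl in jobs.items()
    ]


for cmd in saxon_commands("findingaids/smith_j.xml"):
    print(" ".join(cmd))
```

Each command list can be passed to subprocess.run(cmd, check=True) to perform the transformation; the function only builds the commands so that they can be reviewed first.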

For those collections that do not have a MARC record, we use MarcEdit to produce MARC records from the EAD, which are then imported into our OPAC. We have had excellent results with this, though minimal manual editing (2-10 minutes) is still required for many files. This gives us a fully detailed MARC record complete with subject headings. For both new and existing MARC records, the 856 field is populated with a link to the finding aid, so our collections and inventories are also accessible from the OPAC.
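For illustration, an 856 field of the kind described above might be built as in the sketch below, which writes the field as a line of MarcEdit-style mnemonic text. The indicator values (4 for HTTP, 2 for related resource), the choice of subfields $3 and $u, the label text, and the example URL are assumptions for this sketch and are not taken from our records.

```python
def finding_aid_856(url, label="Finding aid"):
    """Return an 856 field as one line of mnemonic MARC text.

    Indicators "42" and subfields $3/$u are illustrative assumptions.
    """
    if not url.startswith(("http://", "https://")):
        raise ValueError(f"not an HTTP(S) link: {url!r}")
    # Mnemonic layout: =TAG, two spaces, two indicators, then subfields.
    return f"=856  42$3{label}$u{url}"


field = finding_aid_856("http://library.example.edu/findingaids/smith_j.htm")
print(field)
```

A line like this can be pasted into the mnemonic view of a record before it is compiled back to MARC and loaded into the catalog.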

At present our finding aids are not searchable on our site. As of August 2006, they will be harvested, indexed, and searchable via RLG's ArchiveGrid. Within the next twelve months we hope to have full search capability on our own site as well. We are investigating SWISH-E (http://www.swish-e.org/), a free open-source XML indexing tool, used in conjunction with HTML forms and Perl, as well as other options.

Contact:  

Michele Combs, mrrothen@syr.edu, Syracuse University, Syracuse NY.

RLG Member:  

Yes

Last updated:  2007-04-17

Update information:
If any information concerning the above EAD implementation is incorrect or out of date, download the XML source file for this entry, make the required changes, and mail it back to levjen@umd.edu. Updated entries may only be submitted by the contact listed above.