In the first meeting of the EAD Roundtable, an open discussion was held to determine what the focus of the group should be. While the Roundtable saw fit to take on a number of topics, a repeated request was for information regarding the creation and distribution of EAD instances: "How do you do it at your institution?" To this end, members of the EAD Roundtable are gathering this information anecdotally and creating a list (parallel to that at the Library of Congress) that provides insight into the most frequently asked questions. Each site on the LC list was contacted with a letter that asked about the following:
Delivery method of encoded finding aids (e.g. native SGML, HTML on the fly, or other specific method)
Main contact person for inquiries
Whether or not they are participating in RLG's Archival Resources project.
DISCLAIMER: Anecdotal commentary contained on this page about commercial software is in no way meant as endorsement of these products.
To have your site listed, contact Stephen Yearl.
CURRENT IMPLEMENTORS OF THE EAD DTD
American Institute of Physics, Niels Bohr Library
Delivery method:
The finding aids are statically converted from XML-EAD to an XHTML 1.0 frameset containing RDF metadata via XSL Transformations. The individual documents within the frameset are written in validated HTML 4.0 and rely heavily on Cascading Style Sheets. The XML instances are indexed using the Verity search engine. Users can form Boolean and proximity searches across specific elements of the finding aids in a guided keyword interface.
Encoding procedure:
In September 1999 the AIP History Center began work on a one-year grant project funded by the National Endowment for the Humanities. The Physics History Finding Aids site is a subject-based consortium; thirteen repositories (American Institute of Physics, California Institute of Technology, Harvard University, Library of Congress, Massachusetts Institute of Technology, Northwestern University, Rice University, University of Alaska-Fairbanks, University of Chicago, University of Illinois at Urbana-Champaign, University of Iowa, University of Texas at Austin, and Woods Hole Oceanographic Institution) have contributed finding aids for selected collections in the physical sciences. The Physics History Finding Aids site now has over 100 finding aids searchable on the web. The encoded finding aids will be linked to their equivalent MARC catalog records in the History Center's web-based International Catalog of Sources for the History of Physics and Allied Sciences (ICOS). The project will continue to grow through the cooperation of repositories willing to utilize a distributed custody approach.
AIP brings all electronic formats into ASCII and splits the text into separate files based on the descriptive information and the container list. The descriptive information is cut-and-pasted into a template in the authoring software (NoteTab Pro). The container list is manipulated and tagged using C++ scripts, Perl scripts, and search-replace routines. After joining the two parts, the finding aid is validated using NSGMLS (SP). The SGML is converted into XML by cutting and pasting the XML headers into the document. AIP uses XT as its processing engine. FOP and WordPerfect are used for PDF formatting and printing.
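The container-list tagging step can be sketched roughly as follows. This is an illustration in Python, not AIP's actual C++ or Perl scripts; the tab-delimited column order (box, folder, title, date), the choice of `<c01 level="file">`, and the function name `tag_container_list` are all assumptions made for the example.

```python
from xml.sax.saxutils import escape

def tag_container_list(text):
    """Wrap each tab-delimited row (box, folder, title, date) in EAD tags.

    The four-column layout is hypothetical; a row with a different
    number of fields raises ValueError so it can be fixed by hand.
    """
    components = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines left over from the ASCII export
        box, folder, title, date = (field.strip() for field in line.split("\t"))
        components.append(
            '<c01 level="file"><did>'
            f'<container type="box">{escape(box)}</container>'
            f'<container type="folder">{escape(folder)}</container>'
            f"<unittitle>{escape(title)}</unittitle>"
            f"<unitdate>{escape(date)}</unitdate>"
            "</did></c01>"
        )
    return "\n".join(components)

sample = "1\t1\tCorrespondence, A-C\t1950-1955\n1\t2\tLab notes & drafts\t1956"
tagged = tag_container_list(sample)
print(tagged)
```

Escaping the field text before wrapping it keeps characters such as `&` from breaking validation once the container list is joined to the descriptive part of the finding aid.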
Main contact persons:
Clay Redding
credding@aip.org
(301) 209-3172
Joe Anderson
rja@aip.org
(301) 209-3183
RLG Archival Resources participant?:
Yes.
American Philosophical Society
General Notes and Overview of PACSCL - Philadelphia Area Consortium of Special Collections Libraries:
The APS is one of a number of institutions that comprise PACSCL. Fifteen of
these institutions are part of a Delmas-funded legacy encoding project to
outsource about 1,500 pages of paper per institution for conversion into
EAD. Although serving these files was not an explicit part of the project,
we've taken a step in that direction by writing stylesheets and developing a
protocol for editing the outsourced documents and providing a means for
future encoding. The stylesheets developed (three interconnected
ones: two in XSL, one written to the XSL working draft specs so that it
can be viewed using IE5+ and the other written to the more powerful version 1.0
specs, and one in CSS) have been adapted and/or adopted by the Historical
Society of Pennsylvania, Temple Univ., Haverford, Bryn Mawr,
Swarthmore Friends Historical Library, Swarthmore Peace Collection, the
Library Company of Philadelphia, the Hagley Museum, the Presbyterian
Historical Society, and maybe others as well. The other institutions in the
grant (Winterthur, U. Penn, Phil. Museum of Art, the Wagner Free Institute
of Science, the Academy of Natural Sciences) are in various places with
regard to their implementation of EAD. (Wagner has dropped out of the
project, PMA is on hold with another encoding grant in hand, and the others
have received their files, but sit in different positions with respect to
getting their files set to post).
Delivery method:
EAD files are made available as a separate alphabetical list
(http://www.amphilsoc.org/library/eadfiles.htm) and as links from our MOLE
guide (Manuscripts On-Line: http://www.amphilsoc.org/library/browser/),
which is an alphabetical listing of abstracts to all manuscript collections
at the APS. In the former, explicit links are offered to both the XML and
HTML versions; in MOLE, a simple java browser-sniffer has tentatively (but not necessarily permanently)
been installed to direct users to the XML or HTML
version, as appropriate. Explicit (non-java
dependent) links to the native XML and HTML files in MOLE will probably be added soon.
Links to the EAD files will be provided from our OPAC when time allows.
Encoding procedure:
At the APS, the process usually begins with paper inventories, which were outsourced to Apex for conversion to rough EAD courtesy of the Delmas grant. Subsequently, some work has been done with a few finding aids that were available in html, some paper finding aids have been scanned into MSWord, and some have been rekeyed. In every case, though, it was necessary to provide a rather significant degree of augmentation to the records to make them even minimally acceptable as finding aids. Old descriptive practices did not include many of the most basic elements that are essential to a proper finding aid.
Other than the out-sourced files, finding aids are marked up in MSWord using a plain-text template and edited as need be, again in plain-text. Some experimentation has been done with extracting data from an MSAccess database directly into an EAD plain-text template using the printmerge function. For container listings (which tend to be stored in different databases), it's not quite as simple, but works adequately so far, at least on a case by case basis.
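The database-to-template merge described above might look something like the following sketch. It uses Python's `string.Template` in place of the word processor's print-merge function; the field names (`title`, `dates`, `extent`) and the shape of the template are hypothetical and are not taken from the APS template.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical plain-text EAD template with merge placeholders.
EAD_TEMPLATE = Template("""<archdesc level="collection">
  <did>
    <unittitle>$title</unittitle>
    <unitdate>$dates</unitdate>
    <physdesc>$extent</physdesc>
  </did>
</archdesc>""")

def merge_record(record):
    """Fill the template from one database row, escaping markup characters."""
    safe = {key: escape(value) for key, value in record.items()}
    return EAD_TEMPLATE.substitute(safe)

record = {
    "title": "Smith & Jones Papers",
    "dates": "1801-1850",
    "extent": "3 linear feet",
}
merged = merge_record(record)
print(merged)
```

`substitute` raises `KeyError` when a placeholder has no matching field, which gives an early warning about incomplete records before the file reaches validation.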
Files are proofed largely by attaching an IE-compatible stylesheet and viewing through the website. Once proofed for content, they are validated in XMetaL and transformed to HTML with Saxon using the version 1.0 stylesheet.
Main contact person:
Rob Cox
American Philosophical Society
105 South Fifth Street
Philadelphia, PA 19106-3386
215.440.3409
rscox@amphilsoc.org
RLG Archival Resources participant?:
Not currently, without ruling out possibility for doing so in the future.
Archives of American Art, Smithsonian Institution
LAST UPDATED: NOVEMBER 2001
Delivery method:
The Archives of American Art, Smithsonian Institution provides HTML documents listed on a Finding Aids web page on our website. We provide full text searching through an Atomz search engine on our website that can be restricted to searching only the finding aids. We also contribute our SGML finding aids to the Research Libraries Group's (RLG) Archival Resources database, and hope to deliver the SGML versions from our own web site at some point, perhaps in collaboration with other Smithsonian archival units.
Encoding procedure:
AAA received funding from RLG and from the Smithsonian Institution to pay for Apex Data Services to convert approximately fifty of our finding aids to EAD. We use a combination of NoteTab Pro and XMetaL to mark up new finding aids and modify those that were converted by Apex. Initially, RLG provided us with a Perl script to convert them from SGML to HTML for display on our website. We are now adapting the markup process taught by Daniel Pitti at the EAD training at the Rare Book School at the University of Virginia to conform to our workflow and presentation preferences. This method uses NoteTab and XSL scripts, James Clark's programs to parse SGML and convert to XML, and Saxon to convert to HTML. Our plan includes replacing HTML finding aids with XSL-generated ones. One archivist on the staff has primary responsibility for encoding older finding aids and for managing the Apex conversion work, and a computer specialist on staff provides the technical support for the software, stylesheets and other programming. Future plans call for incorporating the encoding into the creation of new finding aids.
Main contact person:
Karen Weiss
Catalog and Internet Resources Mgr.
Archives of American Art
Smithsonian Institution
tel: 202-275-1880
weissk@aaa.si.edu
RLG Archival Resources participant?:
Yes
Australian War Memorial
LAST UPDATED: NOVEMBER 2001
Delivery method:
XML finding aids are delivered to XML-capable browsers (IE5) using an XSL stylesheet to control display. For non-XML-capable browsers, the server converts the XML and delivers an HTML version on the fly. Access to finding aids is currently limited to internal access via the Intranet and Collection Management System, with limited public access via paper copies of the finding aids in the reading room. By the end of the year there will be public access via a new collection access system (OPAC) on our website.
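The choice between the two delivery paths can be illustrated with a short sketch. The source does not say how the server recognizes an XML-capable browser, so testing the User-Agent header for Internet Explorer 5 or later is an assumption, as is the function name `choose_format`.

```python
import re

def choose_format(user_agent):
    """Return 'xml' for Internet Explorer 5 or later, otherwise 'html'.

    Assumption: browser capability is inferred from the User-Agent string.
    """
    match = re.search(r"MSIE (\d+)", user_agent or "")
    if match and int(match.group(1)) >= 5:
        return "xml"
    return "html"

print(choose_format("Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"))
print(choose_format("Mozilla/4.7 [en] (WinNT; I)"))
```

Falling back to HTML whenever the header is missing or unrecognized means an unknown browser still receives a readable page.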
Encoding procedure:
Legacy formats are mostly Microsoft Word or printed and there is little standardization of structure and content. Encoding is done in-house, using XMetal in conjunction with a template created using a CSS stylesheet (to give a basic encoding structure) to guide staff in the markup of finding aids. Legacy finding aids are re-authored to bring them in line with the new template and re-keyed or cut/pasted into the template. New finding aids are authored directly in XMetal. XMetal validates each finding aid against the EAD DTD each time it is saved. Options for indexing and searching are currently being explored.
Main contact person:
Carmel McInerny
Published and Digitised Collections
Australian War Memorial
GPO Box 345
Canberra ACT 2601
carmel.mcinerny@awm.gov.au
RLG Archival Resources participant?:
No
Arquivo Distrital de Lisboa
LAST UPDATED: October 2004
Delivery method:
Delivery is through our Web page as HTML files, transformed from the original XML files by a stylesheet adapted from the Cookbook's.
Encoding procedure:
We import our Word, Excel and Access files into the application CALM, by means of import format files, and export them as EAD files.
Main contact person:
José Mariz
Arquivo Distrital de Lisboa
(Instituto dos Arquivos Nacionais)
mariz@iantt.pt
RLG Archival Resources participant?:
No
Bodleian Library, Department of Special Collections and Western Manuscripts
Delivery method:
Currently our SGML files are converted to XML and then processed using XT to produce HTML files. When more browsers begin offering native XML/XSL support it is anticipated that this will be done on the fly.
Encoding procedure:
Our finding aids exist in a variety of formats - paper only, word-processed documents and a variety of databases. We currently have a grant to convert some of our print-format finding aids by using a data conversion service. In house, our word-processed finding aids are cut-and-pasted into an EAD template file. We have developed numerous Perl scripts to automate creation of IDs and REF TARGETs, LEVEL attributes etc., as well as to perform a variety of consistency checks. A pilot project showed the feasibility of converting databases directly into EAD, again using Perl. As well as validating using AuthorEditor/XMetaL, we use NSGMLS. We originally were using OpenText (Pat5.0) to index our finding aids but recently we have been experimenting with SGREP.
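The kind of automation mentioned above (generating IDs and checking LEVEL attributes) can be sketched as follows. This is an illustration in Python rather than the Bodleian's Perl scripts; the ID scheme, the list of component tags handled, and the two checks shown are assumptions.

```python
import xml.etree.ElementTree as ET

COMPONENT_TAGS = {"c", "c01", "c02", "c03", "c04"}

def assign_ids(root, prefix="c"):
    """Give every component lacking an id attribute a sequential one."""
    counter = 0
    for element in root.iter():  # document order
        if element.tag in COMPONENT_TAGS:
            counter += 1
            if "id" not in element.attrib:
                element.set("id", f"{prefix}{counter:04d}")
    return counter

def find_problems(root):
    """Report components with no level attribute or no unittitle."""
    problems = []
    for element in root.iter():
        if element.tag in COMPONENT_TAGS:
            if "level" not in element.attrib:
                problems.append(f"{element.get('id')}: missing LEVEL attribute")
            if element.find("did/unittitle") is None:
                problems.append(f"{element.get('id')}: missing unittitle")
    return problems

ead = ET.fromstring(
    '<dsc><c01 level="series"><did><unittitle>Letters</unittitle></did>'
    "<c02><did><unitdate>1850</unitdate></did></c02></c01></dsc>"
)
total = assign_ids(ead)
problems = find_problems(ead)
print(total, problems)
```

Checks of this sort complement DTD validation: a parser confirms that the markup is legal, while a script can flag markup that is legal but inconsistent with local practice.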
Main contact person:
Mike Webb
Assistant Librarian
Department of Special Collections and Western Manuscripts
Bodleian Library
Broad Street
Oxford
OX1 3GB
United Kingdom
Tel: (01865) 277164
mnw@bodley.ox.ac.uk
RLG Archival Resources participant?:
Yes
Brandeis University Libraries
Delivery method:
We provide them in both SGML and HTML
Encoding procedure:
We have marked up finding aids from a variety of formats: Word and
WordPerfect documents, our Inmagic database exported in ASCII text,
and SGML files created from scratch. We've tried different ways to
automate the procedure but have yet to find one best way. We do use a Perl
program created by Alvin Pollock to create an HTML version of our SGML
finding aid, and it's a wonderful program. We currently don't have a search
engine or other indexing retrieval method but are marking up our finding
aids with the future in mind.
We use Panorama's Author/Editor for markup, and we offer Panorama Free and the HTML versions as delivery methods. We like Author/Editor for the most part.
Main contact person:
Susan Pyzynski
Cataloger/Systems Liaison, Brandeis University Libraries
pyzynski@brandeis.edu
RLG Archival Resources participant?:
Yes
California State University, Dominguez Hills
See: University of California, Berkeley
Cartoon Research Library, Ohio State University
Project description:
The Cartoon Research Library at Ohio State University is creating a finding aid to the Newspaper Comics subgroup of the San Francisco Academy of Comic Art collection. This part of the collection is estimated to contain some 2.5 million newspaper comic clippings, tearsheets, and Sunday sections, spanning the years 1894 to 1996.
The finding aid is being authored in NoteTab, using Chris Prom's EAD files. Default tags are being customized through the use of NoteTab's clip libraries, in order to accommodate the structure of this particular document.
XML to HTML transformation is currently accomplished through XT, built in to the EAD NoteTab setup. OSU Libraries Information Technology department is experimenting with Cocoon for various projects; we are considering trying it out on the Cartoon Research Library finding aid in the future.
The current work on the finding aid is largely conceptual. The Newspaper Clippings collection is to be divided into two series: Comic Sections and Comic Clippings. I have created a tagging structure that can encompass these two rather different series. Both are organized first by title, then by date. "Title" means something different in each series: in the first, it is the title of the newspaper in which the comic section appeared, with description at the item level. In the second, it is the title of the comic strip, with description at the box level.
Problems of additional description also present themselves. Ideally, the Sunday sections would receive detailed description at an additional component level. Early Sunday sections, from the late 1890's to the early 1920's, were not so standardized as they are now. Comic features came and went, their titles changed weekly, and important comic artists did one-time features. Additional <c> wrappers with cartoon titles, and <persname> attribution, would provide a much-needed index of the early work of American comic artists--if time allows for this much encoding.
The problem of tracking dates for comic strip clippings, without making the finding aid too large, has been solved through the use of Excel spreadsheets, which contain a month-and-date grid of holdings for each comic feature. These will be converted to PDF and will reside on the server with the finding aid, accessible through <extref> links.
A representative image of each comic feature, consisting of a single cartoon panel, will be available as a <dao>.
Main contact person:
Amy McCrory
Project Archivist
mccrory.7@osu.edu
Center for Jewish History
Delivery method:
Our EAD finding aids are converted into HTML with an XSLT stylesheet using James Clark's XT program. The HTML finding aids are uploaded to the server in five different directories, one for each partner institution. The original XML files of the finding aids are stored locally as master files. NoteTab Pro is used for parsing XML files.
The printed versions of the HTML files are usually used in the reading room. The Center for Jewish History partners intend to load all EAD finding aids in XML format into DigiTool and thus provide users with a single search across all of them.
Finding aids are also accessible via RLG's Archival Resources.
Encoding procedure:
Most of the collections that have been encoded at the Center for Jewish History so far had legacy finding aids; some of them existed only in hand-written or typewritten form. These were either keyed directly into XML in XMetal 2.0 or into MS Word when substantial descriptive information had to be added. Some of the finding aids reside in WordPerfect or InMagic databases. These files were usually converted into MS Word, where the descriptions and data formats were polished and modified so that they would comply with APPM, LC NAF, LC-ALA transliteration tables, and other standards. Information in the descriptive parts of the finding aids is pasted into XML templates. Container list data is converted into XML through a report template in MS Access that appends EAD tags to tabbed container list data in an imported TXT file. The output is then pasted into the XML file in XMetal 2.0.
Some of the newer finding aids are written directly into an EAD template in XMetal 2.0 and Altova XMLSpy 2005 Home edition.
RLG Archival Resources participant?:
Yes
Main contact person:
Robert Sink
Senior Archivist and Project Director
Center for Jewish History
15 W. 16th Street
New York, NY 10011
(917) 606-8215
bsink@cjh.org
Colorado State University Water Resources Archive
Delivery method:
The XML files are converted to HTML by using an XSL stylesheet and then are placed on a server. The site gives direct links to the finding aids and provides a search engine (Mnogo) to search across them. Finding aids were posted to the web beginning in April 2003. Printed finding aids are also produced through an XSL stylesheet, conversion to Microsoft Word, and a bit of manual clean up.
Encoding procedure:
Existing word processing files were manually cut-and-pasted into a template in XMetaL. The archivist trained a student and a staff member to do this with good results. New finding aids are created directly in XML.
RLG Archival Resources participant?:
Yes
Main contact person:
Patty Rettig
Project Archivist
prettig@manta.colostate.edu
Columbia University, Rare Book and Manuscript Library
Delivery method:
We deliver the documents in native SGML and HTML. Both are hard coded.
We deliver the EAD via Panorama Free and RLG's Archival Resources project.
Encoding procedure:
Direct input by processor into a database or creation of a Word document.
We do some OCR. If we start with a Word document we import the data
into the database by supplying the appropriate number of tabs (fourteen at this point).
Much of the insertion of the tab stops is accomplished using Find/Replace;
however, we do need to watch what we are doing.
Once the data is in the database we can further edit using the editing features of the database. The database is configured so that all fields are optional and the output will still parse. The database we use is Pro Cite, and the markup is an output style applied to each entry in the database. We also use the Berkeley Template to encode all of the higher level information. The template gives us a text file into which we insert the text file output from Pro Cite.
We validate the document with ParserPlus, the Windows-based version of James Clark's SP produced by CSW Informatics.
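The tab-padding step described above (done by hand with Find/Replace) could also be checked mechanically. The sketch below is only an illustration: it assumes that each record is one line and that "fourteen" refers to the number of tab characters per record; the function name `pad_to_tabs` is hypothetical.

```python
def pad_to_tabs(line, tab_count=14):
    """Append trailing tabs so the line has exactly tab_count tab characters.

    Lines that already contain too many tabs are rejected so they can be
    corrected by hand instead of being silently truncated.
    """
    present = line.count("\t")
    if present > tab_count:
        raise ValueError(f"line has {present} tabs, expected at most {tab_count}")
    return line + "\t" * (tab_count - present)

padded = pad_to_tabs("Box 1\tFolder 3\tCorrespondence, 1933")
print(padded.count("\t"))
```

Because every field is optional in the database configuration described above, padding with empty trailing fields should not prevent the output from parsing.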
Main contact person:
Patrick T. Lawlor, Curator
The Herbert H. Lehman Suite and Papers, Columbia University
lawlor@columbia.edu
RLG Archival Resources participant?:
Yes
College of Staten Island, CUNY
Delivery method:
We currently use an XSLT stylesheet to convert XML into HTML for Internet display
purposes.
Encoding Procedure:
All finding aids existed in Microsoft Word, although most were also available as
HTML files. Container listings for many collections also existed in InMagic's DB/Textworks as
databases.
We use the Home Version of XMLSpy for EAD markup, stylesheets, and file validation. A basic template was created in XMLSpy, and the text of the finding aid was cut and pasted into the appropriate portions of the template. Additional editing was completed in XMLSpy as necessary. As InMagic's DB/Textworks will export data in XML, we were able to automate the process of converting the container lists into EAD by using XSLT to transform the InMagic XML structure into an EAD XML structure.
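The container-list conversion can be illustrated with a small sketch. It uses Python's standard library in place of XSLT, and the structure of the exported XML (`<records>`, `<record>`, `<Box>`, `<Folder>`, `<Title>`) is invented for the example; the real DB/Textworks export and the College's stylesheet are not shown in the source.

```python
import xml.etree.ElementTree as ET

def export_to_ead(export_xml):
    """Map a hypothetical database export onto an EAD <dsc> container list."""
    source = ET.fromstring(export_xml)
    dsc = ET.Element("dsc", {"type": "in-depth"})
    for record in source.iter("record"):
        component = ET.SubElement(dsc, "c01", {"level": "file"})
        did = ET.SubElement(component, "did")
        for field, kind in (("Box", "box"), ("Folder", "folder")):
            value = record.findtext(field)
            if value:  # leave out containers the export did not supply
                ET.SubElement(did, "container", {"type": kind}).text = value
        ET.SubElement(did, "unittitle").text = record.findtext("Title", default="")
    return ET.tostring(dsc, encoding="unicode")

export = (
    "<records>"
    "<record><Box>1</Box><Folder>2</Folder><Title>Minutes</Title></record>"
    "<record><Box>1</Box><Title>Photographs</Title></record>"
    "</records>"
)
ead_fragment = export_to_ead(export)
print(ead_fragment)
```

An XSLT stylesheet would express the same mapping declaratively, with one template per exported element.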
The finding aids are subject-indexed, but there are currently no MARC records available for our manuscript collections in the library catalog. The finding aids are accessible from collections lists and abstracts available on our web site.
Main contact person:
Catherine Carson
Archives & Special Collections
College of Staten Island, CUNY
Library, 1L-216
2800 Victory Boulevard
Staten Island, New York 10314
archives@mail.csi.cuny.edu
RLG Archival Resources participant?:
No
Cornell University Including:
Cornell Institute for Digital Collections
Cornell University Library Rare & Manuscript Collections
Delivery method:
Finding aids are delivered in XML to XML capable browsers (currently, only
Internet Explorer 5). Bringing together the XML document and the supplied
XSL style sheet, the client's IE5 displays properly formatted native XML.
For non-XML capable browsers, an HTML version is delivered. This version is
created on-the-fly by components of IE5 on the server, without the need for
the usual SGML-to-HTML scripts.
Encoding procedure:
After considerable discussion and time spent developing an acceptable EAD
markup template for Cornell's Division of Rare and Manuscript Collections,
we have concentrated on the direct delivery and navigation of XML finding
aids and on the development of a scalable XML system that can be easily
maintained. We're also working to perfect high-quality print output from
the XML instances. Our goal is to create XML masters and produce all
derivatives (print, web, etc.) from these.
For test material, input thus far has been from word processing files (MS Word) and HTML versions. We're using various free conversion tools: Emacs (text editor), PSGML (SGML add-on to Emacs), SP (J. Clark's SGML parser and toolkit), and Perl. These same tools will likely be used in any large scale retrospective conversion of finding aids in electronic format. For existing paper guides, we may try outsourced re-keying, with some added codes (but not full markup, which would be done in-house). For current finding aid production, we are experimenting with strictly enforced Word templates and styles, combined with Perl scripts for conversion. As we eventually want to treat the XML files as our masters and get away from this two-step procedure, we're also hoping that more affordable and user-friendly XML authoring solutions are developed.
We're beginning to explore alternative methods of indexing and searching our on-line guides.
Main contact person:
David Ruddy
dwr4@cornell.edu
RLG Archival Resources participant?:
Yes
Cuban Heritage Digital Collection, University of Miami
LAST UPDATED: NOVEMBER 2001
Delivery method: and Encoding procedure:
Main contact person:
Maria R. Estorino
Project Director/Archivist
mestorino@miami.edu
Denver Public Library (Western History / Genealogy)
LAST UPDATED: NOVEMBER 2004
Delivery method:
XML finding aids are delivered to XML-capable browsers (IE5) using an XSL style sheet to control display. For non-XML-capable browsers, the server converts the XML and delivers an HTML version on the fly. Access to online Manuscript Finding Aids is through the Internet. Information on all of the manuscript collections is also available through the Denver Public Library Catalog.
Encoding procedure:
The Denver Public Library received partial funding from the National Endowment for the Humanities for encoding the legacy finding aids. Legacy formats are Microsoft Word with a standard structure and content. Encoding is done in-house, using XMetal in conjunction with a template created using an XSLT style sheet (to give a basic encoding structure) to guide staff in the markup of finding aids. Legacy finding aids are converted and/or cut/pasted into the template. New finding aids are authored directly in XMetal. XMetal validates each finding aid against the EAD DTD each time it is saved. Additional options for indexing and searching are currently being explored. The encoded finding aids are being published on the web using PLEADE, enabling users to browse and then read the finding aids online. PLEADE is a highly configurable Web publication framework, including a search engine, for Encoded Archival Description documents. The appearance of the online finding aids is similar to the model provided by SGML and XML implementations, with a navigator that shows the structure of the document and functions as a hypertext table of contents to the file and a search engine.
Main contact person:
Ellen Zazzarino
Senior Archivist
Denver Public Library
Western History/Genealogy Department
720-865-1905
ezazzar@denver.lib.co.us
RLG Archival Resources participant?:
No
Duke University, Rare Book, Manuscript, and Special Collections Library
LAST UPDATED: NOVEMBER 2001
Delivery method:
EAD/XML finding aids are rendered into HTML on the fly using DynaWeb from Enigma, and as such are usable in any web browser. DynaWeb also provides sophisticated indexing and searching capabilities. EAD/XML finding aids are also available to users with XML/XSLT compatible web browsers, though this method does not allow for searching across finding aids or tag level searching within them.
Encoding procedure:
With an initial phase of retrospective conversion of existing finding aids complete, Duke is now concentrating on two fronts. The first seeks to refine the process of creating new finding aids using XMetaL, an XML authoring software by Softquad, and web forms. The second involves participating in the North Carolina Encoded Archival Description (NCEAD) consortium to standardize EAD encoding practice.
Duke is integrating the writing and encoding process for new finding aids using a combination of web-forms, XMetaL, and macros. The web form is used to enter information that is processed into the header and collection level description sections of the finding aid. The container list is created using XMetaL in combination with locally developed macros to insert and surround text with EAD elements and attributes. The EAD instance is validated in XMetaL and checked for encoding conformant to Duke Guidelines and the NCEAD Application Guidelines (NCEAD AG). Additionally, the EAD instances are proofread and checked for accuracy of display in web browsers as well as in the DynaWeb system.
The process of writing and encoding finding aids is performed by both students and professional staff. Student responsibilities include processing collections and writing container lists. Staff members complete the finding aids with header and collection level information and bibliographic records, as well as performing other quality control functions.
All finding aids are encoded according to a set of guidelines written by the members of the North Carolina Encoded Archival Description (NCEAD) consortium. At present, this consortium includes North Carolina State University, University of North Carolina at Chapel Hill, State Archives of North Carolina, and Duke University. The NCEAD AG seeks to standardize EAD encoding at the four institutions in order to present users with a set of consistently encoded finding aids.
For more information about Duke EAD see http://scriptorium.lib.duke.edu/findaid/ead
For more information about NCEAD see Project Description
Main contact person:
Joshua McKim, Digital Encoding Archivist
Rare Book, Manuscript, and Special Collections Library, Duke University
joshua.mckim@duke.edu
RLG Archival Resources participant?:
All encoded finding aids are made available to the RLG project.
Durham University Library
[Also incorporated into the Archives Hub, United Kingdom]
Delivery method:
Locally (via NFS) Dynatext. Elsewhere HTML on the fly via Dynaweb, SGML (using
Panorama stylesheets, via the Dynaweb WWW server), PostScript via Dynaweb's
printout generating facility. Planning to use Cheshire II when Z39.50 becomes a
viable option and could also distribute Adobe Acrobat files if useful.
Encoding procedure:
Durham's handlists tend to be highly detailed item level listings, done without
any authority system or house style, only the most recent of which had been word
processed. This has precluded most automation of the markup process, although
awk (a program that comes with UNIX (and LINUX) which treats text files like
a database) has been useful for some sequences. The results are highly unsatisfactory
as electronic texts, stretch to (and beyond) the limits of EAD, and do not
display the consistency that might be desired from using EAD, but nevertheless
provide what was required of the project concerned (making existing lists
available online) and establish the skeleton which much future authority work
can flesh out.
Conversion has started with a basic keying of the text, then a combined proof reading and markup using WordPerfect SGML with macros on ASCII text which is saved and parsed as SGML. While fine for conversion, WordPerfect does not deal reliably with finished EAD SGML files, so these are at present worked on using EMACS in PSGML mode with the NSGMLS parser, although development is currently underway on using Adobe Framemaker+SGML as the means of editing existing documents. As mentioned, the lack of authority forms, etc., has meant leaving aside questions of indexing, etc., at present (the problem lying with the data rather than EAD). The handlists are made available via Dynatext/Dynaweb etc, as indicated above (http://flambard.dur.ac.uk:6336/)
Main contact person:
Richard Higgins
r.i.higgins@durham.ac.uk
RLG Archival Resources participant?:
Yes
Emory University Special Collections
Delivery method:
Special Collections has participated in two collaborative grant projects. The finding aids encoded as part of the Georgia Archives and Manuscripts autoMated Access (GAMMA) Project are delivered in SGML. The encoded finding aids are linked to the bibliographic descriptions of the collections, created during an earlier phase of the project, in the RLIN (Research Libraries Information Network) national database. A list of the GAMMA EAD finding aids is also available through the GAMMA website. The GAMMA records have been made available through a file server at Emory on a temporary basis. Plans are underway for GALILEO (Georgia Library Learning Online) to host these and future finding aids encoded using the EAD DTD.
Finding aids encoded for the Selected Archives of Georgia Tech and Emory (SAGE) Project are delivered in HTML. Emory's SAGE site uses the Isite search engine to search the SGML version of the finding aids and then deliver the HTML version. Hyperlinks in the HTML files allow the user to link from a folder title to the digital surrogates of the items in the folder.
Encoding procedure:
In 1997, GAMMA rekeyed approximately thirty-five paper finding aids from 17 different repositories (including Emory) and encoded them in the beta version of EAD using Author/Editor.
The SAGE Project (1997-2000) will develop a model digital archive. As a part of the project, we will rekey one finding aid and will encode two others from scratch. Initially, we used Author/Editor, but we have decided to switch to XMetaL. We create an HTML file from the SGML file by hand. Hyperlinks are added to the HTML file that allow the user to link to contents of the file, if they have been digitized. (This process hasn't been automated yet either.) The finding aids currently available through the site are in EAD beta, but we hope to convert them to EAD version 1 during the coming year.
Special Collections also participated in RLG-Apex Finding Aids Conversion Service, but has not yet made those files available.
Main contact persons:
GAMMA Project & RLG-Apex Conversion Service: Susan Potts McDonald
libspm@emory.edu
SAGE Project: Naomi Nelson
libnn@emory.edu
RLG Archival Resources participant?:
Subscriber, but not yet a participant. We plan to add our records this year.
Delivery method:
Two methods: finding aids are listed by repository on the opening screen of the site, and can be
viewed there in SGML using Panorama or in HTML generated "on the fly".
Finding aids are also retrievable via search, and delivered in HTML "on the fly."
Encoding procedure:
As a project involving eight Harvard repositories, the procedure
used for EAD markup is slightly different at each, although all follow the
established Harvard guidelines (available at:
http://findingaids.harvard.edu). The finding aids now in EAD are
primarily ones first created using various word-processor packages,
although a few have been created "from scratch" using EAD templates
created by the individual repository. Some archives have used macros to
automate some markup.
Two SGML authoring packages are in use: Author/Editor (University Archives, Peabody, Baker) and WordPerfect 7 & 8 (Houghton, Andover-Harvard, Law, Botany). Schlesinger uses a Word template and validates the finding aids using WordPerfect.
Indexing is done using the LiveLink search engine from OpenText, and the indexes are made available for searching via the web using locally developed software (derived from a program developed at the University of Michigan).
Main contact person: Individual project members are listed at our web site.
RLG Archival Resources participant?:
Yes
Historic Pittsburgh Finding Aids project
Delivery method:
The Finding Aids collection is indexed using Open Text's pat50 SGML-aware search engine. The user may search the full text of the finding aids by using a query form on the Web or browse a list of all finding aids available. Queries are sent to a CGI script which retrieves the information from the database using pat50 and then processes the results on-the-fly to display them in HTML. There is no accommodation for displaying native SGML at this point.
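As a rough illustration of the "process the results on-the-fly" step, the following Python sketch turns a list of search hits into an HTML results page. It is not the project's CGI script: the hit format, the render_results function, and the /findaid/ URL path are all assumptions made for the example, and the call to the search engine itself is left out.

```python
from html import escape

def render_results(query, hits):
    """Build an HTML results page from search hits.

    hits is a list of (finding_aid_id, title) tuples; this shape and the
    /findaid/ URL path are hypothetical, chosen only for illustration.
    """
    items = "\n".join(
        '<li><a href="/findaid/{0}">{1}</a></li>'.format(
            escape(fid, quote=True), escape(title))
        for fid, title in hits
    )
    return (
        "<html><body><h1>Results for {0}</h1>\n<ul>\n{1}\n</ul></body></html>"
        .format(escape(query), items)
    )

if __name__ == "__main__":
    print(render_results("coal & steel", [("fa001", "Smith & Sons Records")]))
```

Escaping the query and titles matters here because finding aid titles routinely contain ampersands and angle brackets that would otherwise break the generated page.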
Encoding procedure:
Most of the finding aids from both repositories are in some electronic form. Long container lists are partially encoded using a perl script. The remainder of the finding aid is put into a standard template. Most finding aids are currently being prepared by students from the University of Pittsburgh School of Information Sciences. We use Emacs with psgml and Jade tools on our NT workstations.
We are currently in the final phase of testing a web-based data entry form for collection level encoding so that those unfamiliar with EAD can automatically encode new finding aids.
Main contact person:
Elizabeth Shaw
ejshaw@pitt.edu
RLG Archival Resources participant?:
No
International Institute of Social History
Last updated: February 2003
online at: http://www.iisg.nl/archives/findingaids.html
Delivery method:
Currently over 750 finding aids of archival collections at the IISH are available on the Web. Most of them have been encoded in EAD and are delivered in both SGML and HTML. The HTML version is created from SGML, not on the fly, but using a Perl script. In addition to appearing in an alphabetical list, the finding aids are linked to their MARC records and collection level descriptions. Full text searching through a search engine (WebGlimpse) is also possible. Presently the IISH is working on delivering the finding aids in XML.
Encoding procedure:
We started by converting some 500 WordPerfect 5.1 files. These files were first prepared for conversion by checking for consistency and adding the extra information and codes required. Conversion to EAD is done automatically with OmniMark. EAD files were (and still are) validated with Author/Editor. After the WordPerfect 5.1 files we also converted our paper finding aids to digital format using EAD. We didn't do OCR; instead all documents were rekeyed and encoded according to EAD. At present, we are trying to develop a suitable procedure to mark up the new finding aids.
Information on archives and collections can also be found in an online catalogue (GEAC-geoweb) and on separate webpages on the archives. At both places links to the finding aids will be made.
Main contact person:
Jack Hofman
Archivist
International Institute of Social History
jho@iisg.nl
RLG Archival Resources participant?:
Yes
Instituto dos Arquivos Nacionais/Torre do Tombo, Portugal
LAST UPDATED: NOVEMBER 2001
Delivery method:
We begin with the delivery of fonds level descriptions on the Web in HTML
files, transformed from the original XML files by a stylesheet adapted from
the EAD Cookbook's.
Encoding procedure:
We use the original MS Word files, a template in the XMLSpy (in the future
probably XMetaL 2.0) software and, for the time being, the manual process of
copy and paste into the template. We will soon try to automate the markup
by using EAD Stylus or some other suitable way of tagging. We will try to
develop some templates that make tagging easier and more error-free.
Main contact person:
Jose Mariz
Gabinete de Estudos e Planeamento Tecnico
Instituto dos Arquivos Nacionais/Torre do Tombo
mariz@iantt.pt
RLG Archival Resources participant?:
No response
Iowa Women's Archives, University of Iowa Libraries
Delivery method:
We are posting the finding aids on the web in native SGML. Our original
plan was to mark up each finding aid in both HTML and SGML. But while we
had a small HRDP grant over the summer, we concentrated on SGML,
so not all of our SGML finding aids have HTML counterparts. Ultimately we
would like to be able to convert to HTML on the fly.
Encoding procedure:
All of our finding aids are in electronic form in Microsoft Word and are
created using a template in WORD7. These finding aids are then marked up in
SGML using Author/Editor. (When we do HTML finding aids, we use Claris Home
Page, and follow the same procedure.)
We have created macros in A/E that automatically insert the correct tags for genre, corpname, persname, geoname, title, etc. We also have macros that insert the combination of tags for a new box list and a new series level, and signify if it is a series, folder, or file.
The same sort of thing was done in CLARIS, where we created libraries that insert " " multiple times to display the box list at a hierarchical level. So far we have not been marking dates, but have been marking all of the above mentioned genre terms as they appear in the Biography/History, Scope and Content, and Box list. The subject terms we are taking from our library catalog so they are LC subject headings.
Main contact person:
Robert J. Jett
robert-jett@uiowa.edu
RLG Archival Resources participant?:
We are contributing our finding aids, but are not currently subscribing.
Johns Hopkins Medical Institutions, Chesney Medical Archives
Delivery method:
The finding aids are available in both HTML and SGML. We maintain 2 versions of each finding aid (which hasn't been a problem so far).
Encoding procedure:
Rather than converting existing finding aids, our focus has been to get some standard information about all of our collections onto our Web site, and EAD seemed the most effective way to accomplish this. Thus we are undertaking an "EAD-lite" project, and using only a small subset of the codes. Each EAD entry presents brief collection information, a biographical note, a scope and content note, some administrative information, and (in most cases) a portrait of the individual.
In 1997, Scott Leonard, a graduate student from the University of Maryland College of Library and Information Services, as part of an internship, developed the EAD site. He did the technical groundwork, did the initial coding, and worked out most of the "bugs" in the system (of which there were many). He also encoded the first batch of finding aids and wrote detailed instructions for encoders who would follow him. Since then, students, part-time workers, and one professional have done the encoding. The project coordinator checks the encoding on each entry; several members of the Archives staff edit the entries for content and style. SGML markup was done using SoftQuad's Author/Editor program, and the SGML finding guides display in Panorama. (We offer the HTML version as the "default" option, as we don't expect most of our users to have Panorama.)
The next phase will be to add folder listings. For these, we will be working with existing finding aids in a variety of formats: printed guides, typed inventories, and WordPerfect documents. We contracted with ArchProteus conversion service to convert a very large inventory which heretofore only existed as a published volume. They produced both an HTML and an SGML/EAD version, both of which will soon be available on our Web site. We plan to contract with ArchProteus to convert other inventories.
Main contact person:
Lisa A. Mix
lmix@jhmi.edu
telephone: 410-955-3043
RLG Archival Resources participant?:
Pending
Library of Congress Finding Aids Project
Last updated: January 2005
Delivery method:
Delivered in HTML (framed and unframed versions) and in PDF derived from the EAD instance. SGML finding aids in process of being converted to XML using toolkit created by Mike Ferrando and available at http://lcweb2.loc.gov/music/eadmusic/eadconv12/ead2002_r.htm. Preliminary HTML versions and PDF derived from XSL stylesheet in LC conversion toolkit. An overview of the XSLT transformation is available at http://lcweb2.loc.gov/music/eadmusic/eadconv12/ead_xsl_overview.htm.
Encoding procedure:
Finding aids in various divisions within the Library created originally as word processed documents (primarily WordPerfect) or Access databases. Word processed files usually converted using templates and styles, although some are mapped to generic XML and thence to EAD DTD using XSLT. Access databases converted using XSLT.
Indexing procedure: finding aid database still indexed using InQuery, used for other full-text applications at LC, pending development and acceptance of XQuery language by the W3C.
Main contact person:
Mary Lacy
mlac@loc.gov
lcead@loc.gov
RLG Archival Resources participant?:
Yes
Louisiana State University Libraries
Delivery method:
Native SGML
Encoding procedure:
LSU Special Collections begins with WordPerfect 7.0 files (inventories),
outputs the file to ASCII format, and manually tags the file with EAD
tags. We initially used SoftQuad's Author/Editor, but found it to be very
technical and hard to explain or troubleshoot. We are waiting for the
next generation of improved SGML editors before we invest more money in
authoring software.
In the initial implementation, we have provided only the preliminary
information, such as scope and content notes, biographical/historical
notes, etc. In the second phase of our implementation, we will add the
container list and location information for each inventory. Currently,
approximately fifty percent of our one hundred electronic inventories have
been tagged as SGML files.
Main contact person:
Pati Threatt, Head, Special Collections Processing Dept.
pthreat@lsu.edu
RLG Archival Resources participant?:
No, not at the present time.
Delivery method:
MOAC museum partners prepare EAD encoded collection guides using EAD capable tools developed by MOAC. Partners then contribute finding aids to the central database system the California Digital Library chooses to implement for the larger OAC union database. These are served from the CDL's OAC server using the general OAC interface. However, the MOAC content is also served out on the MOAC website in a portal fashion, where searches into the central OAC database are limited to MOAC collection guides, and the results are transformed into the MOAC template using XSLT on the fly. In all cases we are using a centralized-decentralized model, where the text/SGML finding aids are stored centrally for search and display, but the images are stored locally on the home institution's webserver and linked for display from within the central finding aid via the href attribute.
Encoding procedure:
Since museum-type records are often detailed at the item level and are often exported in a structured form from museum collection management systems for inclusion in the EAD finding aid, automation is feasible and desirable. We mark up the EAD header and scopecontent/bioghist info manually using A/E. But for the container lists, we export records from the collection management system and either use the database to mark up the records as they are exported, OR export the records as tab-delimited text and write Word Macros to automatically mark up the records. Container lists with complex hierarchical relationships then take some extra hand manipulation, but most of the markup is still done automatically.
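The tab-delimited route described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Word macros used by MOAC: the column order (box, folder, title, date), the sample data, and the rows_to_dsc function are hypothetical, and every record is treated as a flat item-level component.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical export: one item per line as box<TAB>folder<TAB>title<TAB>date.
SAMPLE_EXPORT = "1\t1\tCorrespondence\t1901-1905\n1\t2\tPhotographs\t1910\n"

def rows_to_dsc(tsv_text):
    """Wrap each tab-delimited record in a <c01> component inside a <dsc>."""
    dsc = ET.Element("dsc", {"type": "in-depth"})
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    for box, folder, title, date in reader:
        component = ET.SubElement(dsc, "c01", {"level": "item"})
        did = ET.SubElement(component, "did")
        ET.SubElement(did, "container", {"type": "box"}).text = box
        ET.SubElement(did, "container", {"type": "folder"}).text = folder
        ET.SubElement(did, "unittitle").text = title
        ET.SubElement(did, "unitdate").text = date
    return dsc

if __name__ == "__main__":
    print(ET.tostring(rows_to_dsc(SAMPLE_EXPORT), encoding="unicode"))
```

A flat list like this covers the simple case; container lists with nested series would still need the extra hand manipulation mentioned above.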
Main contact person:
Richard Rinehart, Director of Digital Media
rinehart@uclink.berkeley.edu
RLG Archival Resources participant?:
Not yet
Mount Vernon Library and Curatorial Collections
Delivery method:
Direct link to HTML files
Encoding procedure:
The first step in preparing the instances is to prepare the information for the collection-level finding aid. This is done by making a skeletal version of the finding aid with the headings, "agency history," "index terms," and a few others, and printing it out. Sometimes, notes are made directly on the print-out, filling in the blanks. Sometimes notes are made long-hand on cards and note paper. These are then entered into an XMetal instance with the help of notes and handouts from the SAA workshop, the Application Guidelines and Tag Library, and marked-up examples such as the William Fonds Provenance.
The only thing used to proof and parse the instances is XMetal, on a PC.
The indexing procedure is to make notes about possible terms, look them up in the Library of Congress's online catalog, using LC's heading when possible, and when not, constructing the heading based on AACR2.
Main contact person:
Lisa Odum, Associate Librarian
lodum@MountVernon.Org
RLG Archival Resources participant?:
Not yet
National Library of Medicine, History of Medicine Division
Delivery method:
We will serve HTML documents from a list of manuscript collections on our web site (from a MS Access dbase), and links from MARC records. We are also exploring various search engine applications.
Encoding procedure:
I use Notetab with Daniel Pitti's markup scripts, modified for local use. We use James Clark's parsers and XT for HTML conversion (provided by Pitti at Rare Books School). I use a modified XSL stylesheet taken from the EAD Cookbook (eadcbs2) for creating the HTML. Conversion of legacy data is being performed by Electronic Scriptorium (Leesburg, Va.) using our Notetab system. We haven't explored converting the print finding aids, yet.
New finding aids will probably be created using a hybrid Notetab/XMetal process, and Alvin Pollock's MS Access report process for doing container lists.
Main contact person:
John Rees
Assistant Curator, Modern Manuscripts
History of Medicine Division
National Library of Medicine
ReesJ@mail.nlm.nih.gov
RLG Archival Resources participant?:
Yes.
Delivery method:
XML versions of finding aids are converted to HTML that may be accessed in three ways:
The transformation from EAD in XML syntax into HTML is done through the use of an XSL stylesheet and James Clark's XT program, a free XSL processor application. This operation is done in batch mode. The original source XML and HTML versions then are mounted in a single directory on our Web site. The same process, employing a second stylesheet, is used to generate a print copy of the finding aid for our reading room.
Finding aids are also accessible via RLG's Archival Resources service which is available to researchers in our reading room.
Encoding procedure:
The Minnesota Historical Society is currently revising its authoring process. For newly processed collections, separate techniques are used to create the collection level portion of the finding aid and the description of the components section.
The collection-level data is harvested from the MARC record which is typically prepared before the rest of the finding aid. The MARC record is downloaded from the OPAC. Logos Research's marcxml.exe program then converts that file from the MARC transmission format into the MARC DTD format. The resulting XML file is then converted into an EAD instance using an XSL stylesheet and James Clark's xt.exe program. A single batch process executes all these steps. The resulting XML file is then loaded into the XMetaL application wherein the cataloger simply completes the <eadheader> data and/or augments the description with expanded scope and content, biographical, or organization information depending on the complexity of the materials.
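To make the idea of harvesting collection-level data from a MARC record concrete, here is a minimal Python sketch. It stands in for the XSL stylesheet and is not the Society's actual process: the record is represented as a plain dictionary of hypothetical sample values, and the tag-to-element crosswalk (100, 245, 300 into the did; 520 and 545 as notes) is an assumed, simplified mapping.

```python
import xml.etree.ElementTree as ET

# Assumed, simplified crosswalk from MARC tags to EAD elements.
MARC_TO_DID = {"100": "origination", "245": "unittitle", "300": "physdesc"}
MARC_TO_NOTE = {"520": "scopecontent", "545": "bioghist"}

# Hypothetical record: MARC tag -> field text (subfields already flattened).
SAMPLE_RECORD = {
    "100": "Doe, Jane, 1870-1950",
    "245": "Jane Doe papers",
    "300": "2 boxes",
    "520": "Letters and diaries.",
}

def marc_to_archdesc(fields):
    """Build a skeletal collection-level <archdesc> from MARC field text."""
    archdesc = ET.Element("archdesc", {"level": "collection"})
    did = ET.SubElement(archdesc, "did")
    for tag, element in MARC_TO_DID.items():
        if tag in fields:
            ET.SubElement(did, element).text = fields[tag]
    for tag, element in MARC_TO_NOTE.items():
        if tag in fields:
            note = ET.SubElement(archdesc, element)
            ET.SubElement(note, "p").text = fields[tag]
    return archdesc

if __name__ == "__main__":
    print(ET.tostring(marc_to_archdesc(SAMPLE_RECORD), encoding="unicode"))
```

The result is only a starting skeleton; as the text notes, the cataloger would still complete the header and expand the notes by hand.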
The description of the components section is created in either of two ways. For small collections with short contents listings, the cataloger adds the <dsc> portion directly to the file in XMetaL using a series of keyboard macros. For longer contents listings, we find it more flexible to create the text in Microsoft Word, convert that document into XML using the Microsoft SGML Author for Word software described below, and append it to the XML file.
Using SGML Author for Word to encode the text involves the following tools:
Using SGML Author for Word to encode the text involves the following steps:
Retrospective conversion is currently being performed through APEX Data Services, which provides SGML versions of existing files that they rekey into EAD. These documents are parsed and converted into XML using the SX program and edited as necessary in XMetaL.
Main contact person:
Michael Fox
Head of Processing
michael.fox@mnhs.org
RLG Archival Resources participant?:
Yes
Massachusetts Historical Society
Last updated: January 2005
Delivery method:
XML is converted to HTML on-the-fly using ColdFusion and a locally-developed
XSLT stylesheet. EAD encoded finding aids are accessible from MARC records
in the local OPAC and OCLC; via a search interface which uses ColdFusion
and the Verity search engine; and by a browse list that is generated
dynamically using ColdFusion.
Encoding procedure:
Finding aids begin as either MS Word files or paper copies. In cases where
OCR is necessary, we use OmniPage. Finding aids are marked up in EAD2002
using XMetaL 4 (Author) and locally developed templates. Macros are used to
ensure regular mark up and provide consistent boilerplate text. A local XSL
stylesheet is used to test each EAD instance for conformity with the MHS
template and best practice. The Verity index must be repopulated after new
files are added or changes are made to previously posted files.
RLG Archival Resources participant?:
No
Delivery method:
Our EAD documents are delivered as HTML documents which have been
converted from XML using an XSL stylesheet and James Clark's XT. This
conversion is not done on the fly, but is done as part of the production
process.
Encoding procedure:
At Old Dominion University finding aids are found in a variety of
formats: print, word-processor files, and HTML files. The finding aids
in electronic format were largely derived from the print finding aids
via scanning and OCR software--and finding aids for new collections are
being created as Word files. The finding aids that we have encoded to
date have all been available previously in electronic format. We began
creating EAD 1.0/SGML files using MS Word and a template adapted from
Duke University (http://scriptorium.lib.duke.edu/findaid/ead/). Later,
we replaced this process with a Web-based form to allow for consistent
creation of the EAD files. We also have begun using Emacs/PSGML to
encode those finding aids which differ from our template, and to edit
already created SGML documents--and are exploring the use of SoftQuad's
XMetal.
After the SGML document is created, it is validated with James Clark's NSGMLS, and converted to XML using James Clark's SX (both programs are part of the SP package available from http://www.jclark.com/sp/). The XML document is then indexed using Perl's XML::Parser module (http://wwwx.netheaven.com/~coopercc/xmlparser/intro.html) into a freely available relational database called MySQL (http://www.mysql.com). Once created the indexes can be searched from a web-based interface which uses Perl to manipulate the MySQL database (http://libstaff.lib.odu.edu/sgml/bin/ead/ead-search.cgi). These processes (creation of SGML, SGML->XML, XML->MySQL, XML->Interpreted MARC record) are conducted via a web interface (http://libstaff.lib.odu.edu/sgml/bin/ead/ead-toc.cgi)
Further information can be found at: http://libstaff.lib.odu.edu/projects/ead/
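The indexing step described above (parse the XML, then load element content into a relational database for searching) can be sketched as follows. This is an illustration only: Python's sqlite3 and ElementTree stand in for MySQL and Perl's XML::Parser, and the table layout, sample document, and function names are assumptions rather than the ODU implementation.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily abbreviated EAD instance used only for the demo.
SAMPLE_EAD = """<ead><archdesc level="collection"><did>
<unittitle>Jane Doe Papers</unittitle></did>
<dsc><c01><did><unittitle>Letters to the editor</unittitle></did></c01></dsc>
</archdesc></ead>"""

def index_instance(conn, filename, xml_text):
    """Store one row per element that has leading text content."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ead_index "
        "(filename TEXT, element TEXT, content TEXT)"
    )
    for node in ET.fromstring(xml_text).iter():
        # Only text before the first child is indexed; tails are ignored.
        text = (node.text or "").strip()
        if text:
            conn.execute(
                "INSERT INTO ead_index VALUES (?, ?, ?)",
                (filename, node.tag, text),
            )
    conn.commit()

def search(conn, element, term):
    """Return filenames whose given element contains the term."""
    rows = conn.execute(
        "SELECT DISTINCT filename FROM ead_index "
        "WHERE element = ? AND content LIKE ?",
        (element, "%" + term + "%"),
    )
    return [row[0] for row in rows]

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    index_instance(conn, "doe.xml", SAMPLE_EAD)
    print(search(conn, "unittitle", "letters"))
```

Keeping the element name alongside the content is what allows searches to be restricted to specific parts of a finding aid, such as titles only.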
Main contact person:
Ed Summers
Electronic Resources Cataloger.
Bibliographic Services
Old Dominion University
Norfolk, Virginia 23529-0256
(757) 683-4340
esummers@odu.edu
RLG Archival Resources participant?:
No
Delivery method:
We use an XSLT style sheet to convert the XML to HTML for display purposes.
We are also using ASP to allow for viewing one part (scope/content note, series
description, subseries description, etc.) of the finding aid at a time. We
are providing this option in order to make the files, which are quite large,
more manageable for those without a T1 or other fast Internet connection.
Currently, finding aids are not indexed; they can be browsed from an
alphabetical list available on the "Collections" webpage. We also plan to
provide hyperlinks from collection-level MARC records in our Library's OPAC.
Encoding procedure:
Our process is completely automated. Collections are described fully in an
MS Access database. Then, we use VB scripts to create valid XML files
compliant with the EAD DTD version 2002.
Main contact person:
Katherine Stefko
Mellon Archives Project Manager
Philadelphia Museum of Art
Box 7646
Philadelphia, PA 19101-7646
215-684-7642
kstefko@philamuseum.org
RLG Archival Resources participant?:
No response
Public Record Office (United Kingdom)
The Public Record Office, UK, is using EAD in a number of related projects, as part of our Archives Direct 2001 (or AD2001) programme, which is designed to help us achieve our goal of having the Public Record Office Catalogue (or PROCAT) online by 2001.
Delivery method and Encoding procedures:
1. Core Executive Project
Delivery
We are delivering our EAD files in two ways in order to test user preference. We are serving native SGML files for use with Panorama Viewer and we are using DynaText/DynaWeb as an on-the-fly HTML conversion, which has been by far the most favoured option. The only indexing available with either site is that provided for 'out of the box' by the particular software.
Preparation
We have amalgamated data from two sources. Firstly, data from the PRO Guide (an Oracle 6.0 database viewed by readers as hardcopy) was cut-and-pasted across into the appropriate EAD element in an Author/Editor template to create the 9 departmental files. Secondly, data from our set of hardcopy container lists was marked up and sent to an outside contractor to convert to a restricted EAD template. The returned SGML files were then pasted into the Author/Editor template.
Having got completed SGML files in this way we used some automated procedures written in Perl to ensure file integrity and editorial consistency. Author/Editor's parser was used during composition, but it cannot cope with files larger than approx. 1.1Mb. James Clark's NSGMLS was subsequently used to validate all files. A/E's file size problem indicated to us that such files might also produce prohibitively long download times, so we split them into smaller, hyperlinked instances.
Access terms have been input in the <controlaccess> elements as a basis for future authority controlled indexing but we have not yet developed the functionality to make use of them.
Software
Interleaf's Author/Editor has worked well as an editing environment, the only real problem being that of parsing large files noted above. After somewhat of a steep learning curve we have also found Panorama's style sheets and navigators to be flexible in presenting the material as we have wished.
2. TOPCAT
We then used DBI, a free Perl-based Database API written by Tim Bunce (http://www.hermetica.com/cgi/dbi/moduledump?module=DBI) to read and fetch the data out of the Oracle tables and encode it into a new SGML database of nearly 300 EAD instances (one for each government department or agency). The generation program included Perl sub-routines to identify and automatically insert internal and external cross-references.
At present TOPCAT is available on the PRO's internal network, but we are looking into the possibility of WWW access in 1999.
3. PROCAT
Main contact person:
ead@pro.gov.uk
RLG Archival Resources participant?:
We are participating in RLG's Archival Resources project with data from
our Core Executive project.
Rutgers University / Center for Electronic Texts in the Humanities (CETH)
Encoding procedure & Delivery method:
Our finding aids are marked up in EAD 1.0 (currently still
using Author/Editor 3.5, but we have recently purchased XMetaL) and then
loaded into a DynaWeb server.
We have also provided links using the 856 field in our AMC records
for our local SIRSI system and have been reporting completed finding
aids to RLG for inclusion in Archival Resources.
Main contact person:
Thomas J. Frusciano, University Archivist
Special Collections and University Archives
Rutgers University Libraries
fruscian@rci.rutgers.edu
RLG Archival Resources participant?:
Yes.
Santa Clara University Archives
[Also incorporated into the Online Archive of California]
Delivery method:
Using Internet Archivist
Encoding procedure:
What format you began with:
Starting with legacy printed finding aids, and also preparing new finding aids.
How you converted to electronic format (if you began with printed documents):
Re-entering data using Internet Archivist.
How you mark up your instances:
Internet Archivist automatically generates HTML documents (and SGML and XML)
If you've automated your markup in any way:
Yes, using Internet Archivist. I don't have to know how to do mark-up. But I have not
automated markup of legacy finding aids that have not been re-entered into the software application.
Extra steps taken to proof and parse instances:
None yet
Indexing procedure:
None yet
Main contact person:
Anne McMahon, University Archivist
amcmahon@scu.edu
RLG Archival Resources participant?:
No.
Stanford University Library, Department of Special Collections
[Also incorporated into the Online Archive of California]
Delivery method:
Native SGML
Encoding procedure:
Guides without an electronic copy are typically rekeyed or, if the
original is clean enough, converted through scanning and OCR.
However, scanning typically proves to be unfeasible.
Inventories are generally created in MS Word or in FileMaker Pro databases. Information stored in database form is exported as tab-separated text and then marked up using MS Word macros written in house.
Parsing is done using toolkit created by Alvin Pollock of the University of California at Berkeley, which includes various parsers that are EAD version 1 savvy. Errors are then fixed using DeskEdit.
Guides are listed by main topics from the Dept. of Special Collections home page and are indexed via the RLG Archival Resources program and through U.C. Berkeley's Sunsite2 server as part of the American Heritage Virtual Digital Archives Project.
Main contact person:
Steven Mandeville-Gamble
RLG Archival Resources participant?:
Yes
South Texas Archives, Texas A&M University
Delivery method:
The finding aids are delivered on the web in SGML or HTML as generated
by the software, Internet Archivist - EAD by Interface Electronics Inc.
Encoding procedure:
What format you began with:
With previously created finding aids we start at one of two places.
Recently created finding aids, one to five years old, have the finding
aid stored on computer in a .txt format that can be accessed by the
"editor" component of the Internet Archivist - EAD software and can be
"cut and pasted" into the appropriate input fields of the software.
Older finding aids exist as paper finding aids only and everything must
be typed into the correct software input fields.
How you converted to electronic format (if you began with printed documents):
Older finding aids exist as paper finding aids only and everything must
be typed into the correct software input fields.
How you mark up your instances:
The instances are marked up as the correct information is input into the
correct fields of the Internet Archivist - EAD software.
If you've automated your markup in any way:
Yes, the software allows and we use templates for most of the
information that is generic to each of our finding aids.
Extra steps taken to proof and parse instances:
Currently only manual observation and checking of the generated results
in SGML and HTML exists.
Indexing procedure:
No indexing necessary, there is a rudimentary search engine. There is
the option of adding search terms to the EAD document that will become
terms searched by the search engine.
Main contact person:
Cecilia Hunter
South Texas Archives
James C. Jernigan Library
Texas A&M University - Kingsville
Kingsville, TX 78363
(361) 593-2776
kacah00@tamuk.edu
RLG Archival Resources participant?:
No
Delivery method:
Our marked up documents are accessed through a search and retrieval
system created in-house and based on I-site freeware. This system, which we
call SUUper Search, indexes and parses the marked up documents and also
provides for web access. Searches can be conducted through the web and
the hit list is translated to html on the fly for viewing. Many of the
items in our archive are instantly viewable using hot links embedded in
the item description.
Encoding procedure:
We entered the mark up world with little or no need for retrospective
conversion so our documents are "born digital." We mark up our
documents using the WordPerfect SGML editor and have been very pleased
with it. We have automated many of the more complex nested tag
arrangements required at the item level by creating WordPerfect macros.
This has worked so well that we have student employees that can mark up
large finding aids after only a short supervised training period.
Main contact persons:
Matt Nickerson
nickerson@suu.edu
or
Janet Seegmiller
seegmiller@suu.edu
Southern Utah University
351 W. Center St.
Cedar City, UT 84720
435-586-7947
RLG Archival Resources participant?:
Yes
United Methodist Church Archives - GCAH
Last updated: January 2005
Delivery method & Encoding procedure:
We currently have around 200 finding aids on our site. Most of the
information is entered into databases while the archival material is being
processed. When we are ready to create a new finding aid we export the
selected material out of our databases and assemble the final product using
NoteTab. NoteTab is used to make final adjustments to the document, to
parse it and to transform it.
Using XSLT we create a PDF file from which we print a copy of the finding aid for our Reading Room and create an HTML copy for the web site. Both the XML and HTML versions go on the web site.
The XML instances are indexed using Cheshire II. The patron searches the Cheshire II indices. The results of the search are run through a script which links the results to the appropriate html copy. We found this to be faster than trying to have the XML document convert on the fly.
Main contact person:
L. Dale Patterson
Archivist
dpatterson@gcah.org
RLG Archival Resources participant?:
No
University of California, Berkeley
[Also incorporated into the Online Archive of California]
LAST UPDATED: NOVEMBER 2001
Delivery method:
We are currently serving out our finding aids via The California Digital
Library's Online Archive of California. The OAC uses the DynaWeb server
software from the Enigma Corporation.
Encoding procedure:
Since 1995 the UC Berkeley Library has employed a wide variety of techniques to encode our legacy finding aids in SGML, reflecting the wide variety of formats these documents were in. Because our retrospective conversion started with The Bancroft Library's electronic finding aids--authored originally in WordPerfect--we began by employing WordPerfect macros of varying sophistication. The lead programmer provided intensive training in the WordPerfect macro language in the form of a series of seminars. The original WordPerfect macro manual used within the unit (which is now somewhat out of date) can be found here.
Since the beginning of the project we have used the technique of stepwise refinement to encode legacy finding aids, a practice we have continued to this day. Stepwise refinement involves beginning the encoding process by adding "coarse" markup, essentially fitting the legacy information into a broad hierarchical structure consisting of little more than component information. Then a variety of techniques are employed to add markup of increasingly fine granularity, e.g., next adding the unittitle information, then encoding unitdates, etc. Most of these subsequent passes were also performed using WordPerfect macros, but as the project progressed the perl programming language was employed.
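As a rough illustration of the coarse first pass described above, the following Python sketch wraps each line of a plain-text container list in a bare component element and leaves finer tags such as unittitle and unitdate for later passes. The input layout (one component per line, four spaces of indentation per level) and the function name are assumptions made for the example, not a description of the Berkeley macros or scripts, and the intermediate output is deliberately not yet valid EAD.

```python
from xml.sax.saxutils import escape

INDENT = 4  # assumed: four spaces of indentation per hierarchy level


def coarse_markup(text):
    """First pass: turn indented lines into nested <c0x> components."""
    out = []
    open_levels = []  # stack of component levels that are still open
    for raw in text.splitlines():
        if not raw.strip():
            continue
        level = (len(raw) - len(raw.lstrip(" "))) // INDENT + 1
        # Close any open components at the same or a deeper level.
        while open_levels and open_levels[-1] >= level:
            out.append("</c%02d>" % open_levels.pop())
        out.append("<c%02d><did>%s</did>" % (level, escape(raw.strip())))
        open_levels.append(level)
    while open_levels:
        out.append("</c%02d>" % open_levels.pop())
    return "\n".join(out)


sample = ("Series 1: Correspondence\n"
          "    Letters, 1901-1905\n"
          "    Letters, 1906-1910\n"
          "Series 2: Diaries")
print(coarse_markup(sample))
```

Later passes would then work only on the text inside each did, which keeps every individual step small and easy to check.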
Today, every member of the Digital Publishing Group has completed five-week classes in perl programming through the University Extension program, and perl has become part of our markup lives. We have created a small toolkit of simple perl programs, which is available at: http://sunsite2..edu/oac/toolkit. The kit is composed of several small scripts useful for stepwise refinement, including scripts to recognize and encode unitdates, persnames, and corpnames within unittitles. The toolkit also includes a preconfigured parser (nsgmls) used to validate each and every finding aid before it is submitted for publication on the OAC.
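A refinement pass of the kind the toolkit is said to perform, recognizing dates inside unittitles, might look like the following Python sketch. It is not the toolkit's perl code: the regular expression covers only four-digit years and simple year ranges, and the function name is hypothetical.

```python
import re

# Assumed pattern: a four-digit year (1500-2099) or a range such as 1901-1905.
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)(-(1[5-9]\d\d|20\d\d))?\b")
UNITTITLE = re.compile(r"(<unittitle>)(.*?)(</unittitle>)", re.S)


def tag_unitdates(ead_text):
    """Wrap bare years and year ranges inside unittitles in <unitdate>."""
    def refine(match):
        start, body, end = match.groups()
        if "<unitdate>" in body:  # already refined; leave untouched
            return match.group(0)
        body = YEAR.sub(lambda m: "<unitdate>%s</unitdate>" % m.group(0), body)
        return start + body + end
    return UNITTITLE.sub(refine, ead_text)


print(tag_unitdates("<unittitle>Letters, 1901-1905</unittitle>"))
```

Because the pass skips unittitles that already contain a unitdate, it can be rerun safely on a file that has been partly refined.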
Before long we found that we could more efficiently encode a finding aid's "front matter"--that is, all of the information not occurring within the dsc--through a standard web template. This proved faster than trying to create macros or specialized programs to accommodate the wide variety of layouts in the finding aids produced by the eight contributing repositories at UC Berkeley. The templates can be seen in action at: http://sunsite..EDU/FindingAids/uc-ead/templates and the cgi script we use is available for anybody else to use as part of the toolkit.
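The template approach can be pictured as a fixed skeleton with named slots that a web form fills in. The Python sketch below is an illustration under stated assumptions, not the Berkeley cgi script: the field names and the very small subset of front-matter elements are chosen only for the example.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical skeleton covering only a few front-matter elements.
# <archdesc> is left open because the <dsc> would be appended afterwards.
FRONT_MATTER = Template("""<archdesc level="collection">
<did>
<unittitle>$title</unittitle>
<unitid>$unitid</unitid>
<repository>$repository</repository>
</did>
<scopecontent><p>$scope</p></scopecontent>
""")


def build_front_matter(fields):
    """Escape each submitted form value and drop it into the skeleton."""
    safe = {name: escape(value.strip()) for name, value in fields.items()}
    return FRONT_MATTER.substitute(safe)


form = {
    "title": "Smith & Jones Family Papers",
    "unitid": "MSS 001",
    "repository": "Example Library",
    "scope": "Correspondence and diaries.",
}
print(build_front_matter(form))
```

A missing field raises a KeyError rather than silently producing an empty element, which is one way to catch incomplete submissions early.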
Curiously, we have found that using commercial SGML editors such as AdeptEdit, Author/Editor, or XMetaL was not an efficient way to convert legacy information into EAD. Although each member of the Digital Publishing Group has a copy of XMetaL installed, we find it useful solely as a reference tool, particularly while bringing new encoders up to speed in EAD. It is far faster to programmatically convert text to EAD in broad strokes than to apply the copy and paste method required when using these editors. XMetaL may have a role in the authoring of new finding aids, but much customization--mainly in the form of targeted dialog boxes and refinement macros--needs to be done before finding aid authors can consider it a viable replacement for their trusted word processing program.
After we completed conversion of all of our word processing files for legacy information held by Berkeley and by many of the affiliates of the Online Archive of California, a process funded by a variety of grants, we turned our attention to all of the legacy finding aids available only on paper. These we contracted out to a conversion vendor, Apex Data Services, which keyed the data and generated EAD. This EAD was then further refined in house when the data was returned. Our experience with employing an outside vendor for the process was fairly good, far better than our earlier experience using scanning and OCR in-house. Most finding aids required very little editing and correction, but a few of the more complex ones required a great deal of time to bring up to local standards.
We are investigating a variety of options for incorporating EAD directly into the authoring process, including a complete suite of MS Word templates and macros, dubbed "EAD Stylus" and available as part of the toolkit. Another option is to more fully integrate EAD into the Generic Digital Projects Database, developed initially for UC Berkeley's role in the Making of America II project. The Generic Database was designed to accommodate the workflow and data entry for Berkeley's variety of digitization projects, including images, electronic text, sound files, moving pictures, etc. As it was intended to accommodate hierarchical description and produce arbitrarily generic output, it was easily adapted to EAD.
Relational databases have taken on a larger role at Berkeley in recent years. We can now easily import EAD-encoded finding aids into any arbitrary relational database--for enriching the data, adding item-level information for digitized surrogates, collection management, etc.--and export them back out to EAD or serve them out on the web. A tutorial and several sample programs written in perl are available at: http://sunsite..edu/ead/eaddb.
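To make the import into a relational database concrete, here is a minimal Python sketch that loads the components of a small EAD document into an SQLite table. It is not taken from the Berkeley tutorial, which is written in perl; the table layout, the column names, and the assumption of well-formed XML input with numbered components (c01, c02, and so on) are all choices made for this example.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

SAMPLE = """<ead><archdesc level="collection"><dsc type="combined">
<c01 level="series"><did><unittitle>Correspondence</unittitle></did>
<c02 level="file"><did><unittitle>Letters</unittitle><unitdate>1901-1905</unitdate></did></c02>
</c01></dsc></archdesc></ead>"""

COMPONENT = re.compile(r"^c(0[1-9]|1[0-2])$")  # numbered components c01..c12


def load_components(xml_text, conn):
    """Store one row per component, keeping a pointer to its parent row."""
    conn.execute("CREATE TABLE component (id INTEGER PRIMARY KEY, parent_id INTEGER,"
                 " tag TEXT, level TEXT, unittitle TEXT, unitdate TEXT)")

    def walk(element, parent_id):
        for child in element:
            if COMPONENT.match(child.tag):
                cur = conn.execute(
                    "INSERT INTO component (parent_id, tag, level, unittitle, unitdate)"
                    " VALUES (?, ?, ?, ?, ?)",
                    (parent_id, child.tag, child.get("level"),
                     child.findtext("did/unittitle"), child.findtext("did/unitdate")))
                walk(child, cur.lastrowid)
            else:
                walk(child, parent_id)

    walk(ET.fromstring(xml_text), None)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_components(SAMPLE, conn)
for row in conn.execute("SELECT id, parent_id, tag, unittitle, unitdate FROM component"):
    print(row)
```

Keeping the parent pointer in each row preserves the hierarchy, so the components could later be written back out as nested EAD after item-level data has been added.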
Now that conversion of our legacy finding aids is complete we are involved more and more in digitizing surrogates of the archival materials themselves: selected photographs, books, diaries, letters, both represented by images or sequences of images, and as searchable electronic text encoded in TEI. We are committed to using the emerging METS standard for encapsulating single and multipart digital objects in XML "wrappers." More information on these efforts is available on our Making of America II website.
Since the earliest days of the project, Berkeley has realized the importance of developing and adhering to consortial standards. The EAD encoding standard allows a surprisingly divergent, and often distressing, variety of encoding methodologies. In 1996 four institutions, UC Berkeley, Stanford University, Duke, and the University of Virginia, met to develop a uniform encoding standard for EAD finding aids. This standard, the American Heritage Retrospective Conversion Guidelines, was adopted and later built upon and refined by the UC EAD consortial project, which later grew into the Online Archive of California. Recently, the Online Archive of California has developed a standard for the encoding of new finding aids, the Best Practices Guidelines for the Encoding of New Finding Aids, which builds upon the guidelines laid out in the Retrospective Conversion Guidelines. Although intended for new finding aids, the BPG provides guidelines which are beneficial to all finding aids. Although we foresee difficulties applying the full BPG to our "legacy" EAD documents, we are involved in a process of upgrading them to a subset of BPG programmatically. This involves, most importantly, stripping out the old style <drow>/<dentry> tabular markup employed in the early days of EAD at Berkeley, and combining the separate Series Description and Container List into a single <dsc> of type "combined".
Finally, UC Berkeley has no plans at the present time to begin encoding finding aids in XML. First, all of our current tools handle both XML and SGML, so there is no reason for us to switch. Second, the XML standard lacks the robust entity management mechanism present in the SGML standard. We have found this entity management to be crucial, especially when interchanging finding aids with other institutions and consortia (hard-coding a specific path or URL in every entity declaration is onerous). If new tools become available for either authoring or publishing which require XML and which we would find valuable, or if stronger entity management is included in a future version of the XML standard, we would like to switch over.
All of our raw SGML files may be accessed in the SGML section of the Online Archive of California: http://www.oac.cdlib.org/sgml
Main contact persons:
Lynne Grigsby-Standfill, Head
Digital Publishing Group
UC Berkeley Library
lgrigsby@library..edu
Alvin Pollock, Lead Programmer
Digital Publishing Group
apollock@library..edu
RLG Archival Resources participant?:
Yes.
University of California, San Diego, Mandeville Special Collections Library
[Also incorporated into the Online Archive of California]
Delivery method:
The finding aids database mounted by the Mandeville Special Collections
Library at the University of California, San Diego serves finding aids
in three modes: EAD encoded, HTML encoded, and ASCII. These are
parallel versions stored on a single server. MSCL does not create any
version of its finding aids on the fly.
All our finding aids were in digital form (WordPerfect 5.1 files) in the summer of 1993 and were mounted on the internet as ASCII files in the MSCL gopher. When we elected to construct a relational database application, the finding aids had to be converted into database records. Generally, this required cutting and pasting of large preliminary sections such as abstracts and biographical notes and rekeying of inventory information.
Encoding procedure:
In the Mandeville Special Collections Library, description of
manuscripts and archival records is first entered into an in-house
database application created with Microsoft's FoxPro 2.6 for Windows.
Once the quality and integrity of the description is assured, the
description is output, using pertinent database report forms, in the
form of a printed finding aid or a digital finding aid in EAD, HTML, and
ASCII modes. The EAD output is validated using a parser designed by
Alvin Pollock of the Text Encoding Unit at UC Berkeley. It is then
mounted onto a UCSD Libraries server as well as a California Digital
Libraries server. Once mounted on the UCSD Libraries server, the
database of EAD finding aids is indexed using the Verity Query
Software. The parameters for this indexing are established by systems
staff in accordance with the wishes of MSCL staff, but the process
itself is done as part of the output process.
Construction and use of the MSCL finding aids database relies on five software packages:
A) A processing database application created from Microsoft's FoxPro 2.6 for Windows. This application allows us to normalize finding aid structure and to output three modes of digital finding aids simultaneously. It allows all of the encoding (EAD & HTML) and text formatting (paper and ASCII) to be predefined and executed automatically. That means the generation of particular instances can be done more quickly and, if desirable, by lower-level staff.
B) EAD parser designed as part of the UC-EAD project. Every newly generated EAD instance is validated with the parser before being mounted on the server.
C) Verity Query Language used to index the EAD finding aids database and allow for cross collection searching.
D) Softquad's Panorama (or some parallel) that allows the client to access the native EAD files.
E) An HTML browser which allows the client to access the HTML files.
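The idea of generating parallel EAD, HTML, and ASCII versions from a single database record can be sketched as follows. This is Python rather than FoxPro report forms, and the record fields and output layouts are invented for the illustration.

```python
from html import escape

# Hypothetical record as it might come out of a processing database.
record = {"title": "Letters", "date": "1901-1905", "box": "1", "folder": "3"}


def as_ead(r):
    """Render the record as an EAD component."""
    return ('<c02 level="file"><did><container type="box">%(box)s</container>'
            '<container type="folder">%(folder)s</container>'
            '<unittitle>%(title)s</unittitle><unitdate>%(date)s</unitdate></did></c02>'
            % {k: escape(v) for k, v in r.items()})


def as_html(r):
    """Render the record as one row of an HTML container list table."""
    return ("<tr><td>%(box)s</td><td>%(folder)s</td><td>%(title)s, %(date)s</td></tr>"
            % {k: escape(v) for k, v in r.items()})


def as_ascii(r):
    """Render the record as a fixed-width plain-text line."""
    return "Box %(box)-4s Folder %(folder)-4s %(title)s, %(date)s" % r


for render in (as_ead, as_html, as_ascii):
    print(render(record))
```

Because every version is produced from the same record, a correction made once in the database reaches all three outputs the next time they are generated.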
Main contact person:
Bradley D. Westbrook, Manuscripts Librarian / University Archivist
bdwestbrook@ucsd.edu
(619-534-6766; FAX 619-534-5950)
RLG Archival Resources participant?:
[no answer]
University of Chicago Library, Department of Special Collections
Delivery method:
At this time the finding aids are viewable in our reading room using Panorama Viewer.
We are investigating other delivery options.
Encoding procedure:
After a review of commercial SGML encoding products, the University of Chicago Library decided to create an encoding program in-house. We decided that the best way to encode a finding aid was to separate it into sections. A finding aid can be conceptually split into two distinct parts: the front matter, which is information about the finding aid itself, and the container listings, which describe the actual contents of the collection. In order to mark up finding aids automatically using our program, the finding aid must first be in electronic form (Word file). This can be achieved by either scanning or re-keying the document. The front matter is marked up using an HTML form (template). The relevant information is cut-and-pasted into the template fields and then run through a cgi-script written in Python 1.5 which outputs marked-up text to a file. This part of the program takes cgi variables and marks them up, formatting them slightly and producing the first part of the finding aid. The second half of the finding aid (container listings) is marked up using a program designed especially for this project, which searches through a text file for patterns and keywords, and outputs marked-up text to a file. When these two files are joined together, we have a completely encoded finding aid, viewable with an SGML browser.
The program examines the text line by line, looking for an indicator we have inserted () to mark the beginning of relevant material. When it finds the indicator, it then starts scanning for keywords such as Folder, Box, Series, etc. Each keyword prompts an action, or subroutine. As the program can only really look for patterns, anomalies can cause problems. The program was written to do the bulk of the work, but hand editing before and after running the program may be necessary.
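The keyword-driven scan described above might be approximated by the Python sketch below. It is not the Chicago program: the start indicator string, the keyword set, the expected line layout, and the output tags are all assumptions for illustration (the original indicator is not given here).

```python
import re
from xml.sax.saxutils import escape

START = "##START##"  # hypothetical indicator; the real one is not given
# Assumed line layout: keyword, an identifier, then the description.
KEYWORD = re.compile(r"^(Series|Box|Folder)\s+(\S+?)[:.]?\s+(.*)$")


def mark_up_listing(text):
    """Scan line by line; return tagged lines and numbers of unmatched lines."""
    out, unmatched, active = [], [], False
    for number, line in enumerate(text.splitlines(), 1):
        line = line.strip()
        if not active:
            active = line == START  # ignore everything before the indicator
            continue
        if not line:
            continue
        m = KEYWORD.match(line)
        if not m:
            unmatched.append(number)  # anomaly: leave for hand editing
            continue
        keyword, value, rest = m.groups()
        if keyword == "Series":
            out.append("<unittitle>Series %s: %s</unittitle>"
                       % (escape(value), escape(rest)))
        else:
            out.append('<container type="%s">%s</container><unittitle>%s</unittitle>'
                       % (keyword.lower(), escape(value), escape(rest)))
    return out, unmatched


sample = "Preface\n##START##\nSeries I: Correspondence\nBox 1 Letters, 1901\nmisc note"
tagged, anomalies = mark_up_listing(sample)
print("\n".join(tagged))
print("lines needing hand editing:", anomalies)
```

Reporting the unmatched line numbers, rather than guessing at their structure, mirrors the point that pattern matching does the bulk of the work and hand editing handles the rest.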
Main contact person:
Eileen A. Ielmini
Processing Archivist
Phone: (773) 834-2647
Email: eielmini@midway.uchicago.edu
RLG Archival Resources participant?:
Yes, the University of Chicago Library, Department of Special Collections is participating in RLG's Archival Resources project. At least 9 EAD encoded finding aids can be found at this site.
University of Glasgow, Glasgow University Archives & Business Records Centre
Delivery method:
These descriptions are presently made available in several different HTML renditions, which are generated offline via XSL(T) stylesheets (using James Clark's XT processor) from an XML version of the EAD document. An XML version is also made available, optionally with one of several (Microsoft-dialect) XSL(T) stylesheets for transformation and rendering on the client side using Microsoft Internet Explorer 5. At present, SGML versions are not made available via the Web.
Encoding procedure:
Descriptions are created as word processor documents in Microsoft Word, using a template which structures the description according to the elements of ISAD(G). The document creator uses dialogues provided by Word macros to provide additional structuring of the text at the sub-paragraph level. (Where non-digital descriptions existed, they have been re-keyed).
The SGML markup is provided by a Word macro which processes the (highly structured) Word document produced by using the template described in the previous paragraph.
The Word conversion macro maps the ISAD(G) elements to the appropriate EAD element types, generates standard EAD header information, and outputs a complete valid EAD-encoded document. (N.B. the macro is *not* intended as a generic Word-to-SGML tool: it is designed specifically to process the structured document which is produced from using the template in a controlled manner.)
Some limited checking of structure and content is performed by Word macros either at the time of document creation or at the point of conversion to SGML. That SGML document is then processed and validated by James Clark's NSGMLS parser. The SGML document is converted to an XML version using James Clark's SX, and that provides the input for the XSL(T) processing described above to generate HTML renditions.
The document creator uses a thesaurus lookup procedure for all access point terms to ensure that occurrences of such terms (a) have a standard form and content, and (b) are associated with a unique identifier for the entity which acts as a pointer to an archival authority record for that entity. At the time of writing (October 1999), however, we do not have a search and retrieval tool which can fully exploit that markup. Some very basic static index pages for the descriptions are currently generated through the use of XSL(T) stylesheets.
All the HTML renditions of our descriptions contain (as HTML meta elements) basic Dublin Core metadata derived from the content of the EAD header and archdesc-level controlaccess elements. We are also experimenting with the generation of RDF-based metadata for the finding aids. At present, this employs the semantics of Dublin Core, but we envisage that it could be extended to incorporate other metadata schemas as required.
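As an illustration of deriving Dublin Core meta elements from an EAD header, the following Python sketch reads a few header fields from an XML instance and emits HTML meta elements. The choice of fields and their mapping to DC.identifier, DC.title, DC.creator, and DC.date are assumptions for the example rather than the actual rules of the Glasgow stylesheets, which are written in XSL(T).

```python
import xml.etree.ElementTree as ET
from html import escape

SAMPLE = """<ead><eadheader><eadid>gb-0000-example</eadid><filedesc>
<titlestmt><titleproper>Papers of an Example Firm</titleproper>
<author>Example Archivist</author></titlestmt>
<publicationstmt><date>1999</date></publicationstmt>
</filedesc></eadheader></ead>"""

# Assumed mapping from EAD header paths to Dublin Core element names.
HEADER_TO_DC = [
    ("eadheader/eadid", "DC.identifier"),
    ("eadheader/filedesc/titlestmt/titleproper", "DC.title"),
    ("eadheader/filedesc/titlestmt/author", "DC.creator"),
    ("eadheader/filedesc/publicationstmt/date", "DC.date"),
]


def dc_meta(xml_text):
    """Return HTML meta elements for each header field that is present."""
    root = ET.fromstring(xml_text)
    tags = []
    for path, name in HEADER_TO_DC:
        value = root.findtext(path)
        if value and value.strip():
            tags.append('<meta name="%s" content="%s">'
                        % (name, escape(value.strip())))
    return tags


print("\n".join(dc_meta(SAMPLE)))
```

Fields missing from the header are simply skipped, so a sparse header yields fewer meta elements instead of empty ones.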
Main contact person:
ead@archives.gla.ac.uk
RLG Archival Resources participant?:
No
University of Liverpool Special Collections
Delivery method:
Delivered via the Archives Hub in the United Kingdom.
We have written an SGML to HTML conversion program which is sufficiently generalized to work with any valid EAD document instance (with a bit of luck; *sometimes it does break*). You can preview this on: http://gondolin.hist.liv.ac.uk/~azaroth/ead
(We'd welcome the chance to try the conversion program (in perl) with other EAD documents. Shortly, we'll be releasing the perlscript for this on the Arts and Humanities Data Service website in the UK for all to use. Ditto for Z39.50 client/server when it's completed later this year. The gondolin site, by the way, is experimental; we're still writing and rewriting the code. Also looking for a US host for the perlscript.)
Encoding procedure:
What format you began with:
A variety of formats. Most are encoded originally; some are conversions
from paper lists. We try to go down to item or piece level for every
archive finding aid. Our largest finding aid is 30 MB.
How you converted to electronic format (if you began with printed documents)
For the most part we revised and updated the lists as we went along,
which involved retyping. Some lists were done offshore using PCL, which
we found to be a very good company to deal with (very competitive rates
and no evident mistakes, with a one-week turnaround time for documents
of, for example, 500 pages).
How you mark up your documents
Using an SGML editor (AdeptEdit). Sometimes we've used ProCite (a
proprietary bibliographic database) which we output to EAD. Occasionally
we use emacs or xemacs.
If you've automated your markup in any way
Not really. We do it the hard way.
Extra steps you take to proof and parse your documents
We double check everything using James Clark's NSGMLS software
(available on http://www.jclark.com), which we found much stricter than Adept.
Indexing procedure
The conversion program automatically indexes everything. We just type in
CONVERT filename.sgml and it does the rest. One of the main problems
with EAD has to do with difficulties in indexing; since no tags are
required, what do you index? We've written perlscript which goes down
the tree starting from <unitid> looking for a consistent tag to index.
If nothing works, the program slaps in proprietary numbers.
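The fallback strategy described above (look for a usable tag starting from <unitid>, and assign a number if nothing is found) could be sketched in Python as follows. The order of candidate tags after <unitid>, the format of the generated numbers, and the assumption of well-formed XML input are choices made for this example; the original perlscript is not reproduced here.

```python
import xml.etree.ElementTree as ET

CANDIDATES = ["unitid", "unittitle", "unitdate"]  # assumed search order


def index_keys(xml_text, prefix="auto"):
    """Return one index key per <did>, generating a number as a last resort."""
    keys = []
    counter = 0
    for did in ET.fromstring(xml_text).iter("did"):
        key = None
        for tag in CANDIDATES:
            text = did.findtext(tag)
            if text and text.strip():
                key = (tag, text.strip())
                break
        if key is None:
            counter += 1
            key = ("generated", "%s-%04d" % (prefix, counter))
        keys.append(key)
    return keys


sample = ("<dsc><c01><did><unitid>A/1</unitid></did></c01>"
          "<c01><did><unittitle>Letters</unittitle></did></c01>"
          "<c01><did><physdesc>1 box</physdesc></did></c01></dsc>")
for key in index_keys(sample):
    print(key)
```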
Software you use for markup and delivery
The only commercially available software that is any good seems to be
AdeptEdit by ArborText. Emacs and xemacs (in SGML mode) are extremely
good (probably better than AdeptEdit), but for large finding aids our
staff prefers the document map feature of AdeptEdit. Emacs and xemacs are
free of charge and work on any platform, which is also a help.
I've set up a web-page for the project, which includes a lot of resources created for the project as well as documentation. The URL is: http://sca.lib.liv.ac.uk/archiveshub/contents.html.
Main contact person:
Paul Watry
pwatry@liverpool.ac.uk
RLG Archival Resources participant?:
Yes
University of Michigan, Bentley Historical Library
Delivery method:
Finding aids are delivered using on the fly conversion to HTML. At
present there is no provision for delivery of native SGML. The Bentley is
working in collaboration with the University Library's Digital Library
Production Services Unit. The SGML finding aids are stored on a DLPS
server using Open Text PAT 5.0 for indexing and searching. The search
interface is a simple HTML forms page. DLPS-developed CGI and perl
scripts translate queries from HTML to Open Text search language and
translate results from SGML to HTML for delivery to the user. (See About the
Bentley EAD Project on our EAD site for more details.)
Encoding procedure:
For about ten years Bentley finding aids have been created in Microsoft
Word (currently WORD 6.0) using a WORD stylesheet to control formatting.
Some older finding aids have been OCRed and then formatted and updated
using the WORD stylesheet. The stylesheets identify most elements of the
finding aid in a way that can be used for automated conversion to EAD.
1) Finding aids are created/edited in WORD to conform to the latest version of the stylesheet.
2) Several WORD macros employing the stylesheet codes are run on the container list portion of the finding aid to convert it to an EAD document. Macros insert proper <C0x>, <did>, <unittitle>, <unitdate>, <physdesc>, and <note> tags. Other macros convert MARC controlled access terms, various lists and indexes to appropriate EAD tags.
Main contact person:
Greg Kinney
gkinney@umich.edu
phone 734-764-3482
RLG Archival Resources participant?:
Bentley finding aids have been made available to the RLG project.
University of Minnesota, Special Collections and Archives
We are engaged in a one-year project (2004-05) to implement EAD in nine of the Special Collections and Archives units within the University.
A few of the units had started implementing EAD before this time, but during this current effort most staff will learn EAD and will use it to describe all new collections. The project will also provide staff who will encode numerous legacy finding aids from all units.
Current finding aids exist in a variety of software programs (notably Word and Access); some are only typewritten. Using a template developed by project planning staff, we will use XMetaL to encode all finding aids, old and new. A number of basic finding aids are being created from MARC records by the Libraries' Technical Services staff using a macro developed in-house.
During our initial stage, delivery is in either XML or HTML over the Web. Part of our project involves the selection and implementation of a delivery package for display, search, and retrieval. Our project web page will provide ongoing progress reports. http://wiki.lib.umn.edu/Staff/FindingAidsInEAD
Main contact person:
Leslie Czechowski, EAD project Archivist
czech008@tc.umn.edu
RLG Archival Resources participant?:
Yes
University of North Carolina at Chapel Hill, Manuscripts Department
Last updated: January 2005
Delivery method:
The Manuscripts Department (http://www.lib.unc.edu/mss/) includes the Southern Historical Collection (SHC), the Southern Folklife Collection (SFC), and University Archives (UA). For the SHC and the SFC, users get to an HTML file that allows them to choose an XML version or an HTML version of a given finding aid. For the UA, only the HTML versions are available.
All but a few Manuscripts Department collections are represented online by MARC records in the UNC-Chapel Hill online catalog. All but a few collections have some sort of representation on our website, some by EAD-encoded finding aids and some by finding aids in other formats (chiefly ASCII files) that vary widely in depth and detail.
Encoding procedure:
We have used NoteTab for several years to mark up all new and all modified finding aids in EAD. We're doing some legacy work on word-processed finding aids that are not marked up in EAD and on paper finding aids, but, like most everyone else, only when time and funding permit. So we have MANY paper-only finding aids, but no ongoing project aimed at making EAD-encoded versions of these documents. These legacy finding aids are keyed in with EAD markup when the collections they represent are reprocessed under special projects or because of additions or other changes that warrant finding aid revision.
Processors produce EAD marked-up finding aids in NoteTab using templates that we developed in cooperation with NC EAD, a subgroup of NC ECHO (North Carolina ECHO, Exploring Cultural Heritage Online, the state’s doorway to the special collections of North Carolina's libraries, archives, museums, historic sites, and other cultural institutions). Completed finding aids are reviewed by the processing supervisor (most of our finding aids are written by graduate students).
The departmental cataloger does the final editing, adding the controlled access terms; creates the HTML version; mounts the versions on the web; and does the MARC cataloging, which includes an 856 linking field to the finding aid. Through the abstract and controlled access fields, we include all information from the MARC record in the EAD-encoded finding aid. We do name and subject markup in the EAD-encoded finding aid (content tags) within the scopecontent at the collection level only (we call it a Collection Overview).
Main contact person:
Lynn Holdzkom
uholro@email.unc.edu
RLG Archiva