<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE entry
  PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<entry>
    <url>http://sunsite.berkeley.edu/ead</url>
    <institution>University of California, Berkeley [Also incorporated into the <a href="http://www.oac.cdlib.org/">Online Archive of California</a> ]</institution>
    <updated>November 2001</updated>
    <delivery>
      <p>We are currently serving out our finding aids via The California Digital Library's Online
        Archive of California. The OAC uses the DynaWeb server software from the Enigma
      Corporation.</p>
    </delivery>
    <encoding>
      <p>Since 1995 the UC Library has employed a wide variety of techniques to encode our legacy
        finding aids into SGML. This reflects the wide variety of formats these documents were in.
        As we began our retrospective conversion with The Bancroft Library's electronic finding
        aids--authored originally in WordPerfect--we started with WordPerfect macros of
        varying sophistication. The lead programmer provided intensive training in the WordPerfect
        macro language in the form of a series of seminars. The original WordPerfect macro manual
        used within the unit (which is now somewhat out of date) can be found <a href="http://sunsite..edu/ead/wpmacros">here.</a>
      </p>
      <p>Since the beginning of the project we have utilized the technique of stepwise refinement to
        encode legacy finding aids, a practice we have continued to this day. Stepwise refinement
        involves beginning the encoding process by adding "coarse" markup, essentially fitting the
        legacy information into a broad hierarchical structure consisting of little more than
        component information. Then a variety of techniques is employed to add more markup of an
        increasingly finer granularity, e.g., next adding the unittitle information, then encoding
        unitdates, etc. Most of these subsequent passes were also performed using WordPerfect
        macros, but as the project progressed we began to employ the perl programming language.</p>
      <p>Today, every member of the Digital Publishing Group has completed five-week classes in perl
        programming through the University Extension program and perl has become part of our markup
        lives. We have created a small toolkit of simple perl programs which is available at: <a href="http://sunsite2..edu/oac/toolkit">http://sunsite2..edu/oac/toolkit.</a> The kit is
        composed of several small scripts useful for stepwise refinement including scripts to
        recognize and encode unitdates, persnames, and corpnames within unittitles. The toolkit also
        includes a preconfigured parser (nsgmls) used to validate each and every finding aid before
        it is submitted for publication on the OAC.</p>
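      <p>As a rough illustration of one such refinement pass, the sketch below wraps a year or year
        range found at the end of a unittitle in a unitdate element. It is written in Python rather
        than taken from the perl toolkit, and the pattern, the function name tag_unitdates, and the
        sample markup are assumptions made for this example only.</p>
      <pre><![CDATA[
```python
import re

# Assumed pattern (not from the toolkit): a four-digit year or year range,
# e.g. "1906" or "1906-1911", at the very end of a unittitle, optionally
# preceded by a comma. The lookbehind skips the tail of longer numbers.
TRAILING_DATE = re.compile(
    r"(<unittitle>[^<]*?)(,?\s*)(?<!\d)(\d{4}(?:-\d{4})?)(\s*</unittitle>)"
)


def tag_unitdates(text):
    """Wrap a trailing year or year range in each unittitle with unitdate tags."""
    return TRAILING_DATE.sub(r"\1\2<unitdate>\3</unitdate>\4", text)


# Coarse markup from an earlier pass: components and unittitles only.
coarse = "<c01><unittitle>Correspondence, 1906-1911</unittitle></c01>"
print(tag_unitdates(coarse))
```
]]></pre>
      <p>Titles without a trailing date pass through unchanged, so a pass like this can be rerun
        safely after each round of manual correction.</p>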
      <p>Before long we found that we could more efficiently encode a finding aid's "front
        matter"--that is, all of the information not occurring within the dsc--through a standard
        web template. This proved faster than trying to create macros or specialized programs to
        accommodate the wide variety of layouts in the finding aids produced by the eight
        contributing repositories at UC . The templates can be seen in action at: <a href="http://sunsite..edu/FindingAids/uc-ead/templates">http://sunsite..EDU/FindingAids/uc-ead/templates</a> and the cgi script we use is
        available for anybody else to use as part of the <a href="http://sunsite2..edu/oac/toolkit">toolkit.</a>
      </p>
      <p>Curiously, we have found that using commercial SGML editors such as AdeptEdit,
        Author/Editor, or XMetaL is not an efficient way to convert legacy information into EAD.
        Although each member of the Digital Publishing Group has copies of XMetaL installed, we find
        it useful solely as a reference tool, particularly while bringing new encoders up to speed
        in EAD. It is far faster to programmatically convert text to EAD in broad strokes than to
        apply the copy and paste method required when using these editors. XMetaL may have a role in
        the authoring of new finding aids, but much customization--mainly in the form of targeted
        dialog boxes and refinement macros--needs to be done before finding aid authors can consider
        it a viable replacement for their trusted word processing program.</p>
      <p>After we completed conversion of all of our word processing files for legacy information
        held by Berkeley and by many of the affiliates of the Online Archive of California, a
        process funded by a variety of grants, we turned our attention to all of the legacy finding
        aids available only on paper. These we contracted out to a conversion vendor, Apex Data
        Services, which keyed the data and generated EAD. This EAD was then further refined in house
        when the data was returned. Our experience with employing an outside vendor for the process
        was fairly good, far better than our earlier experience using scanning and OCR in-house.
        Most finding aids required very little editing and correction, but a few of the more
        complex ones required a great deal of time to bring up to local standards.</p>
      <p>We are investigating a variety of options for incorporating EAD directly into the authoring
        process, including a complete suite of MS Word templates and macros, dubbed "EAD Stylus",
        and available as part of the toolkit. Another option is to more fully integrate EAD into the
          <a href="http://sunsite..edu/MOA2">Generic Digital Projects Database,</a> developed
        initially for UC Berkeley's role in the Making of America II project. The Generic Database was
        designed to accommodate the workflow and data entry for Berkeley's variety of digitization projects,
        including images, electronic text, sound files, moving pictures, etc. As it was intended to
        accommodate hierarchical description and produce arbitrarily generic output, it was easily
        adapted to EAD.</p>
      <p>Relational databases have taken on a larger role at Berkeley in recent years. We now can easily
        import EAD-encoded finding aids into any arbitrary relational database--for enriching the
        data, adding item-level information for digitized surrogates, collection management,
        etc.--and export them back out to EAD or serve them on the web. A tutorial and several sample
        programs written in perl are available at: <a href="http://sunsite..edu/ead/eaddb">http://sunsite..edu/ead/eaddb</a> .</p>
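      <p>A minimal sketch of the import step, not the tutorial code itself: it assumes the finding
        aid has already been normalized to well-formed XML, uses Python's standard sqlite3 and
        xml.etree.ElementTree modules, and invents a one-table schema (component) and a sample
        fragment purely for illustration.</p>
      <pre><![CDATA[
```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented finding aid fragment, used only to exercise the sketch.
SAMPLE = """<ead><archdesc><dsc type="combined">
<c01 level="series"><did><unittitle>Correspondence</unittitle></did>
<c02 level="file"><did><unittitle>Letters to family</unittitle></did></c02>
</c01>
</dsc></archdesc></ead>"""


def is_component(elem):
    """True for numbered EAD components such as c01 or c02."""
    return elem.tag[:1] == "c" and elem.tag[1:].isdigit()


def load_components(xml_text, conn):
    """Store each component as a row that points at its parent component."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS component ("
        "id INTEGER PRIMARY KEY, parent_id INTEGER, level TEXT, title TEXT)"
    )

    def walk(elem, parent_id):
        for child in elem:
            if is_component(child):
                title_elem = child.find("did/unittitle")
                title = ""
                if title_elem is not None:
                    title = "".join(title_elem.itertext()).strip()
                cur = conn.execute(
                    "INSERT INTO component (parent_id, level, title) VALUES (?, ?, ?)",
                    (parent_id, child.get("level"), title),
                )
                walk(child, cur.lastrowid)  # nested components hang off this row
            else:
                walk(child, parent_id)  # wrappers such as archdesc and dsc

    walk(ET.fromstring(xml_text), None)
    conn.commit()


demo = sqlite3.connect(":memory:")
load_components(SAMPLE, demo)
for row in demo.execute("SELECT id, parent_id, level, title FROM component ORDER BY id"):
    print(row)
```
]]></pre>
      <p>Keeping a parent_id on every row preserves the hierarchy, so the same table can later be
        walked in order to write the components back out as EAD.</p>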
      <p>Now that conversion of our legacy finding aids is complete, we are increasingly involved in
        digitizing surrogates of the archival materials themselves: selected photographs, books,
        diaries, and letters, represented both by images or sequences of images and as searchable
        electronic text encoded in TEI. We are committed to using the emerging METS standard for
        encapsulating single and multipart digital objects in XML "wrappers." More information on
        these efforts is available on our <a href="http://sunsite..edu/MOA2">Making of America
        II</a> website.</p>
      <p>Since the earliest days of the project, Berkeley has realized the importance of developing and
        adhering to consortial standards. The EAD encoding standard allows a surprisingly divergent,
        and often distressing, variety of encoding methodologies. In 1996 four institutions, UC Berkeley,
        Stanford University, Duke, and the University of Virginia, met to develop a uniform encoding
        standard for EAD finding aids. This standard, the <a href="http://sunsite..edu/amher/upguide.html">American Heritage Retrospective Conversion
          Guidelines</a>, was adopted and later extended and refined by the <a href="http://sunsite..edu/FindingAids/uc-ead">UC EAD</a>
        consortial project, which later grew into the
        <a href="http://www.oac.cdlib.org/">Online Archive of California</a>. Recently, the Online
        Archive of California has developed a standard for the encoding of new finding aids, the
        Best Practices Guidelines for the Encoding of New Finding Aids, which builds upon those
        guidelines laid out in the Retrospective Conversion Guidelines. Although intended for new
        finding aids, the BPG provides guidelines that benefit all finding aids. Although
        we foresee difficulties applying the full BPG to our "legacy" EAD documents, we are
        programmatically upgrading them to a subset of the BPG. This involves, most
        importantly, stripping out the old-style &lt;drow&gt;/&lt;dentry&gt; tabular
        markup employed in the early days of EAD at Berkeley, and combining the separate Series Description
        and Container List into a single &lt;dsc&gt; of type "combined".</p>
      <p>Finally, UC has no plans at the present time to begin encoding finding aids in XML. First,
        all of our current tools handle both XML and SGML, so there is no reason for us to switch.
        Second, the XML standard lacks the robust entity management mechanism present in the SGML
        standard. We have found this entity management to be crucial, especially when interchanging
        finding aids with other institutions and consortia (hard-coding a specific path or URL in
        every entity declaration is onerous). If valuable new authoring or publishing tools that
        require XML become available, or if stronger entity management is included in a future
        version of the XML standard, we would like to switch over.</p>
      <p>All of our raw SGML files may be accessed in the SGML section of the Online Archive of
        California: <a href="http://www.oac.cdlib.org/sgml">http://www.oac.cdlib.org/sgml</a>
      </p>
    </encoding>
    <contact>Lynne Grigsby-Standfill, Head Digital Publishing Group UC Library <a href="mailto:lgrigsby@library..edu">lgrigsby@library..edu</a> Alvin Pollock, Lead Programmer
      Digital Publishing Group <a href="mailto:apollock@library..edu">apollock@library..edu</a>
    </contact>
    <rlg>Yes.</rlg>
  </entry>
