Help Pages - EAD in XML

This page is meant to address some basic practical questions about implementing EAD in XML. It assumes a general level of knowledge of EAD and SGML, such as might be gained from an EAD workshop or by marking up finding aids in EAD. While this page focuses on XML exclusively, it does present information also found in the EAD Working Group's EAD Application Guidelines and The EAD Cookbook by Michael Fox. I've tried to reference these works below, and you may want to check them for alternate presentations and, at several points, more complete information.

Both SGML and XML can get quite complex in their details, and solving practical problems often involves details. In answering specific questions, I've tried to concentrate on what needs to be done, but I've also felt it important to explain why it needs to be done. My hope is that this knowledge can help demystify XML (and SGML) and lead to greater understanding and comfort with these technologies.

If you have questions, comments, or suggestions regarding this page, please feel free to contact me.

David Ruddy
Digital Library and Information Technologies
Cornell University Library
dwr4@cornell.edu

Last Updated: 28 July 2000



What is XML?

XML (Extensible Markup Language) is a restricted subset of SGML.

SGML is a complex and very powerful document description, publication, exchange, and storage standard. Because of its complexity, however, several of its features are costly to implement in software, with the result that relatively few SGML software packages exist and many of the existing ones are expensive. XML was born out of a desire to preserve the core strengths of SGML while making it easier to build compliant software tools, at the same time creating a document exchange standard for the internet--more useful and less constraining than HTML but lighter weight than SGML. XML designers removed the more complex (and thus under-implemented and under-utilized) features from the SGML specification, and simplified many others. In the end, XML will impose few restrictions on the majority of SGML markup schemes (though many of these tag sets will need to be reformulated).

The fact that XML is a subset of SGML has an important consequence. In general, a valid XML document is also a valid SGML document (the converse, of course, is not necessarily true). If you alter your EAD SGML documents to make them XML compliant, they do not cease being SGML.

Why would I want to create EAD finding aids in XML rather than SGML, or convert my existing SGML finding aids to XML?

You may not want to. This depends a lot on your current system and environment. If you're beginning to use EAD and have no process or system in place, and thus no prior investment in authoring or delivering SGML encoded guides, then working in XML is probably a good choice. Why? XML is the future. Before long, XML will offer more varied, plentiful, and affordable software solutions. SGML software will likely remain static--either difficult to use or relatively expensive. Furthermore, XML was designed for the Web, and it opens up several options for publishing encoded documents there. These options are bound to increase in the future.

On the other hand, if your current authoring, conversion, and delivery system is tied to SGML, and especially if that system represents a large investment, then you will want to consider how easily that system could be adjusted to handle XML data. If that cost is significant, you might want to hold off on XML until there are other compelling reasons to make a system change.

To put this another way, it's not the conversion of your existing documents from SGML into XML that will represent significant costs. This can be accomplished relatively easily. It's the adjustments needed in the surrounding production procedures and delivery system that could prove expensive. Remember also, since moving your data from SGML to XML is easy, it is not as if you are creating a larger and larger future cost for yourself by continuing to work in SGML. When the time comes that you want to change to an XML system, converting the data will not be difficult.

What's the difference between an EAD encoded document in SGML and one in XML?

There is very little difference. This is because the designers of EAD, anticipating XML, created EAD so that it is nearly XML compliant as it is.


How do I set things up to alter and validate EAD documents in XML--that is, how do I begin working with EAD in XML rather than in SGML?

This is not difficult. It helps to think of the necessary changes as falling into two areas. One concerns the EAD Document Type Definition (DTD) and the markup declarations that begin each document; the other concerns the actual text of your finding aid with all its descriptive markup.

When you open an EAD document, you can see the Document Type (DOCTYPE) Declaration at the top. This markup declaration belongs to what is called the document prolog. Everything else in the document is an "instance" of the document type--in this case, the document type is "ead" and the instance of it is everything between (and including) the start and end <ead> tags. That's your encoded finding aid.
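
To make the distinction between the prolog and the document instance concrete, here is a minimal sketch in Python. It assumes a heavily abbreviated, hypothetical finding aid: the sample text, the "ead.dtd" system identifier, and the function name describe are all made up for illustration, and the sample would not validate against the real EAD DTD. The standard library's expat parser is used to report the document type named in the DOCTYPE Declaration and the first element found in the instance.

```python
import xml.parsers.expat

# Hypothetical, heavily abbreviated finding aid (not valid against the EAD DTD).
# The first two lines form the prolog; the <ead> element is the instance.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE ead SYSTEM "ead.dtd">
<ead>
  <eadheader>
    <eadid>example-001</eadid>
  </eadheader>
  <archdesc level="collection">
    <did><unittitle>Example Family Papers</unittitle></did>
  </archdesc>
</ead>
"""


def describe(document):
    """Return the DOCTYPE name, its system identifier, and the root element."""
    info = {"doctype": None, "system_id": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Called once, when the parser reaches the DOCTYPE Declaration.
        info["doctype"] = name
        info["system_id"] = system_id

    def on_start(name, attrs):
        # The first start tag seen is the root of the document instance.
        if info["root"] is None:
            info["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return info


if __name__ == "__main__":
    print(describe(SAMPLE))
```

Calling describe(SAMPLE) should report "ead" both as the document type and as the root element, which is the relationship described above. As used here, expat only checks that the markup is well formed; it does not read the DTD or validate the document against it.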

So we've got two problems: converting our document instances to XML, and altering the DTD and the DOCTYPE Declaration so that they conform to the XML standard. The next two questions address these two problems.


How do I convert my EAD SGML document instance (the finding aid and its descriptive markup) so that it's XML compliant?

This is relatively easy. There are just a few things that are allowed in EAD SGML markup that are not allowed in XML. Valid XML markup needs to comply with the following requirements:

If you ensure that these conditions are met, your EAD document instances will be XML compliant. Of course, you don't need to worry about any of these conditions if you are using an XML authoring tool, such as XMetaL, to create your documents--the software takes care of all of that for you.
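
If you are not using an XML authoring tool, a quick way to catch leftover SGML habits is to run a document instance through any XML parser. The following is a minimal sketch, assuming Python's standard library. The three sample strings and the function name is_well_formed are hypothetical. The samples illustrate a few general XML well-formedness rules (attribute values must be quoted, empty elements must be closed with a slash, start and end tags must match in case) and are not meant as a complete list of requirements.

```python
import xml.etree.ElementTree as ET

# SGML-style markup: unquoted attribute value and an unclosed empty <lb> element.
SGML_STYLE = (
    "<ead><archdesc level=collection><did>"
    "<unittitle>Example Papers<lb></unittitle>"
    "</did></archdesc></ead>"
)

# The same markup adjusted for XML: quoted value, <lb/> closed with a slash.
XML_STYLE = (
    '<ead><archdesc level="collection"><did>'
    "<unittitle>Example Papers<lb/></unittitle>"
    "</did></archdesc></ead>"
)

# XML names are case sensitive, so <EAD> is not closed by </ead>.
MIXED_CASE = '<EAD><archdesc level="collection"></archdesc></ead>'


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    for label, sample in [
        ("SGML style", SGML_STYLE),
        ("XML style", XML_STYLE),
        ("mixed case", MIXED_CASE),
    ]:
        print(label, "->", is_well_formed(sample))
```

Only the XML-style sample should pass. Keep in mind that this checks well-formedness only; validating a finding aid against the EAD DTD requires a validating parser and the XML version of the DTD.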

For additional information on moving your EAD guides from SGML to XML, see The Guidelines, pp. 134-35. The Cookbook, section 5, describes two pieces of software that help automate the above changes.


How do I alter the EAD DTD and the Document Type (DOCTYPE) Declaration so that I can use XML software with my EAD document?