TO: Angela Spinazze, CIMI Dublin Core Metadata Testbed Project Manager
FROM: Bill Landis, Chair for the Society of American Archivists Technical Subcommittee on Descriptive Standards
RE: Comments on CIMI's Guide to Best Practice: Dublin Core
First, thank you for the opportunity to comment on this Guide. The attempt to codify community-based "best practices" for something as loosely defined and delineated as Dublin Core metadata is an important step toward establishing standards for describing data objects available on the World Wide Web for resource discovery purposes. CIMI is to be commended for the effort that went into this document, which goes a long way toward starting a conversation about the practicalities of using Dublin Core metadata elements in real-life Web resources.
This Guide would benefit from a clearer explanation up front of how its authors expect an institution to use Dublin Core (DC) metadata elements. Will DC be used as a format for original description? Or will it be used as a transition format that provides networked, Web-based resource discovery access to descriptions created originally in other formats and perhaps housed on systems that are not networked or not accessible via the Web? Best practice guidance for these two scenarios would likely be very different, yet it is not clear from reading these guidelines which use the authors envision.
These guidelines often seem to approach DC metadata element use as an act of original description, but we wonder whether many or most institutions will use it in that way. The DC metadata scheme, in its generality, seems best suited as a minimal level of descriptive data for information interchange between more comprehensive descriptive systems. The guidelines appear to be aimed at someone creating a duplicate description of something already described elsewhere. It seems more likely that DC will be used to map existing descriptions to a simpler format for interchange, rather than to re-describe materials that are being made available in digital facsimile over the World Wide Web. The scalability of the duplicate-description approach seems questionable.
The 1:1 principle is an interesting and potentially useful one; however, its current description in the Introduction to this Guide is confusing. The principle "states that only one object, resource, or instantiation may be described within a single metadata record." Archival finding aids surely fall under the category of "cultural resources" that are within the scope of this Guide. Yet an archival finding aid encoded using Encoded Archival Description (EAD) is a single metadata object that contains a hierarchy of metadata about an archival collection or fonds and its subdivisions, or series; their subdivisions, or files; and potentially their subdivisions, or items. The one-description-to-one-resource principle seems fine if it applies only to distinguishing between the description of an original object and a separate description of its digital facsimile. It becomes confusing when applied to the description of multilevel digital information objects.
While several statements in the Introduction suggest that descriptive data is being extracted from existing descriptive tools or systems, the "Reality Checking" section of the Introduction gives guidance that implicitly suggests a new act of description. As noted earlier, we question the feasibility of this approach, especially given the staffing and funding constraints under which most repositories of cultural heritage information operate. A clearer statement at the outset of how the creators of this document envision its use would be helpful and would give individuals more clarity as they attempt to apply this Guide.
Is the DC.TYPE list referred to in the Type section tied to an existing standardized list of types, or is DC attempting to define an authority for this from scratch? If the latter, why? Is there some other controlled vocabulary source, tied to broader standards than those of the DC group, that CIMI could recommend for supplying values for this field?
It would also be useful if the examples in this section were accompanied by explanatory text giving a rationale for creating a resource discovery record for some of the items used. The recommendation of four minimal values seems a good one.
The Format section invites considerable confusion between data carrier and data format, and the examples given do not help clarify the distinction. In our opinion, the CIMI interpretation of the DC standard definition also muddies the water considerably. Is a "45 rpm vinyl record" really a data format? "3 1/4 inch floppy disk" and "CD-ROM" are other good examples of things that are carriers and not data formats. They do not provide all of the information necessary to access the data that forms the described resource.
It would also be useful to provide more guidance on the use of DC.FORMAT in descriptions of digital objects versus descriptions of physical objects.
The Title examples are very confusing because they give no guidance on where this information should be taken from. This seems a clear case where DC metadata users would be better off relying on the existing data content standards of their own or similar communities to decide what should serve as the title of a particular data object. The implicit suggestion in this section of the guidelines is that one is fishing around for a title from scratch, which contradicts the statement in the Introduction that the Guide "provides direction on representing cultural heritage resources as currently captured and described in typical museum collection management systems."
The Description section is the first of several places where the guidelines recommend duplicating information across several fields. This is difficult for us to understand or rationalize. It seems only to expose end users searching for resources to multiple hits on the same DC record. If distinguishing between competing fields is an issue, this is a structural weakness that the DC group needs to address. It should not, however, lead to the deliberate duplication of descriptive information recommended here.
Additionally, the examples provided seem very spotty. Some of them provide almost no "description" at all of the data object they represent. Are these excerpts from fuller Description field data? The CIMI guidelines emphasize that this field is "a rich source for indexable vocabulary," yet one wonders how useful the vocabulary indexed from the description "fixed in Berland's fluid and preserved in 80% alcohol" will be for resource discovery.
The Subject section seems confused and muddled. The Subject net is cast so wide that one has to wonder how useful this information will be to end users. The guidelines cite the AAT (Art & Architecture Thesaurus) as an example of a recommended controlled vocabulary, yet the AAT is not a subject vocabulary. The examples are filled with terms that we doubt any end user would associate with the concept of "subject." Much of this data is in fact descriptive and potentially useful for resource discovery, but not when it is thrown into a Subject bucket.
The advice pointing to AACR2 here for help in formatting names of creators is excellent and exemplifies an approach to and acknowledgement of existing descriptive standards tools that would be useful at other points throughout these guidelines.
The guideline "to express both the time period during which the resource was brought into being and the specific date when it was first cataloged or collected" seems likely to confuse end users of these descriptions: how will they know which date is which? If CIMI is proposing to use DC in an unqualified manner, the guidelines should recommend that the content of this field include more context for each date. The examples should also provide this context to end users.
CIMI recommends that "for IDs that are unique within an organization, DC.IDENTIFIER value should be preceded by an ID for the institution itself." This assumes that institutional IDs are unique in a global setting. Perhaps this scheme needs to include a recommendation that the country in which the institutional ID is unique be provided as well?
SOURCE and COVERAGE:
As mentioned previously under Description, we consider the recommendations here to repeat data from other fields misguided. Repetition of data is not a substitute for clarifying data structure. The net effect of this repetition seems to be reduced retrieval efficiency for the end user, which we suggest would defeat the purpose of providing DC metadata for digital objects.
The Source examples are not clear. As with many of these fields, we think a clearer definition of the field's purpose would call many of the examples provided into question. A clearer definition in this Guide may require more up-front grappling with nebulous DC definitions and issues, but it would ultimately yield better data to offer end users for resource discovery.
The Coverage examples seem to muddle this field with Subject. Again, if the point of the DC exercise is to facilitate resource discovery, we question how this muddling will help potential resource discoverers.
The Relation section offers another example of what we see as a questionable recommendation to repeat data from other fields. The rationale behind many of these examples is also unclear. For example, does a gall wasp specimen require both "oak tree" and "quercus"? This example seems to us clearly to violate the 1:1 principle stated in the Introduction: is this a description of a gall wasp or of a gall wasp specimen? If the aim is to tell the end user about the wasp's habitat, surely the Description field is a better, clearer place for this information than the Relation field.
CIMI's interpretation and guidelines for the Rights field seem clear. The examples, however, do not provide useful information from an end user's perspective. Rights statements should be clear and concise, and an end user must be able to walk away from reading one knowing what he or she can and cannot do with the digital object. Examples like "Must state 'gift of Mrs. Arthur Dustin'" do not give the end user enough information.
Also, statements on access restrictions and statements concerning restrictions on what someone can do with a digital object should be clearly labeled so that end users know unambiguously what they are being told. This seems an area in which a sloppy, poorly constructed rights statement may be more damaging than no statement at all.
SAA TSDS Comments