Standards for Archival Description: A Handbook


CHAPTER 7: CODES


One of the primary motivations behind the development and use of coding schemes is the compression of data. A two-character code like DC for the District of Columbia saves time (and therefore money) during data entry, reducing the number of keystrokes for input from twenty to two. Data storage requirements are also reduced proportionally. The programmer can use a fixed length field knowing that all state codes will occupy just two spaces.

Codes also reduce error rates during data entry and retrieval, establishing unambiguous and consistent representations for people, places, or things that may go by different names or spellings. One operator's District of Columbia may be another's Washington, DC, but both can use the code DC.
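The normalizing effect described above can be sketched in a few lines of Python. This is only an illustration: the lookup table, its entries, and the function name `encode_place` are hypothetical and are not drawn from any published code list.

```python
# Hypothetical lookup table: several variant names resolve to one code.
VARIANTS_TO_CODE = {
    "district of columbia": "DC",
    "washington, dc": "DC",
    "washington, d.c.": "DC",
}


def encode_place(name: str) -> str:
    """Return the two-character code for a place name.

    Case and surrounding spaces are ignored so that different
    operators' spellings resolve to the same code.
    """
    key = name.strip().lower()
    if key not in VARIANTS_TO_CODE:
        raise KeyError(f"no code assigned for {name!r}")
    return VARIANTS_TO_CODE[key]


full_name = "District of Columbia"
code = encode_place(full_name)
print(code)                       # DC
print(encode_place("Washington, DC"))  # DC
print(len(full_name), "keystrokes reduced to", len(code))
```

Because every code has the same length, a fixed-length field of two characters is enough to store any value the table can return.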

Tannehill and Husbands identify several early efforts in the library field to record information in abbreviated formats, including the Dewey Decimal Classification System (1873), Cutter's Expansive Classification of 1891, the Cutter Tables of 1899-1901, the Library of Congress Classification System of 1899-1920, and the National Union Catalog code (1932).1

This chapter does not include these classification systems, however, choosing instead a narrower definition of what constitutes a code. Ideally, coding standards should incorporate ten characteristics, enumerated in 1975 by the National Bureau of Standards and Accredited Standards Committee X3: uniqueness, expandability, conciseness, uniform size and format, simplicity, versatility, sortability, stability, meaningfulness, and operability.2

Full descriptions are given here only for the codes most likely to be encountered by archivists during the description process. In most cases, these are codes specified or allowed by automated bibliographic systems that incorporate the description of archival materials and, more specifically, by USMARC-based systems. Depending on the types of data being encoded, these standards may or may not be the same coding schemes as those used in other, nonbibliographic information systems.

In some cases, librarians have adopted widely recognized national and international standards. The variable control fields 005 in each of the USMARC formats, for instance, call for encoding dates and times according to the specifications included in ANSI X3.30 and X3.43, respectively, which are also used in many other applications.

In other cases, however, the U.S. library community has found existing standards inadequate for bibliographic use. Geographic codes seem especially numerous (some 70 separate codes are known to exist) and problematical.3 Tannehill and Husbands indicated in 1982 that the maintenance agency for ISO 3166 was actively promoting the use of ISO country codes by bibliographic systems.4 But in 1991 Gredley and Hopkinson reported that although many MARC-based systems in other countries employed the ISO 3166 codes, the standard had not found the same degree of acceptance in the U.S.5 While Z39.27-1984, the U.S. national standard for country codes, was withdrawn by NISO members in 1990 in favor of the international standard, the widely used USMARC Code List for Countries, developed and maintained by the Library of Congress, is not in agreement with ISO 3166.

Another international standard finding little use among library systems, even internationally, is ISO 639:1988, which contains language codes. The ISO codes cover only the more common languages and therefore are not comprehensive enough for bibliographic use. Most MARC systems worldwide use instead the list of language codes developed by the Library of Congress and published in USMARC Code List for Languages, also adopted as a U.S. national standard by NISO in Z39.53-1987.6

A somewhat different manifestation of coding can be seen in the character-level codes specified in ASCII (the American National Standard Code for Information Interchange), arguably the most basic of data exchange standards. ASCII is a classic standard in the sense that its use is virtually invisible to the user. While most computer systems rely on ASCII to store and communicate data, the user only becomes aware of it when the standard is missing or misapplied: when the screen image is a scatter of indecipherable symbols or the data sent into or out of a computer is hopelessly garbled.

Basic ASCII code is found in ANSI X3.4-1986 and is used broadly regardless of the application. The international equivalent of ASCII is found in ISO 646:1983. Variations do occur in the use of the numerous "character set extensions" that push the code beyond its original 128 characters. The methodology for creating extended character sets is defined in ANSI X3.41 and its international counterpart, ISO 2022.7

The U.S. library community developed its own extension set to accommodate bibliographic needs, known as "ALA Extended ASCII," in the late 1960s. This code evolved into today's ANSEL (Extended Latin Alphabet Coded Character Set for Bibliographic Use), adopted in large part as ANSI Z39.47-1985. Crawford notes, however, that significant variations in practice exist among the major bibliographic systems in implementing ANSEL specifications.8 Z39.47 is now being revised.

A number of extended character sets also exist for representing nonroman characters in ASCII. Z39.64-1989, East Asian Character Code for Bibliographic Use, was developed by the Research Libraries Group, Inc., to establish a structure for Chinese, Japanese, and Korean scripts. ISO standards include those for Cyrillic (ISO 5427), Greek (ISO 5428), and African (ISO 6438) characters. Crawford rightly notes that these standards may only be an interim solution.

Two efforts are now underway to define a comprehensive set of codes that would cover "the principal writing systems of the world."9 The Research Libraries Group (RLG) and a consortium of computer and software manufacturers are developing "Unicode," which uses 16 bits (instead of extended ASCII's eight bits) to provide a single set of unique codes for characters in all known languages.10 Also, the Joint Technical Committee (JTC1) of ISO and the International Electrotechnical Commission (IEC) has developed a Draft International Standard, ISO/IEC DIS 10646, Universal Character Set (UCS), which was unsuccessfully balloted for the first time in 1991. The two groups are now attempting to substantially merge the two efforts, incorporating the Unicode character repertoire into ISO/IEC DIS 10646 with a number of additions.11

Library catalogers with computer systems that cannot process these extended character sets must rely on romanization and transliteration for representing nonroman characters. Standards for romanization are described in Chapter 8 of this handbook.

A large number of other codes exist and are actively used in a wide variety of nonbibliographic applications. A broader awareness of codes used in other types of information systems is certainly necessary for archivists who accession electronic records. In fact, archivists may find themselves consulting these standards with increasing frequency as they attempt to clean up or even reconstruct documentation for automated records, a process that is essentially descriptive in nature. Those archivists in a position to influence the design of new electronic records systems may want to encourage the use of standard codes where appropriate. If each organization or agency creating archivally significant data used standard codes, long-term research potential would be enhanced significantly because such codes provide a ready means to link data about common topics from disparate sources.

The "Also of Interest" section at the end of the chapter is meant to introduce archivists to these other standards; notes accompanying the entries attempt to clarify how these codes relate to each other and to the codes used by bibliographic information systems.

Further reading

Aliprand, Joan M. "Nonroman Scripts in the Bibliographic Environment." Information Technology and Libraries 11 (June 1992): 105-119.

Barry, Randall K. "The Standards Dilemma of Character Sets." Information Standards Quarterly 3 (April 1991): 8-16.

Clews, J. Special Characters in Libraries: the Development of Character Set Standards. British Library Research and Development Report 5962. Boston Spa: British Library Document Supply Centre, 1988.

"The Importance of Recording Standard Numbers in Local Library Data Bases." Library Systems Newsletter 9:12 (December 1989): 91-93.

Kansky, Karel J. "Geographical Codes." In Encyclopedia of Library and Information Science, vol. 9, 222-236. New York: Marcel Dekker, 1973.

Mooers, Calvin N. "Codes and Coding." In Encyclopedia of Library and Information Science, vol. 5, 251-260. New York: Marcel Dekker, 1971.

National Bureau of Standards. Catalog of Widely Used Code Sets. Federal Information Processing Standards Publication 19-1 (FIPS Pub 19-1). Gaithersburg, MD: National Bureau of Standards, 1985.

National Bureau of Standards. Guide for the Development, Implementation, and Maintenance of Standards for the Representation of Computer Processed Data Elements. Federal Information Processing Standards Publication 45 (FIPS Pub 45). Gaithersburg, MD: NBS, 1976.

Tannehill, Robert S., and Charles W. Husbands. "Standards and Bibliographic Data Representation." Library Trends 31 (Fall 1982): 283-313.

Thibodeau, Sharon. "External Technical Standards for Data Contents and Data Values: Prospects for Adoption by the Archival Community." American Archivist 53 (Winter 1990): 94-98.


ANSI X3.30-1985 (R1991)
Representation for Calendar Date and Ordinal Date
for Information Interchange

1985, reaffirmed 1991.
Paper (2 p.).
Available from ANSI. $8.00.


Development, approval, and maintenance:

Developed by the American National Standards Institute, ASC X3: Accredited Standards Committee for Information Processing Systems (revision of X3.30-1971). Approved by ANSI on 30 July 1985 and reaffirmed in 1991. Maintenance is assigned to ASC X3 whose secretariat is the Computer and Business Equipment Manufacturers Association (CBEMA).

Scope and structure:

This brief standard contains explicit instructions for representing calendar dates (year, month, day) and ordinal dates (year and day of year, sometimes also called "Julian date"). The calendar date is usually expressed yyyymmdd by using four digits for the year, followed by two for the month and two for the day of the month. The year may be shortened to two digits if the century can be inferred or one digit if the decade is also known. Ordinal dates take the form yyyyddd, yyddd, or yddd in which the day of the year is expressed using three digits (001 to 365, or to 366 in a leap year).

Examples: 4 July 1991 would be expressed as one of the following calendar date codes: 19910704, 910704, or 10704. As an ordinal date, it would be written 1991185, 91185, or 1185.
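The six forms in the example above can be reproduced with a short Python sketch. The function names and the `year_digits` parameter are illustrative assumptions; the code follows only the layout described in this entry and has not been checked against the full text of the standard.

```python
from datetime import date


def _year_part(d: date, year_digits: int) -> str:
    """Return the last 1, 2, or 4 digits of the year."""
    if year_digits not in (1, 2, 4):
        raise ValueError("year_digits must be 1, 2, or 4")
    return f"{d.year:04d}"[-year_digits:]


def calendar_date(d: date, year_digits: int = 4) -> str:
    """Year, then two-digit month, then two-digit day of the month."""
    return f"{_year_part(d, year_digits)}{d.month:02d}{d.day:02d}"


def ordinal_date(d: date, year_digits: int = 4) -> str:
    """Year, then three-digit day of the year (001 to 366)."""
    day_of_year = d.timetuple().tm_yday
    return f"{_year_part(d, year_digits)}{day_of_year:03d}"


july_4 = date(1991, 7, 4)
print([calendar_date(july_4, n) for n in (4, 2, 1)])
# ['19910704', '910704', '10704']
print([ordinal_date(july_4, n) for n in (4, 2, 1)])
# ['1991185', '91185', '1185']
```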

Related standards:

X3.30 and X3.43 are designed to be used together to standardize representations of dates and times. Adopted as a U.S. government standard in FIPS Pub 4-1, Representation for Calendar Date and Ordinal Date for Information Interchange (1988). The international equivalent is contained in ISO 8601:1988.

Archival applications:

Specified for use in field 005 of the USMARC formats to indicate date of last transaction (in the yyyymmdd form). Many other standards and applications that need to accommodate storage of date information use X3.30 coding formats, so that repositories accessioning electronic records will also encounter widespread use of this standard.

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians, 1st ed. White Plains, NY: Knowledge Industry Publications, 1986, 186-187 (discussing the 1971 version of X3.30).


ANSI X3.43-1986 (R1992)
Representation of Local Time of the Day
for Information Interchange

1986, reaffirmed 1992.
Paper (8 p.).
Available from ANSI. $10.00.


Development, approval, and maintenance:

Developed by the American National Standards Institute, ASC X3: Accredited Standards Committee for Information Processing Systems, Technical Committee X3L8 on Representation of Data Elements (revision of X3.43-1977). Approved by ANSI on 23 June 1986 and reaffirmed in 1992. Maintenance is assigned to ASC X3 whose secretariat is the Computer and Business Equipment Manufacturers Association (CBEMA).

Scope and structure:

This brief standard provides explicit instructions for representing the time of day in digital form using either the 12- or 24-hour clock. While it recommends using the 24-hour system, especially in international applications, it can accommodate 12-hour designations by adding an A or P for am or pm. The standard can be used for any level of specificity down to hours, minutes, seconds, and decimal fractions of a second. While not required, colons can be added between each of these elements to improve human readability.

Example: The time 2:16 p.m. and 35 seconds would be encoded as 141635 (24-hour system) or 021635P (12-hour system). Midnight is represented as 000000 in accordance with the ISO standard.
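A minimal Python sketch of the two clock forms follows. The function names and the `colons` flag are hypothetical. The treatment of the 12 o'clock hours in the 12-hour form (shown as 12 rather than 00) is an assumption based on ordinary clock reading, not a detail taken from the standard.

```python
from datetime import time


def time_24h(t: time, colons: bool = False) -> str:
    """Encode hours, minutes, and seconds on the 24-hour clock."""
    sep = ":" if colons else ""
    return f"{t.hour:02d}{sep}{t.minute:02d}{sep}{t.second:02d}"


def time_12h(t: time, colons: bool = False) -> str:
    """Encode on the 12-hour clock with a trailing A or P."""
    sep = ":" if colons else ""
    suffix = "A" if t.hour < 12 else "P"
    # Assumption: hours 0 and 12 are both written as 12 in this form.
    hour = t.hour % 12 or 12
    return f"{hour:02d}{sep}{t.minute:02d}{sep}{t.second:02d}{suffix}"


afternoon = time(14, 16, 35)
print(time_24h(afternoon))               # 141635
print(time_12h(afternoon))               # 021635P
print(time_24h(afternoon, colons=True))  # 14:16:35
print(time_24h(time(0, 0, 0)))           # 000000
```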

Related standards:

X3.30 and X3.43 are designed to be used together to standardize representations of dates and times. Adopted as a U.S. government standard in FIPS Pub 58-1, Representations of Local Time of the Day for Information Interchange (1988). The international equivalent is contained in ISO 8601:1988.

Archival applications:

Specified for use in field 005 of the USMARC formats to indicate time of last transaction (in hhmmss.f form). Many other standards and applications that need to accommodate storage of time information use X3.43 coding formats, so that repositories accessioning electronic records will also encounter widespread use of this standard in those files.
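To illustrate how the X3.30 and X3.43 forms work together in field 005, the sketch below builds a date-and-time stamp from a Python `datetime`. It assumes that the yyyymmdd date is followed directly by the hhmmss.f time with one decimal place; the function name `field_005` is hypothetical.

```python
from datetime import datetime


def field_005(ts: datetime) -> str:
    """Build a yyyymmddhhmmss.f stamp (date first, then time).

    Assumption: the two elements are simply concatenated and the
    fraction is truncated to tenths of a second.
    """
    tenths = ts.microsecond // 100_000
    return ts.strftime("%Y%m%d%H%M%S") + f".{tenths}"


last_transaction = datetime(1991, 7, 4, 14, 16, 35)
print(field_005(last_transaction))  # 19910704141635.0
```

A fixed layout of this kind sorts chronologically as plain text, which is one reason date and time codes of this type are common in electronic records.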

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians, 1st ed. White Plains, NY: Knowledge Industry Publications, 1986, 194-195.


ISO 8601:1988
Data elements and interchange formats--Information
interchange--Representation of dates and times

1988, amended 1991.
Paper (17 p.).
See "Availability" below.


Development, approval, and maintenance:

Developed by the International Organization for Standardization, Technical Committee 154: Documents and data elements in administration, commerce and industry. Approved by ISO and maintained by ISO TC 154. Technical Corrigendum 1 was approved in 1991.

Scope and structure:

This extensive standard combines the formats for encoding dates and times that were previously contained in five separate international standards: ISO 2014, 2015, 2711, 3307, and 4031. Unlike the equivalent U.S. national standard, it does not include a provision for Julian dates. It does, however, provide for the numbering of weeks of the year (which always begin on Monday) for which no U.S. equivalent exists.
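Week numbering of this kind is available in Python's standard library through `date.isocalendar()`, which counts weeks beginning on Monday. The sketch below is illustrative only: the function name is hypothetical, and the compact year, "W", week, and weekday layout of the output string is assumed here for display rather than quoted from the standard.

```python
from datetime import date


def iso_week_date(d: date) -> str:
    """Return year, week number, and weekday (Monday = 1) for a date."""
    iso_year, iso_week, iso_weekday = d.isocalendar()
    # Assumed display layout: yyyyWwwd
    return f"{iso_year:04d}W{iso_week:02d}{iso_weekday}"


print(iso_week_date(date(1991, 7, 4)))  # 1991W274 (a Thursday in week 27)
```

Note that the week-numbering year can differ from the calendar year for dates falling in the first or last days of a year.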

Related standards:

Corresponding U.S. national standards are ANSI X3.30-1985 (dates) and ANSI X3.43-1986 (times).

Archival applications:

Archivists in the United States will more often use the ANSI standards because they are specified for use in the USMARC Formats.

Publication format and availability:

Available from ANSI. $37.00. Also published in ISO Standards Handbook 1: Documentation and Information, 3rd ed. (Geneva, Switzerland: International Organization for Standardization, 1988), 845-859.


ANSI/NISO/ISO 3166:1988
Codes for the representation of names of countries

1988.
Paper (53 p.).
ISBN 0-88738-937-6.
Available from NISO. $50.00.


Development, approval, and maintenance:

Developed by the International Organization for Standardization. Maintained by Deutsches Institut für Normung (DIN) in Berlin. Code additions or alterations reflecting political changes are made only after the United Nations formally notifies the maintenance agency. ISO 3166 has been adopted in more than 30 national or regional standards. It was approved by the National Information Standards Organization and the American National Standards Institute as a U.S. national standard in 1991.

Scope and structure:

The 1988 version of this standard contains three types of codes: both 2- and 3-character alphabetic codes (the latter may be familiar because it is the widely used "Road Vehicle Code" seen on automobile stickers to designate country of origin) and a 3-digit numeric code. ISO 3166 "provides code elements for all separate territories on the earth--not only more than 170 independent states, but another 50 entities geographically separated from their mother countries."12

Related standards:

ISO 3166 provides the basis for a number of other international codes, including those for currency (ISO 4217:1990) and the UN Location Code. It is also specified in a number of business and library applications worldwide, including UNIMARC, electronic message handling, and customs documents.

The NISO standard, ANSI Z39.27-1984, was withdrawn in 1990 in favor of using ISO 3166 country codes.

References:

Tannehill, Robert S., and Charles W. Husbands. "Standards and Bibliographic Data Representation." Library Trends 31 (Fall 1982): 294-296.


USMARC Code List for Countries

March 1988.
Paper (viii + 44 p.).
ISBN 0-8444-0606-6. LC 88-8845.
Available from LC CDS. $15.00.


Development, approval, and maintenance:

Originally compiled in collaboration with the LC MARC Pilot Project participants, the National Library of Medicine, and the National Agricultural Library. Now issued by the Network Development and MARC Standards Office. The Library of Congress is the maintenance agency for this code list.

Scope and structure:

"This document contains a list of places and their associated two- or three-character lowercase alphabetic codes. This list includes individual codes for presently existing national entities, states of the United States, provinces and territories of Canada, divisions of the United Kingdom, republics of the Soviet Union (a revised list for the separate republics was issued in February 1992), and internationally recognized dependencies. The purpose of this list is to allow designation of the places associated with an item by codes in the USMARC record for that item. The list contains 337 discrete codes, of which 23 are discontinued codes no longer valid for use."

Related standards:

Tannehill and Husbands note that the "country of publication" codes in MARC are the precursors of the 2-character country codes specified in ISO 3166 (which also contains provisions for 3-character codes and 3-digit numeric codes). The USMARC list goes further, however, containing representations for the states of the U.S., the provinces of Canada, and the republics of the former Soviet Union.

FIPS Pub 10-2, Countries, Dependencies, and Areas of Special Sovereignty, was consulted during the development of this list.

References:

Tannehill, Robert S., and Charles W. Husbands. "Standards and Bibliographic Data Representation." Library Trends 31:2 (Fall 1982): 295.


USMARC Code List for Geographic Areas

March 1988.
Paper (ix + 53 p.).
ISBN 0-8444-0607-4. LC 88-600130.
Available from LC CDS. $15.00.


Development, approval, and maintenance:

The list was compiled through a collaboration of three units in the Library of Congress: Research Services, the Automated Systems Office, and the Subject Cataloging Division. It is issued by the Network Development and MARC Standards Office. The Library of Congress is the maintenance agency for this list.

Scope and structure:

"This document contains a list of places and their associated one- to seven-character codes. The list includes separate codes for countries, first order political divisions of some countries, regions, and geographic features. The purpose of this list is to allow places reflected in the subject headings assigned to an item to be designated by codes in the USMARC record for that item. The list contains 526 discrete codes, of which 29 are discontinued codes no longer valid for use."

Archival applications:

Available for use in field 043 of the AMC, VM, and other USMARC bibliographic formats to indicate geographic areas relevant to the subject content of materials being cataloged (except for maps).


ANSI Z39.53-1987
Codes for the Representation of Languages
for Information Interchange

1987.
Paper (16 p.).
ISBN 0-88738-955-4. ISSN 8756-0860.
Available from NISO. $30.00.


Development, approval, and maintenance:

This standard was prepared by the National Information Standards Organization (Z39), Standards Committee C, "Language Codes." The committee based its work on the list of MARC language codes developed by LC in cooperation with the National Agricultural Library and the National Library of Medicine, currently published as USMARC Code List for Languages. Processed by NISO and approved by ANSI on 29 June 1987. The Library of Congress is designated as the maintenance agency for this standard.

Scope and structure:

"Language codes are defined to enable libraries, information services, and publishers to indicate language in the exchange of information. A list of three-character language codes is provided."

"Each language code is accompanied by a descriptor. Descriptors generally are based on the form of language name found in Library of Congress Subject Headings. . . . The list of language codes is presented in two versions: (1) In alphabetical order by language code [and] (2) In alphabetical order by descriptor."

Related standards:

As noted above, this standard was developed in conjunction with the USMARC Code List for Languages; Part II of that larger publication, in fact, contains lists identical to those in this volume. The codes were based on the form of headings found in LCSH. The international counterpart is ISO 639:1988, but it lists only the most common languages. As a result, many bibliographic systems use the USMARC Code List for Languages instead.

Archival applications:

See notes under USMARC Code List for Languages.

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians. 2nd ed. Boston: G.K. Hall, 1991, 266-67.


USMARC Code List for Languages

March 1989.
Paper (154 p.).
ISBN 0-8444-0656-2. LC 87-600198.
Available from LC CDS. $20.00.


Development, approval, and maintenance:

Originally compiled in collaboration with the LC MARC Pilot Project participants, the National Library of Medicine, the National Agricultural Library, and the Defense Language Institute. The list was revised by a committee of NISO. It is issued by the Network Development and MARC Standards Office, Library of Congress. The Library of Congress is the maintenance agency for this list and for ANSI Z39.53.

Scope and structure:

"This document contains a list of languages and their associated three-character alphabetic codes. The purpose of this list is to allow the designation of the language or languages in USMARC records. The list contains 372 discrete codes, of which 104 are used as group codes."

Part I, "Name Sequence," is an extensive list, in alphabetical order, of languages and language group names. For each language or language group, the appropriate code is identified. Entries for individual languages show preferred and variant names and the name of the language group to which it is assigned. Part II lists the codes themselves in alphabetical order.

Related standards:

The code list in ANSI Z39.53, Codes for the Representation of Languages for Information Interchange, is identical to the list of codes and associated language or language group names contained in Part II of this publication. Compilers of the original version of this code list also consulted the language list of the Center for Applied Linguistics and the Library of Congress Subject Headings.

Archival applications:

Specified for use in USMARC formats field 008/35-37 for indicating language of item being cataloged. Also available for field 041 when field 008/35-37 is "insufficient to convey full information for a multilingual item or a translation." Field 242 subfield y requires a code for the language of a translated title.
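The notation 008/35-37 refers to character positions 35 through 37 of field 008, counting from zero. A hedged Python sketch of reading the language code from those positions follows. The sample field is invented for illustration, and the 40-character length check is an assumption about the field's fixed size rather than something stated in this entry.

```python
def language_code_from_008(field_008: str) -> str:
    """Return the three-character language code at positions 35-37."""
    # Assumption: field 008 is a fixed-length field of 40 characters.
    if len(field_008) != 40:
        raise ValueError("field 008 is expected to hold 40 characters")
    return field_008[35:38]  # positions 35, 36, and 37 (zero-based)


# Hypothetical field 008, assembled piece by piece so the positions
# are easy to count; only positions 35-37 matter for this example.
sample_008 = (
    "920101"    # 00-05
    + "s"       # 06
    + "1991"    # 07-10
    + "    "    # 11-14
    + "dcu"     # 15-17
    + " " * 17  # 18-34
    + "eng"     # 35-37: language code
    + " "       # 38
    + "d"       # 39
)

print(language_code_from_008(sample_008))  # eng
```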

References:

Gredley, Ellen, and Alan Hopkinson. Exchanging Bibliographic Data: MARC and Other International Formats. Chicago: American Library Association, 1990, 57-58.


ISO 639:1988
Code for the representation of names of languages

1988.
Paper (17 p.).
See "Availability" below.


Development, approval, and maintenance:

International Organization for Standardization, Technical Committee 37: Terminology (principles and co-ordination). Approved by ISO. Registration authority is assigned to the International Information Centre for Terminology (INFOTERM), c/o Österreichisches Normungsinstitut, Heinestrasse 38, Postfach 130, A-1021 Wien 2, Austria.

Scope and structure:

This standard "provides a code for the presentation of names of languages. The symbols were devised primarily for use in terminology, lexicography and linguistics, but they may be used for any application requiring the expression of languages in coded form. It also includes guidance on the use of language symbols in some of these applications." Because this standard only covers the most common languages, many bibliographic systems use the USMARC Code List for Languages instead.

Related standards:

The U.S. national standard language codes are contained in ANSI Z39.53 and the more extensive USMARC Code List for Languages.

Archival applications:

Archivists in the U.S. will most often use the U.S. national standard rather than this international one for languages.

Publication format and availability:

Available from ANSI. $43.00. Also published in ISO Standards Handbook 1: Documentation and information, 3rd ed. (Geneva, Switzerland: International Organization for Standardization, 1988), 215-232.

References:

Gredley, Ellen, and Alan Hopkinson. Exchanging Bibliographic Data: MARC and Other International Formats. Chicago: American Library Association, 1990, 57-58.


ANSI X3.4-1986 (R1992)
American National Standard Code for
Information Interchange (ASCII)

1986, reaffirmed 1992.
Paper (27 p.).
Available from ANSI. $17.00.


Development, approval, and maintenance:

Developed by the American National Standards Institute, ASC X3: Accredited Standards Committee for Information Processing Systems, Subcommittee X3L2 on Character Sets and Codes. Approved by ANSI on 26 March 1986 and reaffirmed in 1992. Maintenance is assigned to ASC X3 whose secretariat is the Computer and Business Equipment Manufacturers Association (CBEMA).

Scope and structure:

ASCII is arguably the most basic of data exchange standards. It specifies the codes for a set of 128 characters that can be used in virtually every computer application: control characters and graphic characters, such as letters, digits, and symbols. User demand for additional characters has resulted in the development of a number of extension sets. The most commonly used in U.S. bibliographic systems is the set first developed by LC and ALA for the MARC distribution service in 1969, now known as ANSEL (Z39.47). These codes were adopted in large part by Z39.47, but local and system-specific variations occur in actual use of the codes.
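A quick way to see the 128-character limit is to test whether every character in a string has a code value below 128. The Python sketch below is an illustration only; it relies on the fact that Python's `ord()` returns values that coincide with ASCII for the basic character set, and the function names are hypothetical.

```python
def code_values(text: str) -> list[int]:
    """Return the numeric code of each character in the text."""
    return [ord(ch) for ch in text]


def is_basic_ascii(text: str) -> bool:
    """True when every character falls within the 128 basic codes."""
    return all(ord(ch) < 128 for ch in text)


print(code_values("DC"))                   # [68, 67]
print(is_basic_ascii("Washington, DC"))    # True
print(is_basic_ascii("Österreichisches"))  # False: needs an extension set
```

Text that fails this test is the kind of data for which extension sets such as ANSEL were developed.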

USMARC Specifications for Record Structure, Character Sets, Tapes (1990 ed.) provides detailed information on the use of ASCII and ANSEL for USMARC processing. It also includes tables for the characters unique to "ALA Extended ASCII" used by USMARC but not incorporated in these standards.

Related standards:

Revision of ANSI X3.4-1977; the original version of X3.4 was approved in 1968.

This standard was developed in parallel with its international counterpart ISO 646:1983, Information Processing--ISO 7-bit coded character set for information interchange. Differences in the U.S. (ANSI) version resulted from efforts to "adopt more customary U.S. terminology and to reduce ambiguity." The current international equivalent is ISO/IEC 646:1991.

FIPS Pub 1-2, Code for Information Interchange, Its Representations, Subsets, and Extensions, adopts and consolidates ANSI X3.4, X3.32, and X3.41.

Specifications and naming conventions for developing extensions of the basic X3.4 character set are contained in X3.41-1990, Code Extension Techniques for Use with the 7-Bit Coded Character Set of ASCII. Two extensions developed for use in U.S. bibliographic systems are ANSEL (Z39.47) and the East Asian Character Code for Bibliographic Use (Z39.64).

Many IBM-based computers have used EBCDIC (Extended Binary Coded Decimal Interchange Code) for internal operations, but standards for data exchange among different computer systems have focused on the use of ASCII.

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians, 1st ed. White Plains, NY: Knowledge Industry Publications, 1986, 174-176 (on 1977 version).

Tannehill, Robert S., and Charles W. Husbands. "Standards and Bibliographic Data Representation." Library Trends 31 (Fall 1982): 296-300.


ANSI Z39.47-1985
Extended Latin Alphabet Coded Character Set
for Bibliographic Use
(ANSEL)

1985.
Paper (24 p.). ISBN 0-88738-959-7.
Available from NISO. $16.00.


Development, approval, and maintenance:

Developed by Subcommittee N on Coded Character Sets for Bibliographic Information Interchange, American National Standards Committee on Library and Information Sciences and Related Publishing Practices, Z39, now the National Information Standards Organization (NISO). It was derived from "ALA Extended ASCII," which was first developed in the late 1960s. Approved by ANSI on 4 September 1984. A revised version was balloted in 1992. NISO maintains the standard.

Scope and structure:

"The standard establishes both the 7-bit and the 8-bit code values for the computer codes for characters used in bibliographic work when handling non-English items. The characters included in the codes have been selected because they are the ones needed to fully record bibliographic citations in many Latin alphabet languages and non-Latin languages transliterated into Latin alphabet characters.... The standard consists of code tables and a legend giving the name and an example for each of the extended Latin graphic characters."

The USMARC Specifications for Record Structure, Character Sets, Tapes provides detailed information on the use of ASCII and ANSEL for USMARC processing. It also includes tables for the characters unique to "ALA Extended ASCII" used by USMARC but not incorporated in these standards.

Related standards:

This standard extends the basic ASCII character set contained in X3.4 and was developed using specifications for creating ASCII extension sets expressed in ANSI X3.41. Although it adopts many of the same codes as ALA Extended ASCII, Crawford notes that implementation problems have hindered its acceptance.

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians. 2nd ed. Boston: G.K. Hall, 1991, 255-257.

Tannehill, Robert S., and Charles W. Husbands. "Standards and Bibliographic Data Representation." Library Trends 31 (Fall 1982): 297-300.


ANSI/NISO Z39.64-1989
East Asian Character Code for Bibliographic Use
(EACC)

1989.
Paper (10 p.) plus 11 microfiche.
ISBN 0-88738-947-3.
Available from NISO. $40.00.


Development, approval, and maintenance:

Developed by the Research Libraries Group, Inc., and adopted by the National Information Standards Organization. Approved by ANSI on 16 January 1989. Maintenance responsibility is assigned to the Library of Congress. Questions about implementation and other requests are referred to the Descriptive Cataloging Division.

Scope and structure:

"Establishes a computer coding structure for characters in Chinese, Japanese, and Korean scripts. This standard establishes 3-byte code values in an 8-bit environment for a base set of characters, and provisions for expansion." It was based on the RLIN East Asian Character Code (REACC) first developed by RLG in the 1980s. Crawford considers this a significant improvement over previous systems that merely transliterated Chinese, Japanese, and Korean because the resulting entries were not always uniquely identifiable.

Related standards:

Extends the basic ASCII character set (in X3.4). RLG has begun an ambitious project in conjunction with major computer and software manufacturers to develop "Unicode" which would use 16 bits to encode the characters from all known languages.
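
The difference between fixed three-byte codes and a 16-bit encoding can be illustrated with a minimal sketch. The function names and the placeholder byte values below are hypothetical and are not taken from the EACC tables; the sketch assumes only that each EACC character occupies exactly three bytes, as the standard's scope statement says, and uses Python's built-in UTF-16 codec to stand in for a 16-bit encoding.

```python
def split_three_byte_codes(data: bytes) -> list:
    """Split a byte stream into fixed-width 3-byte character codes."""
    if len(data) % 3 != 0:
        raise ValueError("stream length is not a multiple of three bytes")
    return [data[i:i + 3] for i in range(0, len(data), 3)]


def utf16_units(text: str) -> list:
    """Return the 16-bit (2-byte) units of a string in big-endian UTF-16."""
    encoded = text.encode("utf-16-be")
    return [encoded[i:i + 2] for i in range(0, len(encoded), 2)]


# Placeholder bytes only: NOT real EACC assignments.
PLACEHOLDER_STREAM = bytes([0x21, 0x30, 0x21, 0x21, 0x30, 0x22])

three_byte_codes = split_three_byte_codes(PLACEHOLDER_STREAM)

# U+4E2D is a common CJK ideograph; it fits in a single 16-bit unit.
sixteen_bit_units = utf16_units("\u4e2d")
```

A fixed-width layout of either kind lets a program find the nth character by arithmetic alone, which is the same property that makes fixed-length fields attractive for codes generally.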

Archival applications:

Because RLIN is used by an ever-widening number of archival repositories, this standard is likely to be important when Asian-language materials are being processed.

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians. 2nd ed. Boston: G.K. Hall, 1991, 280-281.


USMARC Character Set for
Chinese, Japanese, and Korean

November 1986.
Looseleaf.
ISBN 0-8444-0548-5.
Available from LC CDS. $35.00.


Development, approval, and maintenance:

The character set was developed by the Research Libraries Group, Inc., for use in its Research Libraries Information Network (RLIN). Maintenance and updating for the LC publication is handled by the LC Network Development and MARC Standards Office.

Scope and structure:

"This publication contains the RLIN East Asian Character Code (REACC), a set of three-byte codes used to represent and store in machine-readable form all the Chinese, Japanese, and Korean characters used with the USMARC formats."

References:

"Chinese, Japanese, Korean Character Set." Cataloging Service Bulletin 37 (Summer 1987): 59.


Symbols of American Libraries
(NUC Codes)

14th ed., 1992.
Cloth.
Available from LC CDS. $28.00.


Development, approval, and maintenance:

In use since 1932, the National Union Catalog (NUC) codes are developed and maintained by the Library of Congress.

Scope and structure:

As Crawford noted in 1986, this "'nonstandard standard' is more widely used in the library field than any standard number except ISBN." As of June 1991, 23,820 distinct institutional codes had been assigned. They are used to identify libraries located in the U.S. and Canada in union catalogs, databases, and other composite sources. Complete names and addresses are also provided for each institution to be used for mailing purposes.

The NUC codes use both upper- and lower-case letters, which caused some problems, especially in the early days of computing, and the number of characters has varied from two to at least nine. New guidelines limit a valid institutional code to eight characters. The promotional literature for the new volume states that it is "structured for the ILL (interlibrary loan) and LSP (Linked Systems Project) environments."
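
A system that stores NUC codes has to preserve case and enforce the length limit. The following is a minimal sketch, not the Library of Congress's own validation rule: it assumes that a code consists only of letters and hyphens, begins with a letter, and runs from two to eight characters. The pattern and function names are hypothetical.

```python
import re

# Assumed shape: a letter followed by 1-7 letters or hyphens (2-8 characters).
NUC_SHAPE = re.compile(r"[A-Za-z][A-Za-z-]{1,7}")


def looks_like_nuc_code(code: str) -> bool:
    """Return True if the string has the assumed shape of an NUC code."""
    return NUC_SHAPE.fullmatch(code) is not None


def same_code(a: str, b: str) -> bool:
    """Compare two codes without folding case, since case is significant."""
    return a == b
```

Comparing without case folding matters because a system that upper-cases its input could collapse two codes that differ only in capitalization.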

Archival applications:

Specified in USMARC field 040 to designate the organization creating the original bibliographic record. Some archival repositories have their own NUC code; many others are part of institutions that have one assigned to the larger organization as a whole. Any repository can write to LC to obtain a code.

References:

Crawford, Walt. Technical Standards: An Introduction for Librarians. 1st ed. White Plains, NY: Knowledge Industry Publications, 1986, 60.

Tannehill, Robert S., and Charles W. Husbands. "Standards and Bibliographic Data Representation." Library Trends 31 (Fall 1982): 295-296.


USMARC Code List for
Relators, Sources, Description Conventions

1990.
Paper (vi + 26 p.).
ISBN 0-8444-0708-9. LC 90-013279.
Available from LC CDS. $15.00.


Development, approval, and maintenance:

The lists are compiled and maintained by the Library of Congress, Network Development and MARC Standards Office.

Scope and structure:

"This document contains several lists of codes intended for use in USMARC bibliographic records and USMARC authority records. The lists contain a total of 205 discrete codes, two of which are obsolete. Some codes appear on more than one list."

The document contains six sections: Relator Codes contains codes used to establish a relationship between a name and a work (e.g., annotator, compiler); Subject Category Code Sources contains codes for thesauri and other subject lists that are acceptable for use in USMARC cataloging in fields 072 and 073; Classification Sources contains codes for sources of classification schemes (of limited archival use); Subject/Index Term Sources contains codes for sources of subject headings or index terms for use in bibliographic record fields 600-657 and authority record field 040; Foreign MARC Sources lists codes designating MARC systems in other countries in order to indicate source-format for non-USMARC entries (e.g., UKMARC, UNIMARC); and Description Convention Codes lists codes for works containing descriptive cataloging conventions that are not consistent with AACR rules or which apply AACR 2 to special forms of material (e.g., APPM, AMIM, GIHC).

The Relator Codes are described in more detail in Chapter 6.
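
In software, lists of this kind are typically held as lookup tables keyed by code. The sketch below is illustrative only: it includes just the handful of examples named above (the lower-case code forms are assumed to follow the usual USMARC convention), and the table and function names are hypothetical. The published code list remains the authority for the full set of values.

```python
# A few relator codes (code -> relationship of a name to a work).
RELATOR_CODES = {
    "ann": "annotator",
    "com": "compiler",
}

# A few description convention codes (code -> manual followed).
DESCRIPTION_CONVENTIONS = {
    "appm": "APPM",
    "amim": "AMIM",
    "gihc": "GIHC",
}


def expand_code(code: str, table: dict) -> str:
    """Look up a code, raising an error for anything not in the table."""
    try:
        return table[code]
    except KeyError:
        raise ValueError(f"unrecognized code: {code!r}") from None
```

Rejecting unknown values at input time, rather than storing them, is one way to obtain the reduced error rates that coding schemes are meant to provide.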

Archival applications:

Will be used during archival description to indicate the sources of subject codes and terms and to identify cataloging conventions followed.


ALSO OF INTEREST

Dates and times

FIPS Pub 4-1. Representation for Calendar Date and Ordinal Date for Information Interchange. 1988. National Institute of Standards and Technology. Paper. Available from NTIS. $12.50.

Adopts ANSI X3.30-1985 which contains actual specifications for representing dates (specifications are not included in FIPS Pub 4-1).
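
As an illustration of the two kinds of date named in the title, the sketch below formats a calendar date as an eight-digit string (year, month, day) and an ordinal date as a seven-digit string (year, day of year). These layouts are assumed here rather than quoted from ANSI X3.30-1985, which should be consulted for the actual specifications; the function names are hypothetical.

```python
from datetime import date


def calendar_date(d: date) -> str:
    """Format as YYYYMMDD (assumed calendar-date layout)."""
    return d.strftime("%Y%m%d")


def ordinal_date(d: date) -> str:
    """Format as YYYYDDD, where DDD is the day of the year (assumed layout)."""
    return d.strftime("%Y%j")


example = date(1994, 3, 1)
calendar_form = calendar_date(example)  # "19940301"
ordinal_form = ordinal_date(example)    # "1994060"
```

Both forms have a fixed length and sort chronologically when compared as plain strings.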

FIPS Pub 58-1. Representations of Local Time of the Day for Information Interchange. 1988. National Institute of Standards and Technology. Paper. Available from NTIS. $17.50.

Adopts ANSI X3.43-1986 which contains actual specifications for representing time (specifications are not included in FIPS Pub 58-1).
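
A fixed-length representation of local time can be sketched in the same spirit. The six-digit, 24-hour layout (hours, minutes, seconds) used below is an assumption made for illustration, not a quotation from ANSI X3.43-1986, and the function name is hypothetical.

```python
from datetime import time


def local_time_of_day(t: time) -> str:
    """Format as HHMMSS on a 24-hour clock (assumed layout)."""
    return t.strftime("%H%M%S")


afternoon = local_time_of_day(time(14, 5, 9))  # "140509"
```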

Geographical locations

Countries

ANSI Z39.27-1984 [withdrawn]. Structure for the Representation of Names of Countries, Dependencies, and Areas of Special Sovereignty for Information Interchange. Paper. Out of print.

NISO voted in 1990 to withdraw this standard. It is replaced by a domestic adoption of ISO 3166; see ANSI/NISO/ISO 3166:1988.

FIPS Pub 104-1. ANS Codes for the Representation of Names of Countries, Dependencies, and Areas of Special Sovereignty for Information Interchange. 1986. National Institute of Standards and Technology. Paper. Available from NTIS. $17.50.

Implements ANSI Z39.27-1984 (withdrawn by NISO in 1990), which adopted ISO 3166:1981 with qualifications; ISO 3166 has since been revised. The codes in this list were developed from those in the ISO list by the National Bureau of Standards, the designated maintenance agency for ANSI Z39.27. They are not the same as the codes used by USMARC for place of publication or production, nor as the U.S. Postal Service abbreviations for the outlying areas and trust territories of the United States.

States and their equivalents

ANSI X3.38-1988. Codes--Identification of the States, the District of Columbia, and the Outlying and Associated Areas of the United States for Information Interchange. Paper (2 p.). Available from ANSI. $12.00.

FIPS Pub 5-2. Codes for the Identification of the States, the District of Columbia and Outlying Areas of the United States, and Associated Areas. 1987. National Institute of Standards and Technology. Paper. Available from NTIS. $12.50.

Contains three code tables: (1) 2-character FIPS state codes that are identical to the U.S. Postal Service abbreviations; (2) FIPS codes for outlying areas of the U.S., freely associated states and trust territories that are identical to those in ISO 3166:1988, ANSI Z39.27, and FIPS Pub 104-1 (but not the abbreviations used by the Postal Service); and (3) 2-digit codes for outlying areas to be used as alternatives to county-equivalent codes in FIPS Pub 6-3.
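
The state-level codes lend themselves to a simple two-way lookup. The sketch below holds three entries only, pairing each two-letter code with its two-digit numeric FIPS state code; the table and function names are hypothetical, and the complete tables must be taken from the standard itself.

```python
# Two-letter code -> two-digit numeric FIPS state code (three entries only).
STATE_NUMERIC = {
    "CA": "06",
    "DC": "11",
    "NY": "36",
}

# Reverse table, built once so lookups work in both directions.
STATE_ALPHA = {numeric: alpha for alpha, numeric in STATE_NUMERIC.items()}


def to_numeric(alpha: str) -> str:
    """Return the numeric code for a two-letter state code."""
    return STATE_NUMERIC[alpha.upper()]


def to_alpha(numeric: str) -> str:
    """Return the two-letter code for a two-digit numeric state code."""
    return STATE_ALPHA[numeric]
```

The numeric codes are kept as strings so that leading zeros, as in "06", are not lost.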

Counties and their equivalents

ANSI X3.31-1988. Codes--Structure for the Identification of the Counties and County Equivalents of the United States and its Outlying and Associated Areas for Information Interchange. Paper (3 p.). Available from ANSI. $12.00.

Gives only the rules under which codes shall be established. Actual codes are contained in FIPS Pub 6-4.

FIPS Pub 6-4. Counties and Equivalent Entities of the United States, Its Possessions, and Associated Areas. 1990. National Institute of Standards and Technology. Paper. Available from NTIS. $17.50.

Implements and contains actual codes for ANSI X3.31-1988.
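
County codes are commonly combined with the state code to form a single identifier. The sketch below assumes a two-digit numeric state code followed by a three-digit county code, giving a five-digit string; that structure should be confirmed against ANSI X3.31-1988, and the function names are hypothetical.

```python
def combine(state_numeric: str, county: str) -> str:
    """Join a 2-digit state code and a 3-digit county code."""
    if not (len(state_numeric) == 2 and state_numeric.isdigit()):
        raise ValueError("state code must be two digits")
    if not (len(county) == 3 and county.isdigit()):
        raise ValueError("county code must be three digits")
    return state_numeric + county


def split(combined: str) -> tuple:
    """Split a 5-digit identifier into (state, county) parts."""
    if not (len(combined) == 5 and combined.isdigit()):
        raise ValueError("combined code must be five digits")
    return combined[:2], combined[2:]


# The District of Columbia is state 11, county-equivalent 001.
dc = combine("11", "001")
```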

Populated places and other entities

ANSI X3.47-1988. Codes--Structure and Data Requirements for the Identification of Named Populated Places, Primary County Divisions, and Other Locational Entities of the United States and its Outlying and Associated Areas for Information Interchange. Paper (4 p.). Available from ANSI. $12.00.

Gives only the rules under which codes shall be established. Actual codes are contained in FIPS Pub 55DC-4.

FIPS Pub 55DC-4. Guideline: Codes for Named Populated Places, Primary County Divisions, and Other Locational Entities of the United States and Outlying Areas. 1987. National Institute of Standards and Technology. Available from NTIS in both paper and electronic form. Contact NTIS for format and pricing information.

Contains actual codes for "locational entities" in the U.S. that implement the structure specified in ANSI X3.47-1988. In addition to the 5-digit FIPS location code, the tables list each entity's state numeric and alpha codes (from X3.38), county codes (from FIPS 6-4), and Zip Code (from U.S. Postal Service Publication 65), among other geographic identifiers.

FIPS Pub 8-5. Metropolitan Statistical Areas (MSAs) (Including CMSAs, PMSAs, and NECMAs). 1984. National Institute of Standards and Technology. Paper. Available from NTIS. $19.50.

Geographic point locations

ANSI X3.61-1986. Representation of Geographic Point Locations for Information Interchange. Paper (19 p.). Available from ANSI. $13.00.

Provides formats for representation of locations under three widely used systems: (1) latitude and longitude, (2) Universal Transverse Mercator, and (3) State Plane Coordinate System.

FIPS Pub 70-1. Representation of Geographic Point Locations for Information Interchange. 1986. National Institute of Standards and Technology. Paper. Available from NTIS. $22.50.

Adopts ANSI X3.61-1986 which contains actual specifications for representing geographic point locations (specifications are not included in FIPS Pub 70-1).

ISO 6709:1983. Standard representation of latitude, longitude, and altitude for geographic point locations. ISO/IEC Joint Technical Committee 1. Paper (3 p.). Available from ANSI. $22.00.
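
A latitude and longitude pair can be written as one fixed-format string. The sketch below uses signed decimal degrees, with two integer digits for latitude and three for longitude. This is one layout in the general style of ISO 6709, assumed for illustration; the exact forms permitted are defined in the standards themselves, and the function name is hypothetical.

```python
def format_point(latitude: float, longitude: float) -> str:
    """Format a point as signed decimal degrees, e.g. +DD.DDDD-DDD.DDDD."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("latitude must be between -90 and 90")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("longitude must be between -180 and 180")
    # The sign is always written; integer parts are zero-padded to fixed width.
    return f"{latitude:+08.4f}{longitude:+09.4f}"


# A point in Washington, DC (north latitude, west longitude).
washington = format_point(38.8977, -77.0365)
```

Always writing the sign and padding the integer part keeps the string at a constant length, so the two coordinates can be separated by position alone.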

Business and individual identifiers

FIPS Pub 66. Standard Industrial Classification (SIC) Codes. 1979. National Institute of Standards and Technology. Paper. Available from NTIS. Contact NTIS for format and pricing information.

Contains only short titles; full titles are contained in the OMB SIC Manual (see below).

FIPS Pub 92. Standard Occupational Classification (SOC) Codes. 1983. National Institute of Standards and Technology. Paper. Available from NTIS. $17.50.

FIPS Pub 95. Codes for the Identification of Federal and Federally Assisted Organizations. 1982. National Institute of Standards and Technology. Paper. Available from NTIS. $20.50.

ISO 4217:1990. Codes for the representation of currencies and funds. 4th ed. ISO Technical Committee 68. Paper (27 p.). Available from ANSI. $58.00.

Standard Industrial Classification Manual (SIC Codes). 1987. U.S. Office of Management and Budget, Executive Office of the President. ISBN 0-16-004329-8. PB87-100012. S/N 041-001-00314-2. Cloth (705 p.). Available from the U.S. Government Printing Office. $24.00.

Standard Occupational Classification Manual. 1980. U.S. Department of Commerce, Office of Federal Statistical Policy and Standards. ISBN 0-16-018462-2. S/N 041-001-00351-7. Cloth (547 p.). Available from the U.S. Government Printing Office. $30.00.

ASCII and character set extensions

FIPS Pub 1-2. Code for Information Interchange, Its Representations, Subsets, and Extensions. 1984. National Institute of Standards and Technology. Paper. Available from NTIS. $12.50.

Adopts in whole X3.4-1986, 1977, X3.32-1990, and X3.41-1990, which together comprise the national standards for ASCII and its extended character sets. The codes themselves must be obtained from the X3 standards; they are not included in FIPS Pub 1-2.

ANSI X3.26-1980. Hollerith Punched Card Code. Paper (12 p.). Available from ANSI. $13.00.

Contains tables designating codes, plus an explanation of considerations underlying the design of the code and tables showing the Hollerith code's relationship to EBCDIC.

ANSI X3.41-1990. Code Extension Techniques for Use with the 7-Bit Coded Character Set of ASCII. Paper (32 p.). Available from ANSI. $23.00.

International equivalent is ISO 2022.

FIPS Pub 14-1. Hollerith Punched Card Code. 1980. National Institute of Standards and Technology. Paper. Available from NTIS. $12.50.

Adopts ANSI X3.26-1980 which contains the actual code (the code is not included in FIPS Pub 14-1).

ISO/IEC 646:1991. ISO 7-bit coded character set for information interchange. ISO/IEC Joint Technical Committee 1. Paper (15 p.). Available from ANSI. $39.00.

ISO 2022:1986. Information processing--ISO 7-bit and 8-bit coded character sets--Code extension techniques. ISO/IEC Joint Technical Committee 1. Paper (25 p.). Available from ANSI. $52.00.

U.S. national equivalent is ANSI X3.41.

ISO 5426:1983. Extension of the Latin alphabet coded character set for bibliographic information interchange. 2nd ed. ISO Technical Committee 46. Paper (6 p.). Available from ANSI. $25.00.

U.S. national equivalent is ANSI Z39.47.

ISO 5427:1984. Extension of the Cyrillic alphabet coded character set for bibliographic information interchange. ISO Technical Committee 46. Paper (4 p.). Available from ANSI. $22.00.

ISO 5428:1984. Greek alphabet coded character set for bibliographic information interchange. ISO Technical Committee 46. Paper (5 p.). Available from ANSI. $25.00.

ISO 6438:1983. Documentation--African coded character set for bibliographic information interchange. ISO Technical Committee 46. Paper (6 p.). Available from ANSI. $25.00.


Footnotes

1 Robert S. Tannehill, Jr., and Charles W. Husbands, "Standards and Bibliographic Data Representation," Library Trends 31:2 (Fall 1982): 286.

2 National Bureau of Standards, Guide for the Development, Implementation, and Maintenance of Standards for the Representation of Computer Processed Data Elements (FIPS Pub 45) (Gaithersburg, MD: NBS, 1976): 28.

3 "Country Code Standards: A Progress Report on ISO 3166," Information Standards Quarterly 3 (July 1991): 12.

4 Tannehill and Husbands, "Standards and Bibliographic Data Representation," pp. 294-295.

5 Ellen Gredley and Alan Hopkinson, Exchanging Bibliographic Data: MARC and Other International Formats (Chicago: ALA, 1990): 57.

6 Gredley and Hopkinson, 57-58.

7 ANSI X3.41, Code Extension Techniques for Use with the 7-Bit Coded Character Set of ASCII; ISO 2022, Information processing--ISO 7-bit and 8-bit coded character sets--Code extension techniques. The European Computer Manufacturers' Association (ECMA) maintains a registry of character set extensions. See discussion in Joan M. Aliprand, "Nonroman Scripts in the Bibliographic Environment," Information Technology and Libraries 11 (June 1992): 109-110.

8 Crawford (1991): 256-257.

9 Joan M. Aliprand, "Nonroman Scripts in the Bibliographic Environment," p. 110.

10 Crawford (1991): 280-281; Joan M. Aliprand, "What is UNICODE?," LITA Newsletter (Fall 1991): 13-15; Kenneth M. Sheldon, "ASCII Goes Global," Byte (July 1991): 108-116. The Unicode standard itself appears in Unicode Consortium, The UNICODE Standard: Worldwide Character Encoding, Version 1.0 (Reading, MA: Addison-Wesley, 1991), 2 vols.

11 Aliprand, "Nonroman Scripts in the Bibliographic Environment," p. 114.

12 "Country Code Standards: A Progress Report on ISO 3166," Information Standards Quarterly 3 (July 1991): 12.



Standards for Archival Description: A Handbook
© 1994 Society of American Archivists. All Rights Reserved.