SGML and Its Value to Technical Communication

by Steven Aoki
June 2, 1997
English 518 (Technical Communication Theory)--Dr. Om Bali
Cal Poly State University, San Luis Obispo

SGML stands for "Standard Generalized Markup Language," a computer language defined by the international standard ISO 8879:1986. The long campaign for this generalized markup standard began in 1967 with a speech in Ottawa by William Tunnicliffe [1, p. 3]. The premise of SGML is that by labeling the text in a document according to a generic hierarchy, the document can be converted into any medium or proprietary format imaginable--including books, electronic files, CD-ROM, the World Wide Web, and other documents with specific company formats.

SGML allows companies to archive all their documents in one electronic database. Thus, documents need to be edited only once--at the source--and can then be output in any medium on demand through automated conversion. The database can also be searched by keyword or by the hierarchical location of the information [2, p. 3].
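
The two kinds of search can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the archive contents and element names (archive, memo, head, subject, body, para) are hypothetical, and the standard library's XML parser stands in for a real SGML system, which would use its own parser and database.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-memo archive; element names are illustrative only.
archive = ET.fromstring("""
<archive>
  <memo><head><subject>SGML pilot</subject></head><body><para>Training begins Monday.</para></body></memo>
  <memo><head><subject>Parking</subject></head><body><para>The SGML team parks in lot B.</para></body></memo>
</archive>
""")


def subjects_containing(keyword):
    """Search by hierarchical location: look only inside memo/head/subject."""
    return [s.text for s in archive.findall("./memo/head/subject")
            if keyword in (s.text or "")]


def anywhere_containing(keyword):
    """Search by keyword alone: report the tag of every element whose text matches."""
    return [e.tag for e in archive.iter() if e.text and keyword in e.text]


print(subjects_containing("SGML"))
print(anywhere_containing("SGML"))
```

The first search finds only memos whose subject mentions the keyword, while the second finds every element that mentions it, wherever it sits in the hierarchy.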

Furthermore, the same SGML document can be programmed to revise itself based on the audience. For example, one SGML document can be output as both a classified military document and a civilian document. The power of SGML lies in its flexibility and efficiency. Hence, SGML works best when dealing with voluminous quantities of information [3, p. 1].
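
A minimal Python sketch of audience-based output follows. It assumes each paragraph carries a hypothetical "audience" label; the Para class, the label values, and the sample text are all invented for illustration and are not drawn from any particular SGML application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Para:
    text: str
    audience: str = "all"  # hypothetical label: "all" or a restricted audience


def render(doc: List[Para], audience: str) -> str:
    """Keep paragraphs marked for everyone plus those marked for this audience."""
    return "\n".join(p.text for p in doc if p.audience in ("all", audience))


# One source document, edited in one place.
manual = [
    Para("Inspect the antenna housing before each use."),
    Para("Operating frequencies are listed in Annex C.", audience="military"),
    Para("Store the unit in a dry location."),
]

civilian = render(manual, "civilian")  # omits the restricted paragraph
military = render(manual, "military")  # includes it
print(civilian)
```

Both versions come from the same source list, so a correction made once appears in every output.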

Companies that convert to SGML would typically require their technical writers to learn it. This is because technical writers, being authorities on the documents, would most likely direct the development of the hierarchies. Furthermore, technical writers would most likely apply the structural design to the text because having a separate team figure it out later would be inefficient. Technical writers can best prepare for an SGML environment by training themselves to visualize structure instead of aesthetics, by learning to think in a modular fashion, and by overcoming complacency with the word processors they are familiar with. This paper will address two main topics: SGML itself and its implications for technical writers.

Before SGML, the most efficient way to store data was through a database--and this was fine for discrete elements like names and statistics. But author Liora Alschuler points to the "...65-85 percent of the world's information that is today neither a number nor a piece of data but is part of a lump of text on a computer called a file or a document" [4, p. 2]. Fortunately, such "lumps" can be managed discretely through hierarchies in SGML. An internal memorandum illustrates such a hierarchy. Conceptually, an internal memo contains a masthead, a head, a body, and a closing. The head contains the recipient, the sender, the date, and a subject. The body contains paragraphs, which in turn contain such elements as text and emphasized words. The closing would have the signature and typist's initials, which could appear nowhere else but in the closing. No matter which company or which format, an internal memo features these same elements. Once these elements have been recognized, their appearances can be easily programmed and then output through automated conversion.
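
The memo hierarchy described above can be sketched in Python as a tree of labeled elements, together with a table stating which elements may appear inside which. This is a simplified stand-in for what an SGML document type definition does, not actual SGML syntax; the element names (memo, masthead, head, to, from, date, subject, body, para, emph, closing, signature, typist) and the sample content are assumptions chosen to match the description.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Element:
    """One node in the hierarchy: a generic label plus its content."""
    name: str
    children: List[Union["Element", str]] = field(default_factory=list)


# Which child elements each element may contain (hypothetical content model).
ALLOWED: Dict[str, List[str]] = {
    "memo": ["masthead", "head", "body", "closing"],
    "head": ["to", "from", "date", "subject"],
    "body": ["para"],
    "para": ["emph"],
    "closing": ["signature", "typist"],
}


def validate(node: Element) -> List[str]:
    """Return one message per child element found where the model forbids it."""
    errors = []
    for child in node.children:
        if isinstance(child, Element):
            if child.name not in ALLOWED.get(node.name, []):
                errors.append(f"{child.name} not allowed in {node.name}")
            errors.extend(validate(child))
    return errors


def to_markup(node: Union[Element, str], indent: int = 0) -> str:
    """Write the tree as indented, generically tagged text."""
    pad = "  " * indent
    if isinstance(node, str):
        return pad + node
    lines = [f"{pad}<{node.name}>"]
    lines += [to_markup(child, indent + 1) for child in node.children]
    lines.append(f"{pad}</{node.name}>")
    return "\n".join(lines)


memo = Element("memo", [
    Element("masthead", ["Acme Corp."]),
    Element("head", [
        Element("to", ["All staff"]),
        Element("from", ["S. Writer"]),
        Element("date", ["June 2, 1997"]),
        Element("subject", ["SGML pilot"]),
    ]),
    Element("body", [
        Element("para", ["Training begins ", Element("emph", ["Monday"]), "."]),
    ]),
    Element("closing", [Element("signature", ["S. Writer"]), Element("typist", ["sw"])]),
])

print(to_markup(memo))
print(validate(memo))
```

Because the table lists signature and typist only under closing, a signature placed inside the body is reported as an error, which mirrors the rule that those elements can appear nowhere else.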

Two developments sent SGML's popularity soaring: the U.S. Department of Defense's adoption of it in the late 1980s, and the development of HTML (the language of the World Wide Web) as an application of SGML [4, p. 7]. SGML has been used in a wide variety of other fields, such as information management, financial analysis, criminal justice, maternal science, and book publishing [4, p. 1]. Companies that use SGML today include the U.S. Government Printing Office, Microsoft, Intel, Dow Jones, Kodak, RIA, Ericsson Inc., Columbia University Press, University of Chicago Press, Standard & Poor's, Adams & Hamilton, Sybase, Butterworth Legal Publishers, UCLA, Douglas Aircraft Company, and untold others [1, 4, 5]. In addition, software products like Adobe FrameMaker and Microsoft Word have come out with SGML components [1, pp. 5-6].

When companies convert to SGML, they typically shift most of the work from the final stages of production to the early stages. The companies also commonly subcontract during the design and output stages. Writing is rarely subcontracted because it is difficult to keep the subcontractor current with all phases of the data design [4, p. 322].

In the initial hierarchical designs, the technical writers must collaborate with the SGML analysts for two main reasons. First, technical writers are the best qualified to create working hierarchies for the documents since they author the documents on an everyday basis. Second, smart managers would have technical writers actively participate in order to help the writers understand and accept the system better--after all, technical writers would be the ones working with it the most. As in-house staff become more familiar with SGML, subcontracting would probably diminish [4, p. 322].

Case studies have shown various instances where technical writers have had difficulty accepting SGML. One reason was their attachment to WYSIWYG (What You See Is What You Get) word processors. These are word processors that print out exactly what appears on the screen. In WYSIWYG word processors, writers typically see the text like this:

When working in an SGML structured editor, writers typically see this:

The lack of visual cues in SGML editors tends to bother writers [4, pp. 312-314]. Instead of seeing a title bolded and isolated on the screen, writers must picture it mentally. WYSIWYG word processors deceive writers into categorizing information by its typographic appearance rather than by its relationship to the hierarchy. SGML permits writers to return to their roots--to concentrate on structure rather than formatting decisions. Hence, writers no longer have to translate emphasized ideas into bold or italicized text--they can simply label such words as emphasized words.

SGML allows writers to assign accurate, consistent meaning to words--with SGML applying the formatting later like an automated style guide. For example, writers' choices in WYSIWYG word processors are typically restricted to bolded, italicized, underlined, and capitalized words. But in SGML, writers are not constrained to such formats. They can label words for exactly what they are--be it emphasized words, glossary terms, index references, hyperlinks, or any other labels at the author's discretion. Instead of going through an entire document and formatting each idea individually, writers can apply formatting in one global step. This allows technical writers to focus on composition rather than aesthetics. Furthermore, one can apply a special formatting scheme to a specific output of the document. Ideally, all composition changes would be applied to the central, generic document, which would then be output as various versions with their own designated formatting schemes. Thus, aesthetic concerns no longer convolute the rhetoric of the document.
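
The idea of labeling words once and formatting them later can be sketched in Python. Everything here is a hypothetical illustration: the labels (emph, term), the two output schemes, and the sample sentence are invented, and a regular expression stands in for the parser and stylesheet mechanism a real SGML system would use. The sketch handles only simple, non-nested labels.

```python
import re

# Hypothetical formatting schemes: one per output medium.
STYLES = {
    "print": {"emph": ("*", "*"), "term": ("_", "_")},
    "web": {"emph": ("<b>", "</b>"), "term": ("<i>", "</i>")},
}

# Matches a labeled run such as <emph>before</emph>; no nesting is handled.
TAG = re.compile(r"<(emph|term)>(.*?)</\1>")


def format_for(source: str, medium: str) -> str:
    """Apply one medium's formatting scheme to every labeled run in one pass."""
    style = STYLES[medium]

    def swap(match):
        opening, closing = style[match.group(1)]
        return opening + match.group(2) + closing

    return TAG.sub(swap, source)


# The generic source is written once, with labels instead of formatting.
source = "Back up the <term>DTD</term> <emph>before</emph> editing."

print(format_for(source, "print"))  # Back up the _DTD_ *before* editing.
print(format_for(source, "web"))    # Back up the <i>DTD</i> <b>before</b> editing.
```

Changing how glossary terms look in print means editing one entry in the scheme, not revisiting every term in the document.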

Another complaint by technical writers is that SGML editors limit creativity by distracting them with structural considerations [4, pp. 310-311]. Yet writers visualize structure for their documents anyway, be it consciously or subconsciously. Unstructured writing would inevitably cause rewrites down the line. SGML editors merely remind writers to follow the structure that they initiated, like a person following his or her New Year's resolution. And like a New Year's resolution, the structure can be altered at any time.

Other reasons for technical writers' rejection of SGML include apathy and the feeling that the company benefits while they do not. Yet these are problems with the company's strategy for converting to SGML, not with SGML itself. The best policies a company can adopt are to involve the writers, convince them that it is their system, and evaluate them on quality rather than quantity [4, p. 311].

For an assignment in my digital media class at Cal Poly, we practiced creating hierarchies for documents. Our primary problem was "granularity"--what pieces of text should be considered elements, and how far such elements should be broken down. For example, users don't have to go so far as to treat nouns and verbs as elements. Some examples of recommended element distinctions include:

My experience in the digital media course has taught me the importance of thinking in a modular fashion--in other words, thinking ahead to possible standards. Whenever a technical writer encounters patches of text that may change from audience to audience, such as social and cultural differences, money, phone numbers, or classified information, it is a good habit to address them immediately, as one would address ambiguities in a style guide. The new elements can then be discussed and incorporated into the hierarchy. Text that may need cataloging later, like glossary and index items, should be pointed out as well. If such ideas are not addressed on the spot, people may forget or overlook them later.

Obviously, learning to apply SGML to a document requires experience and patience. But once applied, the benefits become apparent: on-demand documents in any medium; freedom from tedious formatting concerns; and the efficiency of editing one document, not many. Technical writers can best prepare themselves for an SGML environment by training themselves to picture structure instead of aesthetics, by learning to think in a modular fashion, and by overcoming complacency with their own word processors. After all, if technical writers persistently cling to the paradigms that comfort them, how would they ever improve?


  1. L. Alschuler and M. Walter, SGML '96: Celebrating the Tenth Anniversary of SGML, Seybold Report on Publishing Systems, 26:8, pp. 1-14, Dec. 30, 1996.
  2. E. Maler and J. E. Andaloussi, Developing SGML DTDs: From Text to Model to Markup, Prentice Hall PTR, Upper Saddle River, 1996.
  3. B. Marchal, SGML: Executive Summary, ~ben/sgml/executive.htm, Benelux Exchange Network Advertising & Promotion, 1997.
  4. L. Alschuler, ABCD... SGML: A User's Guide to Structured Information, International Thomson Computer Press, Boston, 1995.
  5. B. E. Travis and D. C. Waldt, The SGML Implementation Guide: A Blueprint for SGML Migration, Springer-Verlag, Berlin, 1995.