Writing through dictation has a long history, but it always required a human being on the other end to actually put words to paper. In his 1953 novel Second Foundation, Isaac Asimov eliminated the need for a second person; he described a machine that turned spoken words into print:
The salesman had said, "There is no other model as compact on the one hand and as adaptable on the other. It will spell and punctuate correctly according to the sense of the sentence. Naturally, it is a great aid to education since it encourages the user to employ careful enunciation and breathing in order to make sure of the correct spelling, to say nothing of demanding a proper and elegant delivery for correct punctuation." Even then her father had tried to get one geared for type-print as if she were some dried-up, old-maid teacher (85; ch. 7).

When Asimov wrote the story, the idea was pure science fiction. He placed the machine in a galactic society more than twenty thousand years in the future.
The future, however, has come to pass quite a bit sooner than Asimov projected. As computers become more powerful and developers find better techniques, automatic speech recognition is becoming more capable and less expensive. Some writers even question whether the keyboard and mouse will soon be obsolete (Dieterich 30).
How can speech recognition help us? Speech recognition has potential as an aid for writers in general, and technical writers in particular. In this paper, I will discuss some of its implications for technical writing. First, I will provide an overview of the technology, its availability, and its future directions. Following that, I will discuss how speech recognition can affect writing and technical writing.
My aim in this paper is to provide a sense of the technology's potential and possible applications rather than a complete buyer's guide to dictation packages. While I have included some current prices, they are general ranges only; I have made no recommendations on particular packages.
The first distinction in speech recognition is between continuous and discrete speech. People generally talk in continuous speech, with no pauses between individual words. The human brain does an excellent job interpreting continuous speech; computers, however, have difficulty distinguishing individual words (though they are getting better). Current speech recognition programs usually work with discrete speech, where the speaker separates each word by a definite pause (usually on the order of one-tenth of a second).
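The pause requirement suggests how simply a discrete-speech system can locate word boundaries. As a rough illustration only (this is not any vendor's actual method, and the function name and silence threshold are my own assumptions), a program can scan the audio amplitude for stretches of near-silence lasting at least a tenth of a second:

```python
# Illustrative sketch of discrete-speech segmentation: split an audio
# signal into word-like segments wherever the amplitude stays below a
# silence threshold for at least min_pause seconds. All names and
# threshold values here are invented for demonstration.

def segment_words(samples, rate=8000, silence_level=0.02, min_pause=0.1):
    """Return (start, end) sample indices of word-like segments."""
    min_gap = int(min_pause * rate)   # pause length in samples
    segments = []
    start = None                      # start index of current segment
    quiet = 0                         # consecutive quiet samples seen
    for i, s in enumerate(samples):
        if abs(s) > silence_level:    # loud sample: inside a word
            if start is None:
                start = i
            quiet = 0
        elif start is not None:       # quiet sample after a word began
            quiet += 1
            if quiet >= min_gap:      # pause long enough: close the word
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:             # close a word running to the end
        segments.append((start, len(samples)))
    return segments
```

With speech at 8,000 samples per second, a tenth-of-a-second pause is 800 consecutive quiet samples, which is why discrete-speech recognition was tractable on mid-1990s hardware.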
Vocabulary size is another primary difference in speech recognition systems. The size may range from fewer than one hundred words to sixty thousand words. Exactly what words the vocabulary contains may prove more important than the number; vendors often offer special-purpose dictionaries (e.g., medical and legal terminology). As vocabularies increase, the systems become more complicated and more expensive.
The speaker model distinguishes single-user from multiple-user speech recognition. Single-user packages are optimized to work with one person's speech; in exchange for this loss of flexibility, they offer increased accuracy. Multiple-user systems serve a wider range of users, at the cost of reduced accuracy.
Finally, the idea of natural language processing works its way into discussions of speech recognition. Speech recognition does nothing but turn speech into words; the computer does not know what the words mean in a general sense. Natural language processing, where computers can truly understand normal speech, is still a dream at this stage.
Most speech recognition systems provide both control tools and dictation packages. Control tools let users issue commands (e.g., "Open File") by voice, providing "hands free" operation of the computer. Dictation packages typically interface with a word processor to translate spoken words into text.
Accuracy is a primary criterion by which people judge systems. Since people unconsciously change their speaking styles, a flexible package must cope with changing inflections and accents. As I mentioned earlier, single-user packages usually provide more accurate translations.
Homonyms pose a special problem for speech recognition; for example, a computer has no a priori way of distinguishing "their" from "there." IBM's VoiceType, a subset of which will appear in the next release of OS/2, will actively go back and change previously entered words once context determines which word was meant (Nance 48). Other systems will presumably offer the same feature.
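The idea behind such context-driven correction can be sketched with word-pair statistics. The following toy example is my own illustration, not IBM's algorithm; the tiny table of bigram counts is invented for demonstration:

```python
# Illustrative sketch (not VoiceType's actual method): choose between
# homonyms by counting how often each candidate appears next to its
# neighboring words. The bigram counts below are invented.

BIGRAMS = {
    ("over", "there"): 9, ("their", "house"): 8,
    ("over", "their"): 1, ("there", "house"): 1,
}

def pick_homonym(prev_word, candidates, next_word):
    """Pick the candidate whose surrounding word pairs are most common."""
    def score(word):
        return (BIGRAMS.get((prev_word, word), 0)
                + BIGRAMS.get((word, next_word), 0))
    return max(candidates, key=score)
```

A real system stores counts for tens of thousands of word pairs, but the principle is the same: the recognizer defers the spelling decision until the neighboring words tip the balance.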
Prices primarily depend on the system's vocabulary; $400 will buy you an entry-level system with a 10,000-word vocabulary, while $700 to $1,000 brings the vocabulary up to around 30,000 words (Dieterich 30; Pepper 38).
To understand "funneling" better, take a moment to consider the writing process. While composing, you may find your attention distracted by previously written material; it's always tempting to go back and try to write just the right thing. Word processors, the darlings of the computer age, make this especially easy. You can polish your prose, and polish it again.
The trouble is, this can quickly damp out your train of thought. Many times, it's better to get the complete thought down, even if it isn't perfect. Otherwise, you will end up with pebbles instead of pearls; well-polished pebbles, but pebbles nonetheless.
According to Ronald Kellogg, "A funnel is an aid that channels the writer's attention into only one or two processes" (67). Eliminate the distractions, and you can keep your mind on what you're writing. Composition researchers have developed two computer-aided techniques, freewriting and invisible writing, to help eliminate distractions. In computer-aided freewriting, the screen blinks if the writer pauses for more than a few seconds (Von Blum and Cohen 166). Invisible writing completely hides the text from the writer; you can try this yourself by simply turning down the brightness on your monitor (Marcus 120).
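The blinking-screen aid reduces to a simple timing rule. As a toy model (the function names and the three-second patience value are my own assumptions, not drawn from the WANDAH system), one can compute when the blinking would begin from the times of the writer's keystrokes:

```python
# Toy model of the freewriting aid: given the times (in seconds) at
# which the writer pressed keys, report the moments the screen would
# start blinking after a pause longer than `patience` seconds.
# The three-second default is an invented illustration.

def blink_times(keystroke_times, patience=3.0):
    """Return the times at which blinking would start."""
    blinks = []
    for earlier, later in zip(keystroke_times, keystroke_times[1:]):
        if later - earlier > patience:      # writer stalled too long
            blinks.append(earlier + patience)
    return blinks
```

For example, keystrokes at 0, 1, 2, 10, and 11 seconds contain one over-long pause (from 2 to 10 seconds), so the screen would begin blinking at the 5-second mark.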
Speech recognition, then, seems especially well suited to the task of invisible writing. You can sit back and talk to the computer; even the bothersome task of typing words correctly is absent. By eliminating the temptation to backtrack and edit on the fly, speech recognition can smooth the flow of ideas onto paper.
The research cited, however, focuses on classroom composition. It's difficult to predict how well classroom results will carry over into nonacademic settings and longer documents. Technical writers, in particular, make heavier use of lists and figure references than composition students. Until researchers can analyze writers outside of classrooms, discussion of this sort can only be informed speculation.
The second benefit of speech recognition centers around information. Technical writers must accumulate notes, interviews, and other information sources before they can truly begin writing. Anything that speeds this process makes life that much easier. Instead of writing notes by hand or talking into a tape recorder, the technical writer can dictate to a speech recognition program, or take the program along to a design meeting or interview.
Having a printed copy can benefit writers in several ways. First, unlike a tape recording, all parts of the material are easily accessible. Second, the information source can review the material for accuracy. Finally, the writer can easily move the text into a document.
Since the basic premise of writing is to put words on paper, it makes sense to get them on paper quickly. By speeding this process, speech recognition clearly aids writing.
Speech and writing are fundamentally different; people seldom speak as they write, or write as they speak (unless they are preparing a written copy of a speech). One central distinction is the irreversibility of speech. Roland Barthes says, "A word cannot be retracted, except precisely by saying that one retracts it. . . . [I]t is ephemeral speech which is indelible, not monumental writing" (qtd. in Sommers 379). Writing allows revision and polish.
Dictation, however, is not speech. Rather, it is speech adapted to writing. Just as people generally speak and write differently, they speak and dictate differently. The flexibility of spoken language allows this with no penalty.
Or is there a penalty? The media theorist Marshall McLuhan once stated, "The medium is the message." In this view, the means by which a message travels must have an effect on the message itself. You can precisely describe the notes in a sonata, but the effect is very different from hearing it played. If McLuhan's probe is valid, then dictation cannot help but be different from writing.
McLuhan's ideas, unfortunately, do not lend themselves well to empirical testing. Researchers have, however, performed tests on the differences between dictation and composition. John Gould, John Conti, and Todd Hovanyecz examined people's performance on a simulated "listening typewriter." They found that, in terms of effectiveness and time spent, subjects performed at least as well dictating to the listening typewriter as they performed writing (305). This, then, is an argument against McLuhan's view.
Even if we assume that dictation must carry "speech-like" aspects, is this such a bad thing? Technical writers, in particular, may find some benefit in this. David Porush speculated on how technical writers can use mnemonic devices from oral tradition to enhance technical documents, especially task sequences (141). Much has been said about the benefits of conversational style in technical writing; dictation may make this style easier to attain.
Dieterich, Rob. "I'll Talk to You Soon." Byte March 1996: 30.
Gould, John, John Conti, and Todd Hovanyecz. "Composing Letters with a Simulated Listening Typewriter." Communications of the ACM 26 (1983): 295-308.
Kellogg, Ronald T. "Idea Processors: Computer Aids for Planning and Composing Text." Computer Writing Environments: Theory, Research, and Design. Ed. Bruce Britton and Shawn Glynn. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1989. 57-92.
Marcus, Stephen. "Real-Time Gadgets with Feedback: Special Effects in Computer-Assisted Writing." The Computer in Composition Instruction: A Writer's Tool. Ed. William Wresch. Urbana, Illinois: National Council of Teachers of English, 1984. 120-130.
Nance, Barry. "You Talk, Warp Listens." Byte September 1996: 48.
Pepper, Jon. "New Programs Take Better Dictation." Byte September 1996: 38.
Porush, David. "What Homer Can Teach Technical Writers: The Mnemonic Value of Poetic Devices." Journal of Technical Writing and Communication 17 (1987): 129-143.
Ray, Eric. "TECHWR-L: A History and Case Study of a Profession-specific LISTSERV List." Technical Communication 43 (1996): 334-338.
Sommers, Nancy. "Revision Strategies of Student Writers and Experienced Adult Writers." College Composition and Communication 31 (1980): 378-388.
Von Blum, Ruth and Michael Cohen. "WANDAH: Writing-Aid AND Author's Helper." The Computer in Composition Instruction: A Writer's Tool. Ed. William Wresch. Urbana, Illinois: National Council of Teachers of English, 1984. 154-173.