Writing through dictation has a long history, but it always required a human being on the other end to actually put words to paper. In his 1953 novel Second Foundation, Isaac Asimov eliminated the need for a second person; he described a machine that turned spoken words into print:
The salesman had said, "There is no other model as compact on the one hand and as adaptable on the other. It will spell and punctuate correctly according to the sense of the sentence. Naturally, it is a great aid to education since it encourages the user to employ careful enunciation and breathing in order to make sure of the correct spelling, to say nothing of demanding a proper and elegant delivery for correct punctuation." Even then her father had tried to get one geared for type-print as if she were some dried-up, old-maid teacher (85; ch. 7).

When Asimov wrote the story, the idea was pure science fiction. He placed the machine in a galactic society more than twenty thousand years in the future.
The future, however, has come to pass quite a bit sooner than Asimov projected. As computers become more powerful and developers find better techniques, automatic speech recognition is becoming more capable and less expensive. Some writers even question whether the keyboard and mouse will soon be obsolete (Dieterich 30).
How can speech recognition help us? Speech recognition has potential as an aid for writers in general, and technical writers in particular. In this paper, I will discuss some of its implications for technical writing. First, I will provide an overview of the technology, its availability, and its future directions. Following that, I will discuss how speech recognition can affect writing and technical writing.
My aim in this paper is to provide a sense of the technology's potential and possible applications rather than a complete buyer's guide to dictation packages. While I have included some current prices, they are general ranges only; I have made no recommendations on particular packages.
The first distinction in speech recognition is between continuous and discrete speech. People generally talk in continuous speech, with no pauses between individual words. The human brain does an excellent job interpreting continuous speech; computers, however, have difficulty distinguishing individual words (though they are getting better). Current speech recognition programs usually work with discrete speech, where the speaker separates each word by a definite pause (usually on the order of one-tenth of a second).
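The pause requirement suggests how simply a discrete-speech system can locate word boundaries. As a rough illustration only (this is not any vendor's actual method, and the function name and silence threshold are my own assumptions), a program can scan the audio amplitude for stretches of near-silence lasting at least a tenth of a second:

```python
# Illustrative sketch of discrete-speech segmentation: split an audio
# signal into word-like segments wherever the amplitude stays below a
# silence threshold for at least min_pause seconds. All names and
# threshold values here are invented for demonstration.

def segment_words(samples, rate=8000, silence_level=0.02, min_pause=0.1):
    """Return (start, end) sample indices of word-like segments."""
    min_gap = int(min_pause * rate)   # pause length in samples
    segments = []
    start = None                      # start index of current segment
    quiet = 0                         # consecutive quiet samples seen
    for i, s in enumerate(samples):
        if abs(s) > silence_level:    # loud sample: inside a word
            if start is None:
                start = i
            quiet = 0
        elif start is not None:       # quiet sample after a word began
            quiet += 1
            if quiet >= min_gap:      # pause long enough: close the word
                segments.append((start, i - quiet + 1))
                start, quiet = None, 0
    if start is not None:             # close a word running to the end
        segments.append((start, len(samples)))
    return segments
```

With speech at 8,000 samples per second, a tenth-of-a-second pause is 800 consecutive quiet samples, which is why discrete-speech recognition was tractable on mid-1990s hardware.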
Vocabulary size is another primary difference in speech recognition systems. The size may range from fewer than one hundred words to sixty thousand words. Exactly what words the vocabulary contains may prove more important than the number; vendors often offer special-purpose dictionaries (e.g., medical and legal terminology). As vocabularies increase, the systems become more complicated and more expensive.
The speaker model distinguishes single-user from multiple-user speech recognition. Single-user packages are optimized to work with one person's speech; in exchange for this loss of flexibility, they offer increased accuracy. Multiple-user systems serve a wider range of users, at the cost of reduced accuracy.
Finally, the idea of natural language processing works its way into discussions of speech recognition. Speech recognition does nothing but turn speech into words; the computer does not know what the words mean in a general sense. Natural language processing, where computers can truly understand normal speech, is still a dream at this stage.
Most speech recognition systems provide both control tools and dictation packages. Control tools let users issue commands (e.g., "Open File") by voice, providing "hands free" operation of the computer. Dictation packages typically interface with a word processor to translate spoken words into text.
Accuracy is a primary criterion by which people judge systems. Since people unconsciously change their speaking styles, a flexible package must cope with changing inflections and accents. As I mentioned earlier, single-user packages usually provide more accurate translations.
Homonyms pose a special problem for speech recognition; for example, a computer has no a priori way of distinguishing "their" from "there." IBM's VoiceType, a subset of which will appear in the next release of OS/2, will actively go back and change previously entered words once context determines which word was meant (Nance 48). Other systems will presumably offer the same feature.
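The idea behind such context-driven correction can be sketched with word-pair statistics. The following toy example is my own illustration, not IBM's algorithm; the tiny table of bigram counts is invented for demonstration:

```python
# Illustrative sketch (not VoiceType's actual method): choose between
# homonyms by counting how often each candidate appears next to its
# neighboring words. The bigram counts below are invented.

BIGRAMS = {
    ("over", "there"): 9, ("their", "house"): 8,
    ("over", "their"): 1, ("there", "house"): 1,
}

def pick_homonym(prev_word, candidates, next_word):
    """Pick the candidate whose surrounding word pairs are most common."""
    def score(word):
        return (BIGRAMS.get((prev_word, word), 0)
                + BIGRAMS.get((word, next_word), 0))
    return max(candidates, key=score)
```

A real system stores counts for tens of thousands of word pairs, but the principle is the same: the recognizer defers the spelling decision until the neighboring words tip the balance.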
Prices primarily depend on the system's vocabulary; $400 will buy you an entry-level system with a 10,000-word vocabulary, while $700 to $1,000 brings the vocabulary up to around 30,000 words (Dieterich 30; Pepper 38).
To understand "funneling" better, take a moment to consider the writing process. While composing, you may find your attention distracted by previously written material; it's always tempting to go back and try to write just the right thing. Word processors, the darlings of the computer age, make this especially easy. You can polish your prose, and polish it again.
The trouble is, this can quickly damp out your train of thought. Many times, it's better to get the complete thought down, even if it isn't perfect. Otherwise, you will end up with pebbles instead of pearls; well-polished pebbles, but pebbles nonetheless.
According to Ronald Kellogg, "A funnel is an aid that channels the writer's attention into only one or two processes" (67). Eliminate the distractions, and you can keep your mind on what you're writing. Composition researchers have developed two computer-aided techniques, freewriting and invisible writing, to help eliminate distractions. In computer-aided freewriting, the screen blinks if the writer pauses for more than a few seconds (Von Blum and Cohen 166). Invisible writing completely hides the text from the writer; you can try this yourself by simply turning down the brightness on your monitor (Marcus 120).
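The blinking-screen aid reduces to a simple timing rule. As a toy model (the function names and the three-second patience value are my own assumptions, not drawn from the WANDAH system), one can compute when the blinking would begin from the times of the writer's keystrokes:

```python
# Toy model of the freewriting aid: given the times (in seconds) at
# which the writer pressed keys, report the moments the screen would
# start blinking after a pause longer than `patience` seconds.
# The three-second default is an invented illustration.

def blink_times(keystroke_times, patience=3.0):
    """Return the times at which blinking would start."""
    blinks = []
    for earlier, later in zip(keystroke_times, keystroke_times[1:]):
        if later - earlier > patience:      # writer stalled too long
            blinks.append(earlier + patience)
    return blinks
```

For example, keystrokes at 0, 1, 2, 10, and 11 seconds contain one over-long pause (from 2 to 10 seconds), so the screen would begin blinking at the 5-second mark.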
Speech recognition, then, seems especially well suited to the task of invisible writing. You can sit back and talk to the computer; even the bothersome task of typing words correctly is absent. By eliminating the temptation to backtrack and edit on the fly, speech recognition can smooth the flow of ideas onto paper.
The research cited, however, focuses on classroom composition. It's difficult to predict how well classroom results will carry over into nonacademic settings and longer documents. Technical writers, in particular, make heavier use of lists and figure references than composition students. Until researchers can analyze writers outside of classrooms, discussion of this sort can only be informed speculation.
The second benefit of speech recognition centers around information. Technical writers must accumulate notes, interviews, and other information sources before they can truly begin writing. Anything that speeds this process makes life that much easier. Instead of writing notes by hand or talking into a tape recorder, the technical writer can dictate to a speech recognition program, or take the program along to a design meeting or interview.
Having a printed copy can benefit writers in several ways. First, unlike a tape recording, all parts of the material are easily accessible. Second, the information source can review the material for accuracy. Finally, the writer can easily move the text into a document.
Since the basic premise of writing is to put words on paper, it makes sense to get them on paper quickly. By speeding this process, speech recognition clearly aids writing.
Speech and writing are fundamentally different; people seldom speak as they write, or write as they speak (unless they are preparing a written copy of a speech). One central distinction is the irreversibility of speech. Roland Barthes says, "A word cannot be retracted, except precisely by saying that one retracts it. . . . [I]t is ephemeral speech which is indelible, not monumental writing" (qtd. in Sommers 379). Writing allows revision and polish.
Dictation, however, is not speech. Rather, it is speech adapted to writing. Just as people generally speak and write differently, they speak and dictate differently. The flexibility of spoken language allows this with no penalty.
Or is there a penalty? The media theorist Marshall McLuhan once stated, "The medium is the message." In this view, the means by which a message travels must have an effect on the message itself. You can precisely describe the notes in a sonata, but the effect is very different from hearing it played. If McLuhan's probe is valid, then dictation cannot help but be different from writing.
McLuhan's ideas, unfortunately, do not lend themselves well to empirical testing. Researchers have, however, performed tests on the differences between dictation and composition. John Gould, John Conti, and Todd Hovanyecz examined people's performance on a simulated "listening typewriter." They found that, in terms of effectiveness and time spent, subjects performed at least as well dictating to the listening typewriter as they performed writing (305). This, then, is an argument against McLuhan's view.
Even if we assume that dictation must carry "speech-like" aspects, is this such a bad thing? Technical writers, in particular, may find some benefit in this. David Porush speculated on how technical writers can use mnemonic devices from oral tradition to enhance technical documents, especially task sequences (141). Much has been said about the benefits of conversational style in technical writing; dictation may make this style easier to attain.
Dieterich, Rob. "I'll Talk to You Soon." Byte March 1996: 30.
Gould, John, John Conti, and Todd Hovanyecz. "Composing Letters with a Simulated Listening Typewriter." Communications of the ACM 26 (1983): 295-308.
Kellogg, Ronald T. "Idea Processors: Computer Aids for Planning and Composing Text." Computer Writing Environments: Theory, Research, and Design. Ed. Bruce Britton and Shawn Glynn. Hillsdale, New Jersey: Lawrence Erlbaum Associates, 1989. 57-92.
Marcus, Stephen. "Real-Time Gadgets with Feedback: Special Effects in Computer-Assisted Writing." The Computer in Composition Instruction: A Writer's Tool. Ed. William Wresch. Urbana, Illinois: National Council of Teachers of English, 1984. 120-130.
Nance, Barry. "You Talk, Warp Listens." Byte September 1996: 48.
Pepper, Jon. "New Programs Take Better Dictation." Byte September 1996: 38.
Porush, David. "What Homer Can Teach Technical Writers: The Mnemonic Value of Poetic Devices." Journal of Technical Writing and Communication 17 (1987): 129-143.
Ray, Eric. "TECHWR-L: A History and Case Study of a Profession-specific LISTSERV List." Technical Communication 43 (1996): 334-338.
Sommers, Nancy. "Revision Strategies of Student Writers and Experienced Adult Writers." College Composition and Communication 31 (1980): 378-388.
Von Blum, Ruth and Michael Cohen. "WANDAH: Writing-Aid AND Author's Helper." The Computer in Composition Instruction: A Writer's Tool. Ed. William Wresch. Urbana, Illinois: National Council of Teachers of English, 1984. 154-173.