VoiceWizard - ViaVoice Dictation

VoiceWizard: the speech resource
for executives and other adventurers exploring voice technology miracles

IBM ViaVoice Millennium cont'd:

Dictation Into the SpeakPad Applet

IBM's speech-enabled WordPad clone, called "SpeakPad," lacks a multilevel undo facility. This is a problem when several misrecognitions occur in a row and the corrections to those errors are themselves misunderstood. When that happens, you have no way to backtrack through the failed corrections to a stable point from which you can resume.
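
To make the complaint concrete, here is a minimal sketch of the kind of multilevel undo we have in mind. It is purely illustrative: the UndoableBuffer class and its methods are our own hypothetical names and say nothing about how SpeakPad is built.

```python
class UndoableBuffer:
    """Hypothetical text buffer that remembers every earlier state."""

    def __init__(self, text=""):
        self.text = text
        self._history = []  # earlier states, oldest first

    def apply(self, new_text):
        """Record the current state, then switch to the new one."""
        self._history.append(self.text)
        self.text = new_text

    def undo(self):
        """Step back one level; return False when nothing is left to undo."""
        if not self._history:
            return False
        self.text = self._history.pop()
        return True


buf = UndoableBuffer("Dear Sir")
buf.apply("Dear Sir or Madam")    # good dictation
buf.apply("Dear Sir or madman")   # misrecognition
buf.apply("Dear Sir or mad dam")  # correction that was itself misheard
buf.undo()                        # back past the failed correction
buf.undo()                        # back past the original misrecognition
print(buf.text)
```

Two undo steps return the buffer to the last good state, which is exactly the backtracking that a single-level undo cannot provide.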

Poor Recognizer Status Information

Although the VoiceCenter bar does show the last command, it shows it for only a few seconds. If a misrecognition has occurred but you don't notice it right away, the failed command has already vanished from view, so you cannot work out how to avoid the error next time. There is no history list of recently processed commands. That lack makes it difficult to debug what actually went wrong with consecutive misrecognitions so that you could take corrective action such as word training or finger training.

The processing state of the speech recognizer (whether it is working on a command or on text, swapping itself in, and so on) is shown only minimally. So when a single misrecognition puzzles you, it is sometimes hard to tell in real time what the recognizer was really doing. Some error messages are too long to fit on the status bar, so you cannot always be certain what problem the program detected.

The combined effect of limited status information, the lack of a history list of recently processed commands or text, and the lack of multilevel undo is that debugging the cause and cure of multiple consecutive misrecognitions requires the wit and wisdom of Mr. Holmes.
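
The history list we are asking for need not be elaborate. The following sketch, with hypothetical names throughout (RecognitionLog, Utterance), keeps only the most recent entries and drops the oldest automatically. It illustrates the idea and is not a description of anything inside ViaVoice.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Utterance:
    heard: str  # what the recognizer decided was said
    kind: str   # "command" or "text"


class RecognitionLog:
    """Hypothetical bounded history of recently processed utterances."""

    def __init__(self, capacity=20):
        # A deque with maxlen silently discards the oldest entry when full.
        self._entries = deque(maxlen=capacity)

    def record(self, heard, kind):
        self._entries.append(Utterance(heard, kind))

    def recent(self, count=5):
        """Return up to `count` of the newest entries, oldest first."""
        return list(self._entries)[-count:]


log = RecognitionLog(capacity=3)
log.record("new paragraph", "command")
log.record("Dear Sir or madman", "text")
log.record("scratch that", "command")
log.record("Dear Sir or mad dam", "text")
for entry in log.recent(2):
    print(entry.kind, "-", entry.heard)
```

Even a short list like this would let a user see the sequence of commands and text that led to a tangle of misrecognitions.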

Capitalization Sometimes Misfires

Capitalization gets confused sometimes. For example, the first word of a quoted string "Like this" gets capitalized in every context. As another example, if you manually position the cursor at a white space between sentences where you would like to begin a new sentence, you may or may not get a capitalization depending on whether or not you just dictated a sentence that ended with a period somewhere else. If you manually type the closing period instead of speaking it, the next sentence will not begin with the proper capital.

Some phrases seemed to have built-in, but curious, capitalization. For example, the phrase "from the Main menu" always got an extra capital M, the phrase "the minimum Specification" always capitalized the S, and the phrase "Correctional Facility" was always capitalized as shown. If you highlight the first word of a correctly capitalized sentence with the intent to dictate a new word or phrase over the top of it, the first word of the new phrase is not properly capitalized.

Other Punctuation Problems

Last year's ViaVoice 98 had the unnerving habit of always inserting a space before the dictated period at the end of every sentence and then removing this space as you went on to the next sentence. This has been partially fixed in ViaVoice Millennium but there is some backsliding into old habits when inserting sentences into the middle of existing text. For inserted text this space is added for every new sentence and removed just like the old days.

Besides being an "eyebrow raiser," this extra space is a real problem if you manually reposition your cursor to some other place to make a correction before you dictate another whole new sentence. In that instance, the extra space is left there in the previous location and you need to remember to go back and fix it up later. This extra space also occurs with other punctuation like "," or "?" and so on.

Saying the phrase "OK" when the cursor is in the middle of a word causes the word to be split onto a new line. In other words, the phrase "OK" is almost always translated as if the Enter key had been pressed. While this is appropriate for, say, the correction dialog box, it is bizarre behavior in the middle of dictated text.

Command Syntax Needs Improvement

ViaVoice has an attention word ("computer") which, when said at any point, forces the computer to interpret the next spoken phrase as a command rather than text. This is a nice feature. But from a human factors standpoint, and from a purely practical one, there should be a companion escape word that forces the computer NOT to treat the next phrase as a command. For example, it is impossible to say the phrase "new paragraph" as in "You should always begin a new paragraph with a capital letter" without getting a paragraph break inserted into the middle of that sentence.

If you try to outflank the recognizer by slowing things down a bit, saying the word "new," and then waiting to say "paragraph," the computer interprets the word "new" as a command to begin a new file. That interpretation brings up a file save confirmation window that you are really very uninterested in, and things go downhill pretty fast after that. So you have to really step back and think about how one fools the recognizer by coming up with something like "You should always begin a new"..."paragraph with a"..."capital"..."letter." And even then you're going to run into the quoted string capitalization problems mentioned earlier.
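
The companion escape word we are proposing could behave as in the sketch below. Everything here is an assumption for illustration: the escape word "literal" is our invention and not a ViaVoice feature, the command set is a tiny subset, and the interpret function is hypothetical rather than a model of IBM's recognizer.

```python
COMMANDS = {"new paragraph", "new", "scratch that"}  # illustrative subset only
ATTENTION = "computer"  # the attention word described in this review
ESCAPE = "literal"      # hypothetical escape word; ViaVoice has no such word


def interpret(phrases):
    """Classify each recognized phrase as ("command", p) or ("text", p)."""
    result = []
    mode = None  # None, "force_command", or "force_text"
    for phrase in phrases:
        # A control word only counts when no control word is already pending.
        if mode is None and phrase == ATTENTION:
            mode = "force_command"
            continue
        if mode is None and phrase == ESCAPE:
            mode = "force_text"
            continue
        if mode == "force_command":
            kind = "command"
        elif mode == "force_text":
            kind = "text"
        else:
            kind = "command" if phrase in COMMANDS else "text"
        result.append((kind, phrase))
        mode = None  # a control word affects only the next phrase
    return result


spoken = ["you should always begin a", "literal", "new paragraph",
          "with a capital letter"]
for kind, phrase in interpret(spoken):
    print(kind, "-", phrase)
```

With the escape word in place, "new paragraph" comes through as ordinary text. Without it, the same phrase is classified as a command, which is the behavior we ran into.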

Although IBM has been trying to make their commands have a "natural" syntax, the work hasn't been quite completed somehow. Perhaps the most disappointing aspect of natural syntax is that it is not fully implemented in the SpeakPad applet. This is a pity because this applet has certain special conveniences which are simply not available in the environments where the natural syntax implementation is more complete. Further, not everyone will have the external software packages where the implementation is more forgiving.

The incompleteness is especially apparent around the commands which end in "this" or "that" such as "scratch that", "delete this" and "undo this". Our testers found it nearly impossible to always remember which commands used "this" and which used "that." While precise English might specify the most correct usage for "this" and "that," our testers are not academic types. The lack of symmetry in commands was troubling, too. Based on the syntax of other commands, it seemed "natural" to some users to say "read this/that" to start the text to speech function at the cursor position. But they discovered that the command that works is really "begin reading."

Compounding the problem, "scratch that" under some circumstances does word deletion and in other circumstances does "undoing" of command actions. Further, "scratch that" is multilevel within SpeakPad but not multilevel in other applications such as Netscape Mail. The phrase "delete this" will delete a word only if it is highlighted in its entirety, whereas "scratch that" will delete the previous phrase if the cursor is at the end of the phrase. Oddly, the phrase "delete this" is not even listed in the Quick Reference Card.

What is IBM Thinking?

All these nuances are undoubtedly semantically and syntactically pure to some IBM designer eyes somewhere. But they were an intellectual wonderment to our testers and, in the end, the testers stopped using them altogether, favoring instead the plain old manual correction with a mouse and keyboard in those circumstances. IBM sales literature states something along the lines of "user feedback showed us that people frequently used the keyboard to make corrections, so ViaVoice Millennium is designed to train itself by taking into account keyboard corrections." Well, you have to applaud IBM for sincerely responding to this level of feedback. But at the same time, based on the above experience, we wondered if they ever completely investigated the reason for the preference of keyboard correction.

And did they have the time to investigate the consequences of their chosen solution? One of our testers, who eschewed the vocalizing of "delete/undo/scratch this/that" possibilities and made manual corrections instead, is neither a good typist nor a good speller. Since ViaVoice is in the habit of searching for words that are "new" to the speaker's vocabulary, when he closed down the test document, ViaVoice obligingly presented him with a dialog box containing a lengthy list of wrongly typed words like "congartulatoyr, messgaes, prvusly...." and inquired whether he would like these "new words" added to his user vocabulary. Of course, the words in the list weren't new in that sense; they were the words he had manually corrected rather than use a "scratch that - redictate" sequence.

Obviously, adding these wrongly spelled words to the vocabulary would corrupt the vocabulary rather than improve it. Instead, the location of all these words would need to be found in the document and then the words properly spelled. But other than laboriously writing down the long list on a scrap piece of paper and then manually finding each word, there is no voice driven method for dealing with this need. Maddeningly, the ViaVoice "select " command, a command otherwise very valuable for zeroing in on text to be corrected, is paradoxically valueless in this situation because one cannot reliably pronounce, for selection purposes, a misspelled word. Our sympathy goes out to the IBM human factors designer who will ultimately have to resolve this predicament - it's a labyrinthine challenge.
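
What the tester needed at that moment was simply a way to jump from each flagged word to its place in the document. The sketch below shows the idea under our own assumptions: locate_words is a hypothetical helper, matching is exact and case-sensitive, and nothing here reflects an actual ViaVoice function.

```python
import re


def locate_words(document, flagged_words):
    """Map each flagged word to the (line, column) positions where it occurs.

    Lines and columns are 1-based. Matching is exact and case-sensitive.
    """
    positions = {word: [] for word in flagged_words}
    for line_no, line in enumerate(document.splitlines(), start=1):
        for match in re.finditer(r"[A-Za-z']+", line):
            word = match.group()
            if word in positions:
                positions[word].append((line_no, match.start() + 1))
    return positions


doc = "Here are my congartulatoyr remarks.\nSee the messgaes I sent prvusly."
found = locate_words(doc, ["congartulatoyr", "messgaes", "prvusly"])
for word, places in found.items():
    print(word, places)
```

A list like this, offered alongside the "new words" dialog, would replace the scrap of paper and the manual hunt.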

What's This?

The SpeakPad title bar carries a caption that is the title of one of the other applications open elsewhere on the desktop. There is no apparent rhyme or reason to the choice of caption. We suspect that it is the title of the leftmost item on the Windows taskbar. There is no explanation of how this might be valuable, and its presence on the SpeakPad title bar would confuse a naive user, especially since the title bar otherwise contains the name of the file currently being dictated.

Voice Playback Useful But Inconsistent

ViaVoice has a function which plays back an audio recording of your voice saying selected phrases you have just dictated. The function is useful if you forget exactly what you said and you want to correct some errors in earlier spoken text.

After you highlight a phrase, the playback function is accessible two ways: through the correction window or from the Menu Bar. From the Menu Bar the phrase plays just fine. But in the correction window, the button that starts the playback is sometimes mysteriously grayed out even though the phrase will play via the Menu Bar. Also, playback gets confused if you start inserting and deleting text. Under those circumstances it may fail to play the selected text or it may halt in the middle of a selection.

Don't Use a Minimum Speed Machine

Although the slow machine meets the minimum specification required by IBM, its blinking cursor had the particularly aggravating habit of occasionally being ahead of the actual processing point of dictated speech. Thus one of our testers, whose manual dexterity has been finely honed by video games, would position the cursor at a new place in the text, see that the position was correct, begin dictating, and yet find that part of the dictated text landed where the cursor had previously been, with the remainder placed where the cursor had been moved to.

Also on the slow machine, we found that occasionally a word or two would be dropped in long phrases. They would also be missing from the "playback" audio. So we would not recommend a minimum speed configuration machine.

Untidy Shutdown

The application does not entirely tidy up after itself when it closes down. According to the Windows Task Manager, certain programs are left lying about. While this apparently causes no functional problems, every time a program does this it eats into the computer's memory resources for no real gain.



Page Last Updated: 12/19/99