Dictation Into the SpeakPad Applet
IBM's speech-enabled WordPad clone, called "SpeakPad," lacks a
multilevel undo facility. This is a problem when several
misrecognitions occur in a row and the corrections to those errors
are themselves misunderstood. When that happens, you're left with
no way to backtrack through the levels that failed in order to
get to a stable place from which you can resume.
Poor Recognizer Status Information
Although the VoiceCenter bar does show the last command, it shows
it for only a few seconds. If a misrecognition occurs and you
don't notice it right away, there is no way to work out how to
correct your approach next time, because the failed command has
already disappeared from view. There is no history list of recently
processed commands. That lack makes it difficult to figure out
what actually went wrong during consecutive misrecognitions, and
therefore to take corrective action such as word training or
finger training.
The processing state of the speech recognizer is only minimally
displayed: whether it is interpreting a command or text, swapping
itself in, and so on. As a result, when a single misrecognition
puzzles you, it is sometimes hard to tell in real time what the
recognizer was actually doing. Some error messages are too long
to fit on the status bar, so you cannot always be certain which
problem the program detected.
The combined effect of limited status information, the lack of
a history list of recently processed commands or text, and the
lack of multilevel undo is that diagnosing and curing multiple
consecutive misrecognitions requires the wit and wisdom of Mr. Holmes.
Capitalization Sometimes Misfires
Capitalization sometimes goes wrong. For example, the first
word of a quoted string "Like this" gets capitalized in every
context. As another example, if you manually position the cursor
at the white space between two sentences in order to begin a new
sentence there, you may or may not get a capital letter, depending
on whether you just dictated a sentence ending with a period
somewhere else. If you manually type the closing period instead
of speaking it, the next sentence will not begin with the proper
capital.
Some phrases seemed to have built-in, but curious, capitalization.
For example, the phrase "from the Main menu" always got an extra
capital M, the phrase "the minimum Specification" always got a
capital S, and the phrase "Correctional Facility" was capitalized
as shown. If you highlight the first word of a correctly capitalized
sentence, intending to dictate a new word or phrase over it, the
first word of the new phrase is not properly capitalized.
Other Punctuation Problems
Last year's ViaVoice 98 had the unnerving habit of always inserting
a space before the dictated period at the end of every sentence
and then removing this space as you went on to the next sentence.
This has been partially fixed in ViaVoice Millennium, but it
backslides into old habits when you insert sentences into the
middle of existing text. For inserted text, the space is added
for every new sentence and then removed, just as in the old days.
Besides being an "eyebrow raiser," this extra space is a real
problem if you manually move your cursor somewhere else to make
a correction before you dictate another new sentence. In that
case, the extra space is left behind at the previous location,
and you need to remember to go back and fix it later. The same
extra space also appears with other punctuation, such as "," and "?".
Saying the phrase "OK" when the cursor is in the middle of a word
causes the word to be split onto a new line. In other words, the
phrase "OK" is almost always treated as if the Enter key had been
pressed. While this is appropriate in, say, the correction dialog
box, it is bizarre behavior in the middle of dictated text.
Command Syntax Needs Improvement
ViaVoice has an attention control word, "computer," which, when
spoken at any point, forces the computer to interpret the next
phrase as a command rather than as text. This is a nice feature.
But from both a human factors standpoint and a practical one,
there should be a companion escape word that forces the computer
NOT to give the next word any special interpretation. For example,
it is impossible to say the phrase "new paragraph," as in "You
should always begin a new paragraph with a capital letter,"
without having an actual paragraph break inserted into the middle
of the sentence.
If you try to outflank the recognizer by slowing things down
a bit, saying the word "new," and then pausing before saying
"paragraph," the computer interprets "new" as a command to begin
a new file. That brings up a file-save confirmation window you
have no interest in, and things go downhill pretty fast after
that. So you have to step back and work out how to fool the
recognizer, coming up with something like "You should always
begin a new"..."paragraph with a"..."capital"..."letter." And
even then you're going to run into the quoted-string capitalization
problem mentioned earlier.
Although IBM has been trying to give its commands a "natural"
syntax, the work is not quite finished. Perhaps the most
disappointing aspect is that natural syntax is not fully
implemented in the SpeakPad applet. This is a pity, because
SpeakPad has certain conveniences that are simply not available
in the environments where natural syntax is more completely
implemented. Furthermore, not everyone will own the external
software packages in which the implementation is more forgiving.
The incompleteness is especially apparent in the commands that
end in "this" or "that," such as "scratch that," "delete this,"
and "undo this." Our testers found it nearly impossible to remember
reliably which commands used "this" and which used "that." While
precise English might dictate the most correct usage of "this"
and "that," our testers are not academic types. The lack of
symmetry among commands was troubling, too. Based on the syntax
of other commands, it seemed "natural" to some users to say "read
this/that" to start the text-to-speech function at the cursor
position. But they discovered that the command that actually
works is "begin reading."
Compounding the problem, "scratch that" deletes words in some
circumstances and undoes command actions in others. Furthermore,
"scratch that" is multilevel within SpeakPad but not in other
applications such as Netscape Mail. The phrase "delete this" will
delete a word only if the word is highlighted in its entirety,
whereas "scratch that" will delete the previous phrase if the
cursor is at the end of that phrase. Oddly, "delete this" is not
even listed on the Quick Reference Card.
What Is IBM Thinking?
All these nuances are undoubtedly semantically and syntactically
pure in the eyes of some IBM designer somewhere. But they left
our testers bewildered, and in the end the testers stopped using
these commands altogether, preferring plain old manual correction
with the mouse and keyboard in those situations. IBM sales literature
states something along the lines of "user feedback showed us that
people frequently used the keyboard to make corrections, so ViaVoice
Millennium is designed to train itself by taking into account
keyboard corrections." Well, you have to applaud IBM for sincerely
responding to this feedback. But at the same time, based on the
experience described above, we wondered whether they ever fully
investigated why users preferred keyboard correction.
And did they take the time to investigate the consequences of
their chosen solution? One of our testers, who avoided the spoken
"delete/undo/scratch this/that" commands and made manual corrections
instead, is neither a good typist nor a good speller. Because
ViaVoice searches documents for words that are "new" to the
speaker's vocabulary, when he closed the test document ViaVoice
obligingly presented him with a dialog box containing a lengthy
list of mistyped words like "congartulatoyr, messgaes, prvusly...."
and asked whether he would like these "new words" added to his
user vocabulary. Of course, the words in the list weren't new in
that sense; they were the words he had typed while making manual
corrections rather than using a "scratch that - redictate" sequence.
Obviously, adding these misspelled words to the vocabulary would
corrupt it rather than improve it. Instead, each of these words
would need to be found in the document and spelled correctly.
But short of laboriously copying the long list onto a scrap of
paper and then finding each word by hand, there is no voice-driven
way to handle this. Maddeningly, the ViaVoice "select" command,
otherwise very valuable for zeroing in on text to be corrected,
is paradoxically useless here, because you cannot reliably
pronounce a misspelled word in order to select it. Our sympathy
goes out to the IBM human factors designer who will ultimately
have to resolve this predicament; it's a labyrinthine challenge.
What's This?
The SpeakPad title bar contains a caption showing the title of
one of the other applications open elsewhere on the desktop.
There is no apparent rhyme or reason to the choice of caption.
We suspect that it is the title of the leftmost item on the
Windows taskbar. Nothing explains how this might be useful, and
its presence on the SpeakPad title bar would confuse a naive user,
especially since the title bar otherwise contains the name of
the file currently being dictated.
Voice Playback Useful But Inconsistent
ViaVoice has a function that plays back an audio recording of
your voice saying selected phrases you have just dictated. The
function is useful if you have forgotten exactly what you said
and want to correct errors in earlier dictated text.
After you highlight a phrase, the playback function is accessible
in two ways: through the correction window or from the Menu Bar.
From the Menu Bar the phrase plays just fine. But in the correction
window, the playback button is sometimes mysteriously grayed out,
even though the same phrase plays from the Menu Bar.
Playback also gets confused once you start inserting and deleting
text. In those circumstances, it may fail to play the selected
text or halt partway through a selection.
Don't Use a Minimum Speed Machine
Although the slow machine meets the minimum specification required
by IBM, the blinking cursor had the particularly aggravating habit
of occasionally running ahead of the point where dictated speech
was actually being processed. Thus, one of our testers, whose
manual dexterity has been finely honed by video games, would move
the cursor to a new place in the text, see that it was in the
right position, and begin dictating, only to find that part of
the dictated text landed where the cursor had previously been
and the remainder went where the cursor had been moved to.
Also on the slow machine, we found that a word or two was
occasionally dropped from long phrases, and the dropped words
were also missing from the playback audio. We therefore would
not recommend a machine that merely meets the minimum speed
configuration.
Untidy Shutdown
The application does not entirely tidy up after itself when it
closes. According to the Windows Task Manager, certain programs
are left running. While this apparently causes no functional
problems, whenever programs do this they eat into the computer's
memory resources for no real gain.