Text To Speech Performance
ViaVoice has a winsome animated agent which will read text starting
from the current cursor position. The agent does surprisingly
well with abbreviations, company names and so on. Especially impressive
was its sense of phrasing and its sense of exactly where a normal
person might actually take a pause or breath.
However, this feature still needs refinement. For example, there
is a limit on the amount of text that may be read aloud. If the
document in question is longer than that limit, the only way to
hear the rest is to manually split the remainder off into a new
document.
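The manual splitting described above could in principle be scripted. The following is only a rough sketch of that idea, not a ViaVoice feature: the function name `split_for_readback` and the `max_chars` parameter are hypothetical, and the review does not say what the product's actual limit is or in what units it is measured.

```python
import re


def split_for_readback(text, max_chars):
    """Split text into chunks no longer than max_chars characters.

    Breaks fall at sentence ends where possible. max_chars is an assumed
    stand-in for the reader's limit, which the review does not specify.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")

    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())

    chunks = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = sentence if not current else current + " " + sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be pasted into its own document and read back in turn. Breaking at sentence ends is a design choice meant to protect the agent's phrasing, which would suffer if a chunk stopped mid-sentence.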
ViaVoice provides some additional specialized vocabulary topics
which may be switched in and out of any dictation session. The
"chatter's jargon" topic will take certain phrases such as "rolling
on the floor laughing" and spell them as "rofl" in keeping with chat
room culture. But the text-to-speech processing does not have a
symmetrical understanding of these abbreviations: on readback it
pronounces the characters phonetically instead of expanding them.
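One way to picture the missing symmetry is a reverse lookup applied to the text before it is handed to the reader. The sketch below is an illustration only and assumes nothing about how ViaVoice works internally: the `CHAT_EXPANSIONS` table and the `expand_chat_abbreviations` function are hypothetical names, and "rofl" is the only entry taken from the review.

```python
import re

# Hypothetical reverse table; keys must be lowercase.
# Only "rofl" comes from the review itself.
CHAT_EXPANSIONS = {
    "rofl": "rolling on the floor laughing",
}


def expand_chat_abbreviations(text, expansions=CHAT_EXPANSIONS):
    """Replace whole-word chat abbreviations with their spoken phrases."""
    if not expansions:
        return text
    # \b keeps the match to whole words, so "rofl" inside a longer
    # word is left alone. Matching ignores case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(key) for key in expansions) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: expansions[m.group(1).lower()], text)
```

With such a step in place, text dictated through the jargon topic would be read back as the phrase the speaker originally said rather than as a string of letters.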
As part of getting started, the text-to-speech agent selects the
text from the current cursor position through the end of the
document. Within SpeakPad, after the text is selected, it is
positioned on the screen so that the starting cursor position is in
the upper-left corner of the editing window, a quite reasonable
place for it to be. But within something like Netscape Mail, the
view is left positioned at the last word of the message. This
inconsistency was confusing and sometimes troublesome for our users.
Some of our testers used the text-to-speech agent to proofread by
ear documents that they had been staring at for some time. The
agent's voice was useful in detecting errors that the eye just
didn't see any more. But there was an odd inconsistency in behavior
when the tester paused the agent in order to make a correction. The
first click after the pause, the one needed to move the blinking
cursor to the location of the correction, wouldn't take. One has to
click twice. Otherwise, the correction is made at the place where
reading started, which is typically not the place where the
correction is actually needed.
Our testers found this extra click was often hard to remember,
with the result that the correction had to be redone in some way.
To get the agent to begin reading at the newly corrected words, one
also had to close down the agent and restart the whole readback
sequence from the main menu, rather than simply clicking the play
button on the agent. That was often cumbersome.
Every once in a while in testing these products, there is a little
comic relief amidst the seemingly endless detail. The text-to-speech
agent, as we mentioned earlier, generally phrased its reading well.
But long bulleted lists, or the long row of dashes in the separator
lines of e-mail from the popular "Hotmail.com", caused the agent to
gradually lower the pitch of its reading voice. We stood around
taking bets. "How low would it go?" we asked each other. Well, it's
probably heartless to take bets like this on robots - they are
defenseless after all.