VoiceWizard - ViaVoice Text to Speech

VoiceWizard: the speech resource
for executives and other adventurers exploring voice technology miracles

IBM ViaVoice Millennium cont'd:

Text To Speech Performance

ViaVoice has a winsome animated agent which will read text starting from the current cursor position. The agent does surprisingly well with abbreviations, company names and so on. Especially impressive was its sense of phrasing and its sense of exactly where a normal person might actually take a pause or breath.

However, this feature still needs refinement. For example, there is a limit on the amount of text that may be read aloud. If the document in question is longer than that limit, the only way to go beyond it is to manually split the remainder off into a new document.
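For readers who would rather script the split than do it by hand, here is a minimal sketch of one way to break a long document into pieces that each stay under a length limit. It is our own illustration, not anything ViaVoice provides. The function name `split_for_readback` and the `max_chars` parameter are hypothetical, and we do not know the actual limit ViaVoice enforces or whether it is measured in characters.

```python
import textwrap


def split_for_readback(text, max_chars):
    """Split text into chunks of at most max_chars characters.

    Paragraph breaks (blank lines) are preferred split points.
    max_chars is a hypothetical stand-in; the real ViaVoice limit is unknown.
    """
    chunks = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # A single paragraph over the limit is broken at word boundaries.
        if len(para) > max_chars:
            pieces = textwrap.wrap(para, max_chars)
        else:
            pieces = [para]
        for piece in pieces:
            candidate = piece if not current else current + "\n\n" + piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                # The piece does not fit; close the current chunk and start a new one.
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk could then be pasted into its own document and read back separately, which is the same workaround described above with less hand editing.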

ViaVoice provides some additional specialized vocabulary topics which may be switched in and out of any dictation session. The "chatter's jargon" topic will take certain phrases such as "rolling on the floor laughing" and spell them as "rofl", in keeping with chat room culture. But the text-to-speech processing does not have a symmetrical understanding of these particular abbreviations and pronounces the characters phonetically instead.
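One conceivable workaround is to expand such abbreviations back into full phrases before handing the text to the agent. The sketch below is our own illustration under stated assumptions: the `CHAT_EXPANSIONS` table and the `expand_chat_jargon` function are hypothetical names, only the "rofl" entry comes from the topic as described here, and nothing in it calls into ViaVoice itself.

```python
import re

# Abbreviation -> spoken phrase. "rofl" is the example from the review;
# "brb" is an illustrative extra entry and an assumption on our part.
CHAT_EXPANSIONS = {
    "rofl": "rolling on the floor laughing",
    "brb": "be right back",
}


def expand_chat_jargon(text, expansions=CHAT_EXPANSIONS):
    """Replace whole-word chat abbreviations with their full phrases."""
    # \b keeps the match to whole words, so "profligate" is left alone.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in expansions) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: expansions[m.group(1).lower()], text)
```

Run over a copy of the document before readback, this would let the agent speak the phrase instead of sounding out the letters, while the original text keeps its chat room spelling.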

As part of the process of getting started, the text-to-speech agent selects the text from the current cursor position through the end of the document. Within SpeakPad, after the text is selected, it is positioned on the screen so that the starting position of the cursor is in the upper left corner of the editing window, a quite reasonable place for it to be. But within something like Netscape Mail, the text is left positioned at the last word of the message. This inconsistency was confusing and sometimes troublesome for our users.

Some of our testers used the text-to-speech agent as a way to audio proofread documents that had been stared at for some time. The agent's voice was useful in detecting errors that the eyeball just didn't see any more. But there was an odd inconsistency in behavior when the tester paused the agent in order to make a correction. The first click after the pause, which was needed to move the blinking cursor to the location of the correction, wouldn't take. One had to click twice. Otherwise, the correction was made at the place where reading started, which is typically not the place where the correction is actually needed.

Our testers found this extra click was often hard to remember, with the result that the correction had to be redone in some way. To get the agent to begin reading at the newly corrected words, one also had to close down the agent and restart the whole readback sequence from the main menu; simply clicking the play button on the agent was not enough. Often that was cumbersome.

Every once in a while in testing these products, there is a little comic relief amidst the seemingly endless detail. The text-to-speech agent, as we mentioned earlier, generally performed well on phrasing. But long bulleted lists, or the long row of dashes in the separator marks of e-mail from the popular "Hotmail.com", caused the agent to gradually lower the pitch of its reading voice. We stood around taking bets. "How low would it go?" we asked each other. Well, it's probably heartless to take bets like this on robots - they are defenseless after all.



Page Last Updated: 12/05/99