The market for speech SDKs has become quite rich. IBM has a
developer toolkit, also quite elaborate, which allows much more
precise control of the voice environment than is obvious in its
dictation product, and which includes Linux offerings. Chant Inc.
makes a Smalltalk toolkit that wraps the IBM C-oriented API, giving
the Smalltalk community access to speech technology. Philips is
also offering SDKs. ScanSoft is offering recognition and
text-to-speech SDKs for handheld devices.
All of these developer toolkits require serious design work on the
details of the interaction among the user, the application, and the
speech engine. Allow time for experimentation with these kits
before committing to your first project estimate. The details
of interpreting speech and responding correctly to the various
utterance sequences that ordinary people produce can be daunting.
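To give a flavor of that design work, here is a minimal, hypothetical Python sketch, not tied to any of the SDKs named above, of the kind of utterance-dispatch logic an application must layer on top of a recognizer's text output:

```python
# Hypothetical sketch: the application logic that sits between a
# recognizer's text output and the program's actions.  Real SDKs
# deliver richer results (confidence scores, n-best lists), but the
# branching below shows why the design work is non-trivial.

def handle_utterance(text, state):
    """Return (reply, new_state) for one recognized utterance."""
    words = text.lower().split()
    if state == "confirming":
        if "yes" in words:
            return "Order placed.", "idle"
        if "no" in words:
            return "Cancelled.", "idle"
        # Users often answer a yes/no prompt with something else entirely.
        return "Please say yes or no.", "confirming"
    if "order" in words or "buy" in words:
        return "You want to place an order. Correct?", "confirming"
    return "Sorry, I didn't understand.", state

reply, state = handle_utterance("I'd like to buy some stock", "idle")
print(reply)   # confirmation prompt
reply, state = handle_utterance("um, maybe", state)
print(reply)   # re-prompt: users rarely answer a yes/no question cleanly
```

Even this toy dispatcher needs a confirmation state and a fallback path for off-script answers; a production dialogue multiplies these cases.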
Server-side applications make use of SDKs as well. As an example,
Nuance offers Foundation SpeechObjects(TM), a core set of free,
open-source, standards-based components for building speech applications
- components that handle off-hook, on-hook, DTMF recognition,
message playing, and so on. It partially overlaps the UNISYS facilities
for dialogue control and for extracting quantities such as dollar
amounts and dates from the conversation. That SDK does not, however,
interpret prior context as elaborately as the full UNISYS product does.
The UNISYS product, with its elaborate semantic dictionary,
can do things like equate "I don't want to pay anything" with
"Zero dollars" without the programming assistance that would otherwise
be needed using the Nuance SDK alone. Nuance provides pre-written
grammars for brokerage, airline, dollar amounts and dates.
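The semantic normalization described above can be sketched, very roughly, as a phrase-to-value dictionary consulted before falling back on literal number parsing. This is a toy illustration of the idea, not the UNISYS implementation:

```python
# Toy illustration of a semantic dictionary: phrases treated as
# equivalent to a literal dollar amount.  A real product does this
# with a far larger dictionary plus tracking of prior context.

SEMANTIC_DOLLARS = {
    "i don't want to pay anything": 0,
    "nothing": 0,
    "zero dollars": 0,
}

WORD_NUMBERS = {"zero": 0, "one": 1, "two": 2, "five": 5, "ten": 10}

def dollar_amount(utterance):
    """Map an utterance to a dollar amount, or None if unrecognized."""
    text = utterance.lower().strip()
    if text in SEMANTIC_DOLLARS:
        return SEMANTIC_DOLLARS[text]
    # Fall back: a literal number word followed by "dollar(s)".
    words = text.split()
    for i, w in enumerate(words[:-1]):
        if w in WORD_NUMBERS and words[i + 1].startswith("dollar"):
            return WORD_NUMBERS[w]
    return None

print(dollar_amount("I don't want to pay anything"))  # 0
print(dollar_amount("five dollars"))                  # 5
```

Without such a dictionary, each equivalent phrasing would have to be anticipated and coded by hand, which is the programming assistance the UNISYS product saves you.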