VXML applications are, at bottom, finite-state machines: at each state, a prompt is played to the user and a grammar is activated in a speech recognition resource. Which arc is followed out of the state depends on which item in the grammar is recognized, whether the utterance couldn't be matched, or whether the user stays silent until a timeout is reached. A VXML browser constructs the state graph from the VXML input files and navigates the graph based on the activity of the speech recognition resource. This alternation of prompting the user, attempting to recognize the user's response, prompting with a follow-up, recognizing any response to that, and so on, is intended to mimic the turn-taking structure of ordinary human conversation. Creating applications merely by writing configuration files is much easier (and usually more maintainable) than writing the same app in a programming language that "glues together" the recognizer and prompter with the right app logic.
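The prompt/recognize/transition cycle described above boils down to a state-transition table. Here's a tiny sketch of the idea; the class, method names, and outcome strings are all hypothetical illustrations (written in modern Java for brevity), not this project's actual code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the state graph a VXML browser walks: each state
// maps a recognition outcome -- a matched grammar item, "nomatch", or
// "noinput" -- to the next state.
public class DialogMachine {
    private final Map<String, Map<String, String>> arcs = new HashMap<>();

    // Declare an arc out of `from`, taken when the recognizer reports `outcome`.
    public void addArc(String from, String outcome, String to) {
        arcs.computeIfAbsent(from, k -> new HashMap<>()).put(outcome, to);
    }

    // One turn of the conversation: follow the arc selected by the
    // recognizer's outcome; stay in the same state (e.g. to reprompt)
    // if the state declares no arc for that outcome.
    public String step(String state, String outcome) {
        Map<String, String> out = arcs.get(state);
        return (out == null) ? state : out.getOrDefault(outcome, state);
    }
}
```

A real browser would attach a prompt and a grammar to each state and drive this loop from the recognizer's callbacks, but the graph-following core is this simple.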
A typical simple VXML app prompts the user for a city and state, then responds with a weather forecast (by using the city and state values identified in the grammar by the recognizer to query some outside data source). A more complex typical app lets the user access an e-mail inbox by stepping through the headers, and perhaps reading an e-mail body or deleting a message.
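A weather dialog like that might be written roughly as follows in VoiceXML 1.0; the grammar files and forecast URL here are invented for illustration:

```xml
<?xml version="1.0"?>
<vxml version="1.0">
  <form id="weather">
    <field name="city">
      <prompt>What city?</prompt>
      <grammar src="cities.gram" type="application/x-jsgf"/>
      <noinput>Sorry, I didn't hear you. <reprompt/></noinput>
      <nomatch>I don't know that city. <reprompt/></nomatch>
    </field>
    <field name="state">
      <prompt>What state?</prompt>
      <grammar src="states.gram" type="application/x-jsgf"/>
    </field>
    <block>
      <!-- Ship the recognized values off to an outside data source. -->
      <submit next="http://example.com/forecast" namelist="city state"/>
    </block>
  </form>
</vxml>
```

Each `<field>` is one state of the machine: its `<prompt>` is played, its `<grammar>` is activated, and the `<noinput>`/`<nomatch>` handlers are the self-arcs taken when recognition fails.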
- Mac OS X 10.1's implementation of J2SE 1.3.1.
- Apple's Speech Framework (via its Java API) to recognize speech and to generate prompts through a computerized voice ("speech synthesis"). However, I welcome submissions from those who would like to write wrappers for other speech rec/synth solutions.
btw, I use a USB mic attached to a Mac G4 for input, and Harman Kardon SoundSticks for output.
- JDOM's document-handling API, used in conjunction with the Xerces XML parser.