vBrowse 
WHAT
vBrowse is a pure-Java VoiceXML (VXML) browser that currently implements about 40% of the VXML 1.0 spec. A final spec for VXML 2.0 has just been released, and I plan to support it in the future.
VXML applications are essentially finite-state machines: at each state, a prompt is played to the user and a grammar is activated in a speech recognition resource. Which arc is followed out of the state depends on which item in the grammar is recognized, whether the utterance couldn't be matched, or whether the user stays silent until a timeout is reached. A VXML browser constructs the state graph from the VXML input files and navigates the graph based on the activity of the speech recognition resource. This alternation of prompting the user, attempting to recognize the user's response, prompting with a follow-up, recognizing any response to that, and so on, is intended to mimic the turn-taking structure of ordinary human conversation. Creating applications merely by writing configuration files is much easier (and usually more maintainable) than writing the same app in a programming language that "glues together" the recognizer and prompter with the right app logic.
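As a rough illustration of the state-machine view described above, here is a minimal Java sketch. It is not taken from the vBrowse source: every name in it (DialogSketch, State, Result, run, demoGraph) is hypothetical, and the speech recognizer is simulated by a pre-scripted sequence of outcomes rather than a real recognition resource.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not vBrowse code: a dialog as a finite-state machine.
class DialogSketch {
    // The three kinds of recognizer outcome: a grammar item was recognized,
    // the utterance could not be matched, or the user stayed silent.
    enum Outcome { MATCH, NO_MATCH, NO_INPUT }

    // One recognition result; item is only meaningful for MATCH.
    static final class Result {
        final Outcome outcome;
        final String item;
        private Result(Outcome outcome, String item) {
            this.outcome = outcome;
            this.item = item;
        }
        static Result match(String item) { return new Result(Outcome.MATCH, item); }
        static Result noMatch() { return new Result(Outcome.NO_MATCH, null); }
        static Result noInput() { return new Result(Outcome.NO_INPUT, null); }
    }

    // One state: a prompt plus the arcs leaving it.
    static final class State {
        final String prompt;
        final Map<String, String> matchArcs = new HashMap<>(); // grammar item -> next state
        String noMatchTarget; // next state when the utterance matches nothing
        String noInputTarget; // next state when the user stays silent

        State(String prompt) { this.prompt = prompt; }

        // A state with no outgoing arcs ends the dialog.
        boolean isFinal() {
            return matchArcs.isEmpty() && noMatchTarget == null && noInputTarget == null;
        }

        String next(Result r) {
            switch (r.outcome) {
                case MATCH:
                    // An item with no arc of its own is treated like a failed match.
                    return matchArcs.getOrDefault(r.item, noMatchTarget);
                case NO_MATCH:
                    return noMatchTarget;
                default:
                    return noInputTarget;
            }
        }
    }

    // Walks the graph and returns the prompts "played", in order.
    static List<String> run(Map<String, State> graph, String start, Iterator<Result> recognizer) {
        List<String> played = new ArrayList<>();
        String current = start;
        while (current != null && graph.containsKey(current)) {
            State state = graph.get(current);
            played.add(state.prompt);
            if (state.isFinal() || !recognizer.hasNext()) {
                break;
            }
            current = state.next(recognizer.next());
        }
        return played;
    }

    // A two-state example: ask for a city, re-ask on failure, then finish.
    static Map<String, State> demoGraph() {
        State ask = new State("Which city?");
        ask.matchArcs.put("boston", "forecast");
        ask.noMatchTarget = "askCity";
        ask.noInputTarget = "askCity";
        Map<String, State> graph = new HashMap<>();
        graph.put("askCity", ask);
        graph.put("forecast", new State("Here is the forecast."));
        return graph;
    }
}
```

Feeding run a scripted silence, then an unmatched utterance, then a match on "boston" makes the sketch repeat the city prompt after each failure before moving on to the forecast state. A real browser would build the graph from VXML files instead of by hand.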
A typical simple VXML app prompts the user for a city and state, and gives a weather forecast in response (using the city and state values that the recognizer identified in the grammar to query some outside data source). A typical, more complex app lets the user access an e-mail inbox by stepping through the headers, and perhaps reading an e-mail body or deleting a message.
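To make the weather example a little more concrete, the sketch below embeds a small VXML 1.0-style form with a city field and a state field, and lists the fields a browser would have to fill. The document text, the grammar file names, and the submit URL are all made up for illustration. The sketch also uses the XML parser bundled with the JDK (javax.xml.parsers) rather than the JDOM and Xerces combination vBrowse itself uses, purely to keep it self-contained. WeatherFormSketch and fieldNames are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch, not vBrowse code: reading the fields of a weather form.
class WeatherFormSketch {
    // Illustrative document only; grammar files and URL are invented.
    static final String WEATHER_VXML =
          "<vxml version=\"1.0\">"
        + "<form id=\"weather\">"
        + "<field name=\"city\">"
        + "<prompt>Which city?</prompt>"
        + "<grammar src=\"city.gram\"/>"
        + "</field>"
        + "<field name=\"state\">"
        + "<prompt>Which state?</prompt>"
        + "<grammar src=\"state.gram\"/>"
        + "</field>"
        + "<block><submit next=\"http://example.com/forecast\" namelist=\"city state\"/></block>"
        + "</form>"
        + "</vxml>";

    // Returns the name attribute of every <field> element, in document order.
    static List<String> fieldNames(String vxml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(vxml.getBytes(StandardCharsets.UTF_8)));
            NodeList fields = doc.getElementsByTagName("field");
            List<String> names = new ArrayList<>();
            for (int i = 0; i < fields.getLength(); i++) {
                names.add(((Element) fields.item(i)).getAttribute("name"));
            }
            return names;
        } catch (Exception e) {
            // Wrap parser and I/O failures so callers need no checked exceptions.
            throw new IllegalStateException("could not parse VXML", e);
        }
    }
}
```

Calling WeatherFormSketch.fieldNames(WeatherFormSketch.WEATHER_VXML) yields the two field names, city and state, which are the values such an app would pass to its outside data source.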
WHY
I wanted a voice portal that presented the info I wanted, the way I wanted it. It also seemed like a great way to learn Java, XML, DOM, VXML, JavaScript, and (still to come) JSP.
HOW
vBrowse uses:
- Mac OS X 10.1's implementation of J2SE 1.3.1.
- Apple's Speech Framework (via its Java API) to recognize speech and to generate prompts through a computerized voice ("speech synthesis"). I welcome submissions from anyone who would like to write wrappers for other speech recognition or synthesis solutions. (For input I use a USB mic attached to a Mac G4, and for output Harman Kardon SoundSticks.)
- JDOM's document-handling API, used in conjunction with the Xerces XML parser.
- The Rhino JavaScript interpreter.