First, let me explain some of the background for why I developed the application. I used to work on the development of a computer telephony system. We were strong believers in the theory that it was important for the developers to get a good understanding of the end users' perspective of the system and so we encouraged all of the development team to use early builds of the system as much as possible.
Figure: Some of the headphones I use.
Because of all of these potential problems, I often spent the first few minutes of a telephone meeting shouting "Hello! Hello! Can you hear me?" If I was speaking to another team member, I could expect them to tolerate the wasted time and poor audio quality while I tested several headsets to find which one worked best. However, when I was making an important call to someone I wanted to impress, I needed some way to be totally confident that every part of my telephony setup was working correctly.
Anyone who uses Skype is probably familiar with the "echo123" virtual user. This is a virtual Skype account that anyone can call and be answered by a pleasant-sounding lady who will listen to what you say and then repeat it back to you as it sounds to her. I decided to hack together something similar that could be used with any telephony system. After a bit of searching on the internet I found the Voxeo developer site, which offers excellent free resources to anyone wanting to develop voice-based applications. Voxeo make their money from providing commercial-grade voice response systems for mission-critical use. To convince people how easy it is to develop a user-friendly voice interface, they give developers free access to their powerful web-based development environment, and they will even host your application on their test servers so that you can try it out in action.
Voxeo support a number of programming languages, including the industry-standard VoiceXML. Developing a VoiceXML server is very complex, but the good news is that since Voxeo have done that, you don't have to. Developing a VoiceXML application is very easy (there are excellent tutorials on the Voxeo site to get you started). I was able to develop my application in under 30 lines of XML that is easy to write and understand. You can get the full source code here.
The way VoiceXML works is that you specify prompts for the system to play and then you listen for the user to say something (or press a key on their keypad, which sends a DTMF tone). You specify in XML what should be done with the response. You can see I have only one statement and I use the text-to-speech function to generate the prompt (it is also possible to record the prompts for a more natural-sounding interface).
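As a minimal sketch of that prompt-then-listen pattern (an illustration only, not the source of the application described here), the document below plays a text-to-speech prompt, waits for a single digit spoken or keyed in, and then says what it received. The form id, field name and prompt wording are hypothetical, and the built-in "digits" grammar is assumed to be available on the platform, which the VoiceXML standard leaves optional.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical form: one prompt, one piece of input, one response -->
  <form id="demo">
    <!-- Listen for exactly one digit, either spoken or sent as DTMF -->
    <field name="choice" type="digits?length=1">
      <!-- Plain text inside a prompt is rendered with text to speech -->
      <prompt>Please say or press a single digit.</prompt>
      <!-- Runs once the caller's input has been recognized -->
      <filled>
        <prompt>You entered <value expr="choice"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

Swapping the text inside a prompt for an audio element that points at a recorded file is the usual way to get the more natural-sounding recorded prompts mentioned above.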
The only complex line in my code is the one that reads
<record name="R_1" beep="true" dtmfterm="true" maxtime="10s" finalsilence="1s" silence="3s">
Translated into English, this tag means:
- Record what you hear in a file named R_1.wav
- If you hear a DTMF tone, stop recording
- Listen for a maximum of 10 seconds
- If you hear nothing give up after 3 seconds
- If you hear something then terminate when the speaker leaves a gap of 1 second or more
Obviously real-world applications can get more complex, and if you try to recognize what the user is saying, the system can get things hilariously wrong when the caller is not a native speaker. But the general idea is not too hard to master. In any case, this application only needs to distinguish between two outcomes: we hear something, so the "filled" tag applies, or we hear nothing, so the "noinput" tag applies.
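The following sketch shows how a record tag with "filled" and "noinput" handlers could fit together into an echo test. It is an assumption about the general shape, not the actual source of the application. The record attributes are copied from the tag quoted above, while the form id and prompt wording are made up for illustration. Note that "silence" is reproduced as quoted and may be specific to the hosting platform; in standard VoiceXML the wait before "noinput" fires is normally set with the timeout property or the prompt's timeout attribute.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
  <!-- Hypothetical form id; the structure is a guess, not the original code -->
  <form id="echo">
    <!-- Attributes as quoted in the article; "silence" may be platform specific -->
    <record name="R_1" beep="true" dtmfterm="true"
            maxtime="10s" finalsilence="1s" silence="3s">
      <prompt>Say something after the beep and I will play it back.</prompt>
      <!-- Nothing was heard: say so and play the prompt again -->
      <noinput>
        <prompt>Sorry, I did not hear anything.</prompt>
        <reprompt/>
      </noinput>
      <!-- Something was recorded: play the captured audio back to the caller -->
      <filled>
        <prompt>This is what I heard. <audio expr="R_1"/></prompt>
      </filled>
    </record>
  </form>
</vxml>
```

Once the "filled" handler has run, the form has nothing left to collect, so the call ends. A fuller version might loop back to let the caller try again.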
If you want to try out the application you can call +1(617)963-0648 to get the version with this source code, or if you prefer the sound of my voice you can call +1(617)500-5332 to hear a slightly modified version where the prompts use a recording of my voice.