- Feasible Impossibilities - https://www.impossibilities.com/v4 -

Speech Recognition Grammar Specification Advances to Candidate Recommendation

The W3C [1] issued a press release [2] today announcing that its current proposed specification for Speech Recognition Grammar [3] has reached Candidate Recommendation status. What does that mean for developers? It means that, with the spec announced today and things like Voice XML [4], we are getting closer and closer to a defined standard that lets developers build communication and interoperability between disparate systems that provide text-to-speech synthesis, voice control, and web-based telephony applications. Obviously this is of interest to me because of the work [5] I have been doing. Interested in some other ways to tackle this along with Flash? Read on…

There are a couple of other interesting things going on. Microsoft recently made its Microsoft .NET Speech SDK 1.0 Beta [6] available (which might have some very interesting uses with the recently announced beta of Macromedia's Flash Remoting for .NET [7]). Microsoft uses its own Speech API [8] and its own proposed industry specification called SALT (Speech Application Language Tags) [9]. This is similar to the system I have implemented. I, however, don’t use SALT or Voice XML because they are just too verbose for me, and for many applications they are overkill for the developer, which is what my system tries to avoid. It is geared toward the developer who doesn’t want to learn a new standard and just wants to get it working, with a quick setup of the server that powers it and flexibility in implementation.

In researching other systems out there that deliver text-to-speech via the browser, I found that most if not all require specialized plugins (typically large downloads) and are usually limited to Internet Explorer, or are even further limited by specialized software or underlying operating system requirements on top of specialized plugins and markup language. This is where my system has the advantage. The only other solution that has interested me besides my own version is IBM’s WebSphere Voice Server [10]. This is interesting for many reasons. On April 29th, Macromedia announced that ColdFusion MX can be deployed on IBM’s WebSphere Platform [11]. The marriage of the two could yield some pretty cool applications and services. The deal breaker for many of you will come when you find out the cost of IBM’s Voice Server: How does $15,000 per processor sound to you? Yikes! They are targeting this at the VoIP and telephony markets, which have the deep pockets, not the average web developer or company looking for ways to comply with the ADA (Americans with Disabilities Act). To be fair, IBM’s version has support for many languages and does have a free SDK [12], but you will still need the pricey server. They also have a pretty good demo online [13]. Their demo relies on a Java applet that loads into your browser and connects back to their Voice Server. Their response time is a bit better than mine, but if my solution were running on the unlimited bandwidth and power of an arsenal of IBM heavy iron [14], and at the low sampling rate of only 8 kHz, I guarantee it would be just as fast. 🙂

Even with all their eggheads with doctorate degrees, patents galore, and celebrated genius scientists, it looks like IBM still managed to overlook a wide open directory on their TTS demo server. OOPS! Hope they don’t mind that anyone on earth can see what folks have been typing into their demo. [15] From a look at that log file, folks just can’t seem to get enough synthesized dirty talk in their lives. Sorry, couldn’t resist that. 🙂

I do like IBM’s approach of using Java on the client end. It would be interesting to see a comparison chart showing all the various devices and machines that can support the required Java applet as compared to all the devices and browsers that can support my Flash-based solution. That’s something I might spend the time to create in the near future. To sum up, I think there is room for all three systems (and others). I believe the system I have implemented can fill a specific need or niche in between those. Let me know if you think it could fill your need. In the next week or so I am going to put up a full-blown interactive demo of the features that my system incorporates and supports. Stay tuned…more to come. – Rob