Writing - Logged ON


Just Say the Word

The ability to speak directly to a machine and have it understand us has been a common theme in science fiction for decades. It is ultimately more natural than using a computer mouse or keyboard, which is how we traditionally communicate our ideas to a computer. So why is it that we are only now starting to use voice recognition technology on a regular basis?

Getting a machine to both recognize our voice and speak back has actually been worked on since Alexander Graham Bell first invented the telephone in the 1870s. Speech ‘synthesis’, or the ability to convert text (such as this column) into sounds, has evolved nicely, and we have all heard the robotic-sounding computer voices in many movies and on television. Computer-generated speech sounds much better now, and you can easily experiment with it on your own computer. AT&T Labs has a demo of their technology online. This is the first time that I have seen actual accents attempted (such as UK and German). The direct benefits to those with visual impairments are readily apparent.

Converting human speech back into text form is much more challenging. Since the 1970s, great leaps have been made in this field, corresponding directly to the increase in computing power. It takes a lot of calculations for a machine to turn a stream of audio back into words. In fact, just a few short years ago it would take all of the processing power in your computer to attempt to analyze even the simplest of phrases.

Using voice recognition on your computer has gotten better and easier. Dictation software such as ‘Dragon Naturally Speaking’ can achieve 99% accuracy and no longer requires the extensive training it used to. In order to recognize any individual person, the software used to have you say key phrases to establish patterns. Often requiring hours to accomplish, this painful step has been all but eliminated from the most modern speech recognition systems. There are even versions tailored to specific industries (such as medical) that include words we don’t use every day.

The most often overlooked aspect of trying this on your own is getting a good microphone. One that has its own power source (such as batteries or a USB microphone) generally gives you the best results. Once you have that, you can set up software on your operating system that allows you to control basic functions of your computer. If you are using a Mac, the software and microphone are most likely built in, and you can start by saying the easiest command, ‘What time is it?’ You will experience both speech recognition and speech synthesis when you hear the answer. To set up your Mac running OS X, visit http://www.apple.com/macosx/features/speech/. If you are using Windows XP, Microsoft’s Speech Recognition Engine is not included unless you own Office or the XP Plus! pack. More information is available online. For those using Linux, there is an open source package called ‘Open Mind Speech’.

You have probably used voice recognition software and not even realized it. Telephone Directory Assistance has been using it for years now. A computer asks you for the city, the listing, and whether the number is business or residential. Voice recognition systems analyze what you say and, where possible, give you the results without any human intervention. That tiny new cell phone you just picked up probably has the ability to recognize your voice as well, making dialing while driving much safer (just say ‘Call Home’). I was just getting used to the fact that the old rotary-style telephone dialers are a thing of the past, and now it looks like it won’t be long before the numeric keypad is gone too. Even my coolest Star Wars toy, R2-D2, can understand me when I ask him to dance.

As people, we start listening and absorbing speech from the time we are born. I played with the idea of dictating this column into my computer to prove how far we’ve come in this field. Although I know my computer can recognize all of my words, I’m just not sure that it’s ready to hear what I have to say.


Article Copyright ©2005 by Syd Bolton. Original publication date: 3/12/2005.
Reproduction requires permission; please e-mail for more information.

 