boston.com Business your connection to The Boston Globe

Computer programs that react to speech gain real-world use

The man on the other end of the phone was angry -- very angry. A senior citizen, he had just received a bill for medical insurance that included a sizable increase in his monthly premium. He dialed up WPS Health Insurance and gave one of the telephone operators a piece of his mind.

But the call center employee was not the only one listening. The outraged customer was also speaking to a powerful computer that analyzed his words and his tone of voice. A vocabulary database identified each insult, while an emotion detector sampled his rage. A couple of minutes into the call, the computer came to a decision. If WPS wanted to hang onto this customer, he had better be connected to a supervisor -- and fast.

A supervisor dialed into the call, and a customer was saved, with help from speech-recognition software made by NICE Systems Inc. of Ra'anana, Israel. "It's magic software," said Sharon Whitwam, vice president of member services at WPS in Madison, Wis. "It's really quite remarkable."

The NICE system is just one example of the boom in speech-recognition technology -- programs that enable computers to react to human speech. Other new services let customers sift through dialogue in search of the right podcast and buy tickets over the phone without punching in an endless series of numbers.

According to the research firm Datamonitor PLC, businesses worldwide spent just over $1 billion on speech-recognition systems last year, and Datamonitor expects that outlay to reach $2.4 billion by 2009. Speech recognition has been a staple of science-fiction films since at least the movie "2001: A Space Odyssey." But the growth in the real-world application has little in common with the way the astronauts in Stanley Kubrick's classic interacted with HAL the computer.

Instead, businesses use speech recognition to enhance telephone-based customer service. New speech-based Internet search services let people look up online audio files with Google-like ease. And government investigators use the technology to identify terrorist plotters chatting over international phone lines.

For years, companies such as Nuance Communications Inc. of Burlington and IBM Corp. have advertised programs to let users control their desktop machines with their voices rather than with keyboards and mice. These programs have acquired a small following among poor typists and people with physical disabilities. But users must remember to verbally insert periods, question marks, and commas. In addition, users must "train" the software to recognize their voices to get accurate results. For most computer users, it is easier to type.

But even imperfect speech recognition can be useful. For example, when systems need to recognize only a few dozen words, they can work well even without being trained; they will recognize these words when spoken by almost anybody. Nuance and other companies make telephone call center software that lets callers issue speech commands instead of punching touch-tone buttons. "You can get a lot more done with speech," said Datamonitor analyst Ri Pierce-Grove. For example, it is hard to buy a plane ticket to, say, Fort Worth, by pressing buttons on a phone. "Whereas with speech, you can say 'I want to go to Fort Worth,'" Pierce-Grove said.
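The small-vocabulary idea can be illustrated with a minimal Python sketch. It assumes a recognizer has already turned the caller's audio into text, and it only checks that text against a short list of known place names. The city list and the function name find_destination are hypothetical and chosen for illustration; they do not describe any vendor's product.

```python
# Hypothetical small vocabulary: the only destinations this toy system knows.
DESTINATIONS = ["fort worth", "boston", "madison", "irvine"]


def find_destination(utterance):
    """Return a known destination mentioned in the utterance, or None."""
    text = utterance.lower()
    for city in DESTINATIONS:
        # A plain substring check is enough when the vocabulary is tiny.
        if city in text:
            return city
    return None


print(find_destination("I want to go to Fort Worth"))
```

Because the program only has to tell a handful of phrases apart, it can tolerate a good deal of noise in the rest of the sentence, which is roughly why such systems need no per-caller training.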

Imperfect speech recognition is also good enough to pick out important words from a stream of conversation. The NICE Systems software can identify angry customers by their use of certain commonplace insults. It can even detect changes in the voice that indicate anger.

"We measure 26 different parameters that measure deviations in such things as pitch, tone, tonality, cadence," said Eyal Danon, NICE's vice president of global marketing. Within minutes, the system can spot a disgruntled customer and warn supervisors, who can intervene to solve the problem. The system is also used to analyze thousands of hours of recorded conversations, so call center workers can learn the best methods for keeping customers happy.
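The two signals described above, insulting words and a voice that drifts from its normal range, can be combined in a rough Python sketch. This is not NICE's method: it tracks a single parameter (pitch) rather than 26, and the word list, the thresholds, and every function name are assumptions made up for illustration.

```python
import statistics

# Hypothetical word list, chosen only for illustration.
INSULTS = {"idiot", "stupid", "incompetent", "ridiculous"}


def count_insults(transcript):
    """Count words in the transcript that appear in the insult list."""
    words = transcript.lower().split()
    return sum(1 for w in words if w.strip(".,!?") in INSULTS)


def pitch_deviation(baseline_hz, current_hz):
    """Distance of the current mean pitch from the baseline mean,
    measured in baseline standard deviations."""
    mean = statistics.mean(baseline_hz)
    spread = statistics.stdev(baseline_hz)
    if spread == 0:
        return 0.0  # a perfectly flat baseline gives nothing to compare against
    return abs(statistics.mean(current_hz) - mean) / spread


def should_escalate(transcript, baseline_hz, current_hz,
                    max_insults=2, max_deviation=3.0):
    """Flag the call if either signal crosses its (assumed) threshold."""
    return (count_insults(transcript) >= max_insults
            or pitch_deviation(baseline_hz, current_hz) >= max_deviation)


calm_pitch = [110, 115, 120, 112, 118]   # pitch samples early in the call
print(should_escalate("This is ridiculous, you idiot!", calm_pitch, [180, 190, 185]))
```

Comparing the caller against his own earlier baseline, rather than against a fixed number, is the design choice that matters here: a naturally loud or high-pitched voice should not trigger an alert by itself.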

Other uses for speech recognition are more controversial. Consider President Bush's decision to order warrantless surveillance of telephone calls between people inside the United States and suspected terrorists overseas. In many cases, the calls are not monitored by humans. Instead, a speech-recognition system listens to the calls for keywords that might suggest the existence of a terrorist plot.

BBN Technologies, the Cambridge company that developed much of the early technology for the Internet, has worked with the military for the past three years to develop highly accurate speech-snooping software.

"The three target languages for the program were Arabic, US English, and Mandarin Chinese," said Alex Laats, president of BBN's Delta division, a business unit that finds commercial applications for BBN's military and intelligence research projects.

Laats said that even he does not know how the government is using BBN's technology. But his Delta team has figured out a clever commercial application for it by creating Podzinger, a free online service that lets people look up information stored in podcasts -- free audio files posted by Internet users. Thousands of people record their opinions on everything from politics to the latest movies, then publish the recordings online, where anybody can download them. But how does someone find a podcast on a specific subject without listening to it?

Podzinger, located at www.podzinger.com, uses BBN speech-recognition software to transcribe thousands of podcasts. It then uses standard search-engine technology to build a searchable index of every word in the transcripts. Looking up a podcast that discusses the Boston Red Sox is as easy as using Google to find a Red Sox website. Once it is located, a listener can download the entire audio file, or just listen to the part where the words "Red Sox" are used.
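The transcribe-then-index approach can be sketched in a few lines of Python. The sketch assumes the recognizer emits each word with a time offset in seconds; the sample transcripts, the podcast names, and the functions build_index and search_phrase are all hypothetical and say nothing about how Podzinger is actually built.

```python
from collections import defaultdict


def build_index(transcripts):
    """Map each word to the (podcast_id, position) pairs where it is spoken.

    `transcripts` maps a podcast id to a list of (seconds, word) pairs,
    the kind of timed output a speech recognizer might produce.
    """
    index = defaultdict(list)
    for podcast_id, timed_words in transcripts.items():
        for position, (_seconds, word) in enumerate(timed_words):
            index[word.lower()].append((podcast_id, position))
    return index


def search_phrase(index, transcripts, phrase):
    """Return (podcast_id, seconds) for every place the phrase is spoken."""
    target = phrase.lower().split()
    if not target:
        return []
    hits = []
    # Look up the first word, then confirm the following words match.
    for podcast_id, position in index.get(target[0], []):
        timed_words = transcripts[podcast_id]
        window = timed_words[position:position + len(target)]
        if [word.lower() for _, word in window] == target:
            hits.append((podcast_id, timed_words[position][0]))
    return hits


# Made-up sample data standing in for recognizer output.
transcripts = {
    "sports_talk": [(0.0, "welcome"), (0.6, "back"), (41.2, "the"),
                    (41.5, "Red"), (41.8, "Sox"), (42.3, "won")],
    "movie_chat": [(0.0, "the"), (0.4, "red"), (0.8, "carpet")],
}
index = build_index(transcripts)
print(search_phrase(index, transcripts, "Red Sox"))
```

Keeping the time offset alongside each word is what would let a player jump straight to the moment a phrase is spoken instead of starting the file from the beginning.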

A rival Internet company, Blinkx.com, offers an alternative podcast search engine. Blinkx also uses speech recognition to create a searchable index of Internet video feeds from online news sources like Fox News and the BBC. More speech-based innovations are on the way. Last year, InfoByPhone Inc. of Irvine, Calif., launched AskMeNow, a service that lets cellphone users ask questions on any subject for 49 cents a query. Answers are transmitted to the customer's cellphone as text messages.

AskMeNow currently records incoming questions, then routes the audio to an office in the Philippines, where researchers look up the answers. But InfoByPhone's chief technology officer Don Stern said "it's our intention to automate as much of this process as possible." So the company will adopt speech-recognition software this year. Converting incoming questions to text will help researchers work faster. Eventually, it will let AskMeNow's computers look up many answers automatically, with no need for human assistance.

Hiawatha Bray can be reached at bray@globe.com.
