boston.com Business your connection to The Boston Globe

Look, Ma, no hands!

Consider yourself witness to the first story I've ever talked out. A few months ago, I developed carpal-tunnel syndrome, which along with a pinched nerve in my knack neck, made it difficult for me to type. Not an ideal situation for someone who types for letting living. But out of my newfound disability sprung an idea: Why not give voice recognition a try and see if it lived up to its hands-free promise?

Note to the reader: I left in the corrected mistakes, above, when the voice-recognition program misinterpreted my words (for example, writing "knack" instead of "neck"). I dictated the first paragraph after working with the program for about eight hours. Judge for yourself how accurate, or inaccurate, the technology is.

Five years ago, at the start of a new millennium, voice recognition was still relatively new, trendy, and much-hyped. People weren't meant to sit in front of a computer and type, or so the story went. Speech was more natural. Imagine navigating the Web, dictating e-mail, or writing legal briefs without ever lifting a finger.

But the reality was far different. Speech-recognition programs were clunky and difficult to use. The text was so laden with errors that it was often impossible to decipher. Correcting the document took longer than it would have taken to type out the entire thing using the hunt-and-peck method.

But when the shooting pains and numbness that come with carpal tunnel syndrome began interfering with work, I decided, with a gentle nudge from the technology editor, that it was time to approach voice recognition with an open mind. It might even be fun.

The first program I tried was IBM Corp.'s ViaVoice release 9.1 Pro edition. I loaded it on my work PC, attaching the microphone, setting the audio level, and reading a chapter from "Treasure Island" so the program could familiarize itself with my speech mannerisms.

For practice, I decided to dictate a story I had just finished writing. What came out bore only a vague resemblance to what I had said:

"After five years and began living plainclothes and Lana, EPB and J sandwiches and crashing on friends' couches as on is finally getting some notice."

The translation: "After five years of eking out a living playing small clubs in Atlanta, eating PB&J sandwiches, and crashing on friends' couches, Aslyn is finally getting some notice."

I reminded myself to be patient. After all, the more you use the software, the better it gets to know you, and the more accurate it becomes. But within hours, all I could think was, "Thank God I have hands." It took 10 minutes to dictate and correct a three-sentence e-mail. No matter how clearly I spoke, the program refused to obey my commands. Before I knew it, I was shouting a string of profanities at the computer.

That was when I knew it was time to move on.

Reviewers had raved about ScanSoft Inc.'s Dragon NaturallySpeaking 8 Preferred edition. One columnist claimed to have written entire columns without a single mistake. But I was now officially among the faithless. Sure enough, the problems began right away. During installation, an error message informed me that my computer didn't have enough memory. So I packed up my things to try again at home. This time, the installation went off without a hitch.

I picked up a Boston Globe and began to dictate. When I looked up from the newspaper, I could hardly believe my eyes. Aside from the fact that the opening sentence was missing hyphens (I omitted them because I didn't know the correct command), there were only two mistakes. The rest of the story was just as accurate.

That night, I lay in bed dictating and sending e-mails on my laptop. Now this was fun!

But as I drifted off to sleep I found myself wondering how the two software programs could be so far apart in accuracy. The next day, I put on my reporter hat, called IBM and ScanSoft, and began asking some questions.

The programs work on a similar premise. They both string together syllables into words and words into sentences, based on the laws of probability. When the computer hears "ka" and "t," it generates a list of possible words like cat, kit, or cart. But before picking one of them, it listens for other words to give it context. If the next word is litter, for example, the program deduces that the first word was cat.
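
For readers who like to see the idea spelled out, here is a minimal sketch of that context trick in Python. It is a toy under stated assumptions, not how ViaVoice or NaturallySpeaking is actually built: the word-pair counts are invented for illustration, the function name pick_word is hypothetical, and a real program would also weigh how closely each candidate matches the sound it heard.

```python
from collections import Counter

# Hypothetical counts of how often one word is followed by another.
# These numbers are made up; a real system would estimate them
# from a very large collection of text and speech.
BIGRAM_COUNTS = Counter({
    ("cat", "litter"): 40,
    ("kit", "litter"): 1,
    ("cart", "litter"): 2,
    ("cat", "wheels"): 1,
    ("kit", "wheels"): 2,
    ("cart", "wheels"): 30,
})


def pick_word(candidates, next_word, counts=BIGRAM_COUNTS):
    """Return the candidate most likely to be followed by next_word."""
    # Every word that appears in second position in the toy data.
    vocab = {second for _, second in counts}

    def score(word):
        # How often this candidate appears as the first word of any pair.
        total = sum(c for (first, _), c in counts.items() if first == word)
        # Estimated probability of next_word given the candidate,
        # with add-one smoothing so unseen pairs are not scored as zero.
        return (counts[(word, next_word)] + 1) / (total + len(vocab))

    return max(candidates, key=score)


if __name__ == "__main__":
    sounds_like = ["cat", "kit", "cart"]
    print(pick_word(sounds_like, "litter"))
    print(pick_word(sounds_like, "wheels"))
```

With these made-up counts, the word litter tips the choice toward cat, while a following word such as wheels tips it toward cart. The same sounds produce different words depending on what comes next.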

The probability models are based on millions of written and spoken documents, the companies said. Language statisticians create language models that help the computer adapt to anything from a Boston accent to a Southern drawl. And as you use ViaVoice or NaturallySpeaking, their accuracy improves, because they learn your patterns and manner of speech.

But to be effective, the programs rely on established user profiles. So for all that voice recognition can do, it can't understand a stranger. It can't take notes for me while I interview a source for a story, and it can't transcribe a digital recording of someone it's never heard before.

Among the other drawbacks: You can't eat while you write.

And without a special upgrade, you can't be entirely hands-free.

"It's significantly faster to do certain things using your hands than your voice," said Matt Revis, ScanSoft's product manager for Dragon NaturallySpeaking.

As for the difference between the two software packages, it turns out ViaVoice 9.1 wasn't the latest version of the software on the market. IBM introduced ViaVoice 10 in 2002. What a difference a generation makes when it comes to a new technology.

Thanks to a lot of ice baths and physical and occupational therapy, the carpal tunnel is in remission.

But I'm thinking of making voice recognition a regular part of my writing life, because according to ScanSoft's Revis, "It forces you to think before you speak. We find that makes many people become better writers."

Naomi Aoki can be reached at naoki@globe.com.