AI enthusiasts: The unrecognized death of speech recognition

Started by Triple Zero, May 03, 2010, 01:29:50 PM


Triple Zero


The unrecognized death of speech recognition

The accuracy of computer speech recognition flat-lined in 2001, before reaching human levels. The funding plug was pulled, but no funeral, no text-to-speech eulogy followed.

[...]

In 2001 recognition accuracy topped out at 80%, far short of HAL-like levels of comprehension. Adding data or computing power made no difference. Researchers at Carnegie Mellon University checked again in 2006 and found the situation unchanged. With human discrimination as high as 98%, the unclosed gap left little basis for conversation. But sticking to a few topics, like numbers, helped. Saying "one" into the phone works about as well as pressing a button, approaching 100% accuracy. But loosen the vocabulary constraint and recognition begins to drift, turning to vertigo in the wide-open vastness of linguistic space.

[...]

A 1996 look at the state of the art reported that "Despite over three decades of research effort, no practical domain-independent parser of unrestricted text has been developed." As with speech recognition, parsing works best inside snug linguistic boxes, like medical terminology, but weakens when you take down the fences holding back the untamed wilds. Today's parsers "very crudely are about 80% right on average on unrestricted text," according to Cambridge professor Ted Briscoe, author of the 1996 report. Parsers and speech recognition have penetrated language to similar, considerable depths, but without reaching a fundamental understanding.

[...]

We are surrounded by unceasing, rapid technological advance, especially in information technology. It is impossible for something to be unattainable. There has to be another way. Right? Yes—but it's more difficult than the approach that didn't work. In place of simple speech recognition, researchers last year proposed "cognition-derived recognition" in a paper authored by leading academics, a scientist from Microsoft Research and a co-founder of Dragon Systems. The project entails research to "understand and emulate relevant human capabilities" as well as understanding how the brain processes language. The researchers, with that particularly human talent for euphemism, are actually saying that we need artificial intelligence if computers are going to understand us.

Originally, however, speech recognition was going to lead to artificial intelligence. Computing pioneer Alan Turing suggested in 1950 that we "provide the machine with the best sense organs that money can buy, and then teach it to understand and speak English." Over half a century later, artificial intelligence has become prerequisite to understanding speech. We have neither the chicken nor the egg.

   

Strings, heavy with meaning

http://robertfortner.posterous.com/the-unrecognized-death-of-speech-recognition

It's true! Computers have gotten insanely more powerful over the past decades, and we've got huge mega-corpora of language data, annotated syntactically and semantically, so WHAT UP AI COMMUNITY? How come computers can't recognize speech any better than 10 years ago?
Ex-Soviet Bloc Sexual Attack Swede of Tomorrow™
e-prime disclaimer: let it seem fairly unclear I understand the apparent subjectivity of the above statements. maybe.

INFORMATION SO POWERFUL, YOU ACTUALLY NEED LESS.

Rococo Modem Basilisk

It *does* seem like an AI-complete problem. I mean, a lot of speech is easily parsed only with context.

Has trained speaker-dependent arbitrary-vocabulary-size speech recognition hit the same plateau here?


I am not "full of hate" as if I were some passive container. I am a generator of hate, and my rage is a renewable resource, like sunshine.

MMIX

I suspect that the answer may ultimately lie in one of the bits you didn't quote:
Quote
Statistics veiling ignorance
[my emphasis]

Many spoken words sound the same. Saying "recognize speech" makes a sound that can be indistinguishable from "wreck a nice beach." Other laughers include "wreck an eyes peach" and "recondite speech." But with a little knowledge of word meaning and grammar, it seems like a computer ought to be able to puzzle it out. Ironically, however, much of the progress in speech recognition came from a conscious rejection of the deeper dimensions of language. As an IBM researcher famously put it: "Every time I fire a linguist my system improves." But pink-slipping all the linguistics PhDs only gets you 80% accuracy, at best.


My point being that, since the computer is dealing in statistical analysis (i.e. doing what a computer does best, number crunching), the whole intentionality and flexible response which characterises human speech becomes just a "ghost in the machine", to coin a pun. And that is the problem, really: the complexity of usage to which we put words, our "inherent drive" [for want of a better expression] to play with words, makes the computational task of recognising what we are saying much more difficult than was originally believed. I guess some things are not currently reducible to simple computational problems.

"The ultimate hidden truth of the world is that it is something we make and could just as easily make differently" David Graeber

Shibboleet The Annihilator

I believe that there will be a renewed interest in further developing voice recognition, driven by things like speech-to-text and voice commands on mobile OSes like Android, as well as telephone systems like 1-800-GOOG-411 (a free information system provided by Google) and the automated menus on other support lines.


Rococo Modem Basilisk

Google stands to make a lot of money through good speech recognition, and they may have a big enough data set and enough cash for processing power to make it context sensitive through purely statistical methods. After all, if they use the same mechanism for choosing ads as for suggesting spelling corrections, they could probably eventually adapt it to guessing appropriate words in circumstances where purely phonetic analysis would still yield a dilemma, and then they could sell you ads based on the contents of your phone calls and voicemail rather than just your email, IM, blog, text documents, search queries, most visited websites, social network connections, calendar, source code, geographic interests, driving direction requests, and the contents of your hard drive.
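
For what it's worth, the "guess the appropriate word from context" idea can be sketched in a few lines. This is only a toy, not how Google or any real recognizer does it: the five-sentence corpus, the candidate list, and the names train_bigrams, log_prob and pick_best are all made up for illustration. It assumes the acoustic side has already produced several equally plausible transcriptions, and it picks between them with an add-one smoothed bigram model.

```python
import math
from collections import Counter

# Toy corpus standing in for a huge text collection (made up for illustration).
CORPUS = [
    "it is hard to recognize speech",
    "computers recognize speech poorly",
    "we walked on a nice beach",
    "a storm can wreck a boat",
    "people recognize speech with context",
]

def train_bigrams(sentences):
    """Count unigrams and bigrams, padding each sentence with a start marker."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def log_prob(sentence, unigrams, bigrams):
    """Average per-word log-probability under an add-one smoothed bigram model.

    Averaging over the word count keeps longer candidates from being
    penalized merely for containing more words.
    """
    vocab_size = len(unigrams)
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        # Counter returns 0 for unseen keys, so unseen pairs get the smoothed floor.
        total += math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))
    return total / (len(words) - 1)

def pick_best(candidates, unigrams, bigrams):
    """Return the candidate transcription the language model finds most likely."""
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams))

UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
# Hypothetical output of a purely phonetic stage: all sound about the same.
CANDIDATES = ["recognize speech", "wreck a nice beach", "wreck an eyes peach"]
BEST = pick_best(CANDIDATES, UNIGRAMS, BIGRAMS)
print(BEST)
```

On a corpus this small the result mostly reflects which phrases happen to appear in it, which is sort of the point: the model knows nothing about meaning, only about which words tend to follow which.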



Shibboleet The Annihilator

Heh, free ad-based cell phones. There would definitely be people who would use that.

Jasper

I would rather not even have a phone.  Jesus, what a terrible idea.
