[maemo-developers] One question about speech2text poor performance

From: Graham Cobb g+770 at cobb.uk.net
Date: Tue Jan 15 21:59:29 EET 2008
On Tuesday 15 January 2008 19:01:11 Mike Klein wrote:
> History lesson: 1Mhz Apple][ had >90% voice recognition
> ability...program was written by Bill Budge I believe.
>
> The resources have to be there.

Back in the days when I knew something (not much) about speech-rec, the issue 
was not CPU power, nor was it much about clever software.  The biggest 
differentiator between products was the amount of speech sample data the 
vendor had access to.  The more speech samples, the better the product.

If the solution needed training, it could be quite cheap: cheap enough to be 
usable for disabled users to control computers, for example.  But people hate 
training, and solutions that needed no training were extremely expensive.

At that time (about 10 years ago), what you were paying for when you bought a 
commercial speech-rec solution was the money the vendor had spent on 
collecting samples: paying students, housewives, manual labourers, 
executives, children, etc., everywhere the language was spoken, to record 
many, many samples of all the words they needed to recognise.  It cost a lot 
of money and, not surprisingly, the people who collected and owned that data 
wanted lots of money to provide it.

What was most noticeable was that recognition rates depended on the 
economic value of the language (not on CPU power or anything like that).
American English was quite well recognised.  French significantly less so.  
Dutch less still.  And solutions were just not available at all for anything 
outside the top few languages (about 5-6 at that time, all Western European).

Things may have improved in the last few years, but my guess is that until 
there is a Wikipedia-style project that lets people contribute free speech 
samples, there is unlikely to be very good open source speech-rec: not because 
of the software but because of the speech samples.

Oh, and by the way, if anyone wants to volunteer to set up a website to 
collect samples, please be aware that it is a very complex task: consult a 
speech-rec engineer before even thinking about it.  For example, it is 
necessary to review and process all the samples (still a human ear process as 
far as I know), and it is critical that the samples are tagged with 
information about the voice (language, dialect, sex, location, age, etc.) and 
recording details (how it was acquired, e.g. in a recording studio or through 
a mobile phone, what codec was used, etc.).

Of course, none of that necessarily explains why Nokia chose not to put 
(closed source) speech-rec at least as good as that on their phones into the 
Internet Tablets, although I would guess that licence fees to their suppliers 
may be part of that.

Graham
