[maemo-developers] One question about speech2text poor performance

From: David Huggins-Daines dhuggins at cs.cmu.edu
Date: Thu Jan 17 00:22:48 EET 2008
David Huggins-Daines wrote:
> Yes, this is exactly the case.  Recognizing a limited set of names in 
> isolation is not at all computationally intensive compared to 
> recognizing full sentences of connected words.
>
> See: http://en.wikipedia.org/wiki/Dynamic_time_warping
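For readers following the DTW link, here is a minimal sketch of how isolated-name recognition by template matching can work. It is the textbook DTW recurrence, not Nokia's or any phone's actual implementation. It assumes each utterance has already been turned into a list of feature frames (for example MFCC vectors); the names `dtw_distance`, `recognize` and the toy templates are hypothetical. Each comparison costs O(N·M) in the two utterance lengths, and the total work grows only linearly with the number of stored names. That is why a short contact list stays cheap.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector (assumed already extracted, e.g. MFCCs)


def frame_dist(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Classic DTW: cheapest monotonic alignment of query frames to template frames."""
    n, m = len(query), len(template)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_dist(query[i - 1], template[j - 1])
            # Extend the best of: stretch query, stretch template, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Isolated-word recognition: pick the stored template with the lowest DTW cost."""
    return min(templates, key=lambda name: dtw_distance(query, templates[name]))


if __name__ == "__main__":
    # Hypothetical one-dimensional "recordings" of two names.
    templates = {"alice": [[0.0], [1.0], [2.0]], "bob": [[2.0], [1.0], [0.0]]}
    spoken = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a slower "alice"
    print(recognize(spoken, templates))
```

The time warping absorbs differences in speaking rate: the slower utterance above still aligns to the "alice" template at zero cost. Connected-speech recognizers cannot simply compare whole utterances against stored templates like this, because they must search over sequences of words.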

Also, Nokia has actually invested quite a lot of research into doing
larger-vocabulary speech recognition on their phones.  They have only
recently become able to do isolated-word SMS dictation with a 22,000-word
vocabulary on an S60 2nd Edition phone (sorry, abstract only):

http://portal.acm.org/citation.cfm?id=1180995.1181020

This is still a less complex problem than recognizing connected speech.

That said, I am still working on real-time 5000-word connected dictation 
on the N800/N810.  I've succeeded in offloading some computation to the 
DSP, and the next step is to implement model compression techniques 
similar to the ones mentioned in that paper.

