[maemo-developers] R: Some weird questions about speech2text, text2speech, GPS ...

Mon Jul 16 18:03:58 EEST 2007

Thanks, David!
Thanks for your answer and for your work.
Unfortunately I'm very new to embedded platforms, so I suppose I will not be
very helpful, from a development point of view, in the short term ... ;-(
When you speak about "acoustic models", you mean "English acoustic models",
is that right? Do acoustic models for other languages exist? Do Italian
acoustic models exist? If not, how much effort do you think it would take
to build them, in terms of time and knowledge?
Thanks again!

	Marco

-----Original Message-----
From: David Huggins-Daines [mailto:dhuggins at cs.cmu.edu] 
Sent: Monday, 16 July 2007 16:00
To: Marco Solari
Cc: maemo-developers at maemo.org
Subject: Re: Some weird questions about speech2text, text2speech, GPS ...

Marco Solari wrote:
> I am currently evaluating the porting to the ARM architecture of 
> open-source projects 'rsynth' and 'festival' for text2speech, and 
> 'pocketsphinx' for speech2text ...
Hi, I'm working on speech recognition for the N800.  Plans and status can be
found at http://lima.lti.cs.cmu.edu/mediawiki/PocketSphinx and
http://lima.lti.cs.cmu.edu/mediawiki/GStreamerSphinx

To summarize:

 * ALSA doesn't work very well so we need to use GStreamer for audio input
 * This is not actually very hard.  I'm working on a GStreamer plugin for
PocketSphinx.  Currently it is at the "proof of concept" stage, i.e. it
recognizes speech but isn't configurable or stable.  You can get it at
https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/gst-pocketsphinx/
 * The N800 is a bit slower than I expected (slower than my first-generation
iPaq) so we will have to do some more tuning to get acceptable performance
for large-vocabulary tasks like dictation.  For simple commands it should be
just fine though.
 * I haven't put up acoustic models yet but will do so pretty soon.  The
ones included with PocketSphinx will not work because they expect a 16 kHz
sampling rate, and the N800 can only do 8 kHz.  On the bright side, the audio
quality from the lapel microphone on the headset that comes with the device
is pretty good (the onboard mic is not suitable for speech recognition).
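
To make the sampling-rate point above concrete, here is a minimal Python
sketch (not part of PocketSphinx; the function names are hypothetical).
Audio sampled at 8 kHz can only represent frequencies up to 4 kHz, while
models trained on 16 kHz audio expect content up to 8 kHz, which is why the
stock models do not fit. The sketch also shows a deliberately crude way to
halve the rate of 16 kHz integer PCM samples, on the assumption that one
might want 8 kHz data for building matching models; a real resampler would
use a proper low-pass filter.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def downsample_by_two(samples):
    """Halve the sample rate of a list of integer PCM samples.

    Averaging each adjacent pair acts as a very crude low-pass filter
    before decimation. A trailing unpaired sample is dropped.
    """
    return [
        (samples[i] + samples[i + 1]) // 2
        for i in range(0, len(samples) - 1, 2)
    ]


if __name__ == "__main__":
    # One second of silence at 16 kHz becomes one second at 8 kHz.
    one_second_16k = [0] * 16000
    one_second_8k = downsample_by_two(one_second_16k)
    print(len(one_second_8k), nyquist_hz(8000), nyquist_hz(16000))
```

Going the other way (8 kHz to 16 kHz) cannot restore the missing 4-8 kHz
band, so the practical route is models that match the device's rate.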

Stuff I would appreciate some help with includes:

 * GStreamer.
 * How to create a new input method.