[maemo-developers] Some weird questions about speech2text, text2speech, GPS ...

Mon Jul 16 16:59:47 EEST 2007

Marco Solari wrote:
> I am currently evaluating the porting to the ARM architecture of open-source
> projects 'rsynth' and 'festival' for text2speech, and 'pocketsphinx' for
> speech2text ...
Hi, I'm working on speech recognition for the N800.  Plans and status 
can be found at http://lima.lti.cs.cmu.edu/mediawiki/PocketSphinx and 
http://lima.lti.cs.cmu.edu/mediawiki/GStreamerSphinx

To summarize:

 * ALSA doesn't work very well so we need to use GStreamer for audio input
 * This is not actually very hard.  I'm working on a GStreamer plugin 
for PocketSphinx.  Currently it is at the "proof of concept" stage, i.e. 
it recognizes speech but isn't configurable or stable.  You can get it at 
https://cmusphinx.svn.sourceforge.net/svnroot/cmusphinx/trunk/gst-pocketsphinx/
 * The N800 is a bit slower than I expected (slower than my 
first-generation iPaq) so we will have to do some more tuning to get 
acceptable performance for large-vocabulary tasks like dictation.  For 
simple commands it should be just fine though.
 * I haven't put up acoustic models yet but will do so pretty soon.  The 
ones included with PocketSphinx will not work because they expect 16kHz 
sampling rate, and the N800 can only do 8kHz.  On the bright side the 
audio quality from the lapel microphone on the headset that comes with 
the device is pretty good (the onboard mic is not suitable for speech 
recognition).
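
To make the sampling-rate mismatch above concrete, here is a minimal 
sketch that checks whether a recording matches the rate an acoustic model 
expects.  It assumes Python and its standard `wave` module; the helper 
names (wav_sample_rate, matches_model, make_silent_wav) are hypothetical 
and are not part of PocketSphinx or the GStreamer plugin.  The two rate 
constants are simply the figures quoted above.

```python
import io
import wave

MODEL_RATE_HZ = 16000   # rate the bundled models expect (figure from this post)
DEVICE_RATE_HZ = 8000   # rate the N800 records at (figure from this post)


def wav_sample_rate(source):
    """Return the sample rate in Hz of a WAV file path or file-like object."""
    with wave.open(source, "rb") as wav:
        return wav.getframerate()


def matches_model(source, model_rate_hz):
    """True if the recording's sample rate equals the rate the model expects."""
    return wav_sample_rate(source) == model_rate_hz


def make_silent_wav(rate_hz, seconds=0.1):
    """Build an in-memory mono 16-bit WAV of silence (demonstration only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)          # mono
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * int(rate_hz * seconds))
    buf.seek(0)                      # rewind so the buffer can be read back
    return buf


if __name__ == "__main__":
    recording = make_silent_wav(DEVICE_RATE_HZ)
    if not matches_model(recording, MODEL_RATE_HZ):
        print("recording rate does not match the model; use models "
              "trained for the device's rate")
```

A check like this would only flag the mismatch; it does not fix it.  
Upsampling 8kHz audio does not restore the missing bandwidth, which is 
why separate acoustic models are needed.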

Stuff I would appreciate some help with includes:

 * GStreamer.
 * How to create a new input method.