[maemo-developers] Re: Some weird questions about speech2text, text2speech, GPS ...

Mon Jul 16 18:25:57 EEST 2007

Marco Solari wrote:
> When you speak about "acoustic models", you mean "English acoustic models",
> is that right? Do acoustic models exist for other languages? Do Italian
> acoustic models exist? If not, how much effort do you think it would take
> to build one, in terms of time and knowledge?
>   
Yes, to get good accuracy you need a different acoustic model for each 
language (and preferably also for different speaking styles, recording 
devices, etc.).  CMU will be releasing some more free acoustic models 
soon, but I don't think Italian is one of the languages that we have.  
The GlobalPhone Project 
(http://www.cs.cmu.edu/~tanja/GlobalPhone/index-e-wel.html) collected a 
lot of data in different languages and built multilingual acoustic 
models, but I don't believe these are publicly available.

However, Spanish (particularly American Spanish) acoustic models would 
probably work reasonably well for Italian, and we are going to release 
some of them soon.

Building an acoustic model is a lot of work: for anything more than 
simple tasks you need at least 15 hours of speech from a number of 
different speakers.  Recording and transcribing this yourself is a 
huge job, so it's advisable to use pre-existing speech databases, such 
as those available from the LDC (http://ldc.upenn.edu/).  I think there 
is a similar European consortium, but I don't recall its name.  
Obviously there is a lot of data out there, such as radio and TV 
broadcasts and European and national parliamentary proceedings, but the 
problem is getting hold of it and putting it into a form suitable for 
training.

If you only need it to recognize your own speech, then this is a lot 
easier, and you won't have to record more than 500 sentences or so.  
We have a project here at CMU that lets you do this over the Web, 
but it's not open to the public yet.