[maemo-developers] CPU video decoding test, Re: [maemo-developers] gstreamer launcher (proper video sink?)

From: Frantisek Dufka dufkaf at seznam.cz
Date: Sat May 13 14:53:01 EEST 2006
David D. Hagood wrote:
> In short, clock per clock, I suspect the DSP can do more video work than 
> the ARM can.
> Now, you *could* make the argument that if the workload of decoding 
> video could somehow be split between the 2 processor cores there might 
> be some benefit - maybe leave the video scaling to the ARM but let the 
> video coding be done by the DSP core. However, the real limiting factor 
> may not even be MIPS, but rather bandwidth 
> Remember what step 0 of optimization is: MEASURE IT FIRST!
> Until somebody can actually measure where all the time is going, making 
> pronouncements like "It's slow because of X - I just know it" are the 
> root of all evil - you spend a great deal of time tweaking that one 
> thing only to find out that it was only 1% of the time to begin with.

Hello again,

I compiled the current CVS version of ffmpeg 
http://www1.mplayerhq.hu/cgi-bin/cvsweb.cgi/?cvsroot=FFMpeg to stop 
producing pure theories and see how video really plays on the OMAP 1710 
CPU in the N770. ffmpeg compiles fine in scratchbox with a few 
configuration tweaks and also includes a simple SDL-based player called 
ffplay. libavcodec has some optimizations for the armv4l architecture 
written in ARM assembly. With these enabled, video playback in ffplay 
is very good.

When converting video for the Tungsten T2 (OMAP 1510) and the TCPMP 
player I usually use something like this:

mencoder.exe %NAME%.avi  -audio-preload 0.8 -delay 0.1  -af volnorm 
-srate 44100 -oac mp3lame -lameopts mode=2:cbr:br=128  -noodml  -vf 
scale=320:240 -sws 9 -ovc lavc -lavcopts 
vcodec=mpeg4:vhq:vmax_b_frames=0:vbitrate=304 -ffourcc DIVX -o 

Such videos play adequately at 25 fps on the T2. Some scenes skip 
frames, but playback is generally good.

I used the same files on the N770. They also play acceptably in the 
N770 video player, but somewhat worse than on the T2, and in more 
complex scenes the video player hangs randomly. When this happens the 
video is unplayable for some time (10-20 seconds?) until the DSP is 
automatically restarted.

When I tried the same videos with ffplay, they played fine with the 
audio turned off (ffplay -an video.avi). It looks like the ffmpeg 
libavcodec mp3 implementation is not optimized for ARM (uses floats?). 
By default the video plays scaled to 640x480, and even at this 
resolution playback is smooth. CPU utilization is between 50% and 100%, 
mostly around 75% (just a guess from the load plugin applet). At 
320x240, 'ffplay -an -x 320 -y 240 video.avi' (which is what the 
default N770 video player does, as it uses HW pixel doubling), it is 
even better: the CPU is rarely at 100%, mostly at 50-75%.
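
The utilization figures above are read off the load applet by eye. A 
minimal sketch of how the same number could be measured instead, 
assuming a Linux /proc/stat whose first line has the usual 
"cpu user nice system idle [iowait irq softirq steal]" layout. The 
function names and the sample counter values are made up for 
illustration; they are not measurements from the N770.

```python
def read_cpu_line(path="/proc/stat"):
    """Return the aggregate 'cpu' line (first line of /proc/stat on Linux)."""
    with open(path) as f:
        return f.readline()


def busy_fraction(before, after):
    """Fraction of CPU time spent non-idle between two 'cpu' lines."""
    def parse(line):
        fields = line.split()
        if not fields or fields[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line")
        # Use user..steal only; later guest counters would be double-counted.
        return [int(x) for x in fields[1:9]]

    deltas = [a - b for b, a in zip(parse(before), parse(after))]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    # idle is the 4th counter; iowait (5th), when present, is not busy time
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 1.0 - idle / total


# Hypothetical samples taken before and after a stretch of playback.
sample_before = "cpu 100 0 50 850 0 0 0"
sample_after = "cpu 175 0 50 875 0 0 0"
print("busy: %.0f%%" % (100 * busy_fraction(sample_before, sample_after)))
```

On a device the two samples would come from read_cpu_line(), taken a few 
seconds apart while ffplay is running.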

You can download an ffplay binary compiled for the N770 from 
http://fanoush.webpark.cz/maemo/ffplay.gz for a quick test with your 
videos. If you get access denied, paste the URL directly into the URL 
bar; webpark.cz free hosting doesn't like direct links to binaries 
(=foreign HTTP referer field). Or just check out ffmpeg from CVS and 
compile it yourself.

Of course this is just a proof of concept, as audio is not usable, but 
it shows the CPU is fast enough to decode mp4 video better (=faster, 
more stable) than the current DSP implementation. It also shows that 
the 'bandwidth problem' is not so bad: even at 640x480, blitting to 
video memory seems to be good enough.
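
For a rough sense of scale, the blit traffic can be estimated with 
simple arithmetic. This is only a sketch under stated assumptions: 25 
fps as in the test clips, every frame written once, and either a 16-bit 
RGB565 framebuffer (2 bytes per pixel) or planar YUV 4:2:0 (1.5 bytes 
per pixel). Neither pixel format was confirmed for what ffplay and SDL 
actually do on the N770.

```python
def blit_bandwidth(width, height, fps, bytes_per_pixel):
    """Bytes per second needed to write every frame once to video memory."""
    return width * height * bytes_per_pixel * fps


FPS = 25                 # frame rate of the test clips
RGB565_BPP = 2           # assumption: 16-bit RGB framebuffer
YUV420_BPP = 1.5         # assumption: planar YUV 4:2:0, 12 bits per pixel

for width, height in [(320, 240), (640, 480)]:
    rgb = blit_bandwidth(width, height, FPS, RGB565_BPP)
    yuv = blit_bandwidth(width, height, FPS, YUV420_BPP)
    print("%dx%d: %.2f MB/s as RGB565, %.2f MB/s as YUV 4:2:0"
          % (width, height, rgb / 1e6, yuv / 1e6))
```

The 640x480 case moves four times as many pixels per frame as 320x240, 
so whatever headroom exists at the larger size is much bigger at the 
smaller one.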

There may also be room for further optimization. The ffmpeg code 
doesn't use the EDSP instructions available in armv5te (maybe they are 
not so useful in reality?), and it is not the fastest implementation 
even for armv4. The TCPMP player uses a different, faster mp4 decoder; 
it includes an optional ffmpeg plugin, but only as a slower, more 
compatible alternative. I'm also not sure how well optimized the SDL 
code on the N770 is. From the kernel framebuffer source 
(drivers/video/omap/hwa742.c) it looks like the display supports YUV 
surfaces directly, but ffplay and SDL may be using RGB, in which case 
there would be one or two extra YUV<->RGB conversion steps.
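
To illustrate what one such conversion step costs per pixel, here is a 
minimal sketch of a fixed-point YUV to RGB565 conversion. It assumes the 
common BT.601 studio-range integer coefficients and a 16-bit RGB565 
target; it is not taken from SDL, ffplay or the hwa742 driver, and the 
function names are hypothetical.

```python
def clamp8(x):
    """Clamp an integer to the 0..255 range of an 8-bit channel."""
    return 0 if x < 0 else 255 if x > 255 else x


def yuv_to_rgb(y, u, v):
    """Convert one BT.601 studio-range YUV sample to 8-bit R, G, B."""
    c, d, e = y - 16, u - 128, v - 128
    # Coefficients are scaled by 256; +128 rounds before the shift.
    r = clamp8((298 * c + 409 * e + 128) >> 8)
    g = clamp8((298 * c - 100 * d - 208 * e + 128) >> 8)
    b = clamp8((298 * c + 516 * d + 128) >> 8)
    return r, g, b


def pack_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit RGB565 value."""
    return ((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3)


# Reference white in studio range maps to full-scale RGB565.
print(hex(pack_rgb565(*yuv_to_rgb(235, 128, 128))))
```

Each output pixel needs several multiplies, shifts and clamps, which 
would be skipped entirely if the decoded YUV frame could be handed to 
the display as is.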

