[maemo-developers] N800 & Video playback
From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Tue Apr 24 09:46:52 EEST 2007
- Previous message: N800 & Video playback
- Next message: N800 & Video playback
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
On Friday 20 April 2007 10:39, you wrote:
> The primary conversion we do isn't planar -> packed (this is a fallback
> for when the video is obscured), but from planar to another custom
> planar format. It would be good to get ARM assembly for the fallback
> path, but most of the problem when using packed lies in having to
> transfer the much larger amount of data over the bus.

It is only a matter of definition :) Whether you call it packed or
planar, this YUV420 format is not YV12. So it still needs a conversion,
which is done purely by reordering bytes and is not much different from
the conversion to packed YUY2 (except that it requires less space and
bandwidth).

> There's one optimisation that could be done for the YUV420 conversion
> (the custom planar format that Hailstorm takes), which removes a branch,
> ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
> pixel), and unrolls a loop by half. Might be interesting to see what
> effect this has, but I think it'll still be rather small.

My main performance concern is exactly this 'omapCopyPlanarDataYUV420'
function. My experience from optimizing the Nokia 770 video output code
shows that the effect can be really huge: unscaled YV12 -> YUY2
conversion became 1.5x faster on Nokia 770 when going from a simple loop
in C to optimized assembly code (I provided a link to the relevant code
in my previous post). But the N800 code can probably be improved even
more, because it now contains an unnecessary branch in the inner loop,
and branches are expensive on CPUs with long pipelines. Such a color
format conversion should perform comparably to memcpy if done right (it
runs at about half of memcpy speed on Nokia 770 for unscaled YV12 ->
YUY2 conversion).

But only benchmarks can be real proof; premature speculation is useless
and even harmful. Do you remember the times when nobody at Nokia
believed that the ARM core could be good for video decoding on the 770?
;-)

Testing with the Nokia_N800.avi video on N800:

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC: 29,525s VO: 15,029s A: 0,453s Sys: 59,919s = 104,925s
BENCHMARK%: VC: 28,1390% VO: 14,3232% A: 0,4313% Sys: 57,1065% = 100,0000%
BENCHMARKn: disp: 2511 (23,93 fps) drop: 0 (0%) total: 2511 (23,93 fps)

Enabling direct rendering (this avoids an extra memcpy in mplayer, but
requires disabling the OSD menu):

# mplayer -benchmark -quiet -noaspect -dr -nomenu Nokia_N800.avi
BENCHMARKs: VC: 29,826s VO: 12,365s A: 0,437s Sys: 60,555s = 103,182s
BENCHMARK%: VC: 28,9058% VO: 11,9833% A: 0,4236% Sys: 58,6873% = 100,0000%
BENCHMARKn: disp: 2504 (24,27 fps) drop: 0 (0%) total: 2504 (24,27 fps)

Testing the same video on Nokia 770:

# mplayer -benchmark -quiet -noaspect Nokia_N800.avi
BENCHMARKs: VC: 44,982s VO: 7,998s A: 0,884s Sys: 47,936s = 101,801s
BENCHMARK%: VC: 44,1862% VO: 7,8568% A: 0,8688% Sys: 47,0882% = 100,0000%
BENCHMARKn: disp: 2502 (24,58 fps) drop: 0 (0%) total: 2502 (24,58 fps)

So the Nokia 770, with a slower CPU, slower memory and a less efficient
output format (16bpp vs. 12bpp), still needs less time for video output
than the N800 (7,998s vs. 12,365s). Graphics bus performance is not a
factor here, as the transfer is an asynchronous operation and is fast
enough. Surely the N800 also has some extra overhead because of
interprocess communication with the xserver, but it looks like the
YV12 -> YUV420 conversion is quite a bottleneck here too.

It should be noted that while the Nokia_N800.avi video has a low
resolution and the N800 has no problems decoding and displaying it, our
goal is higher resolution videos such as 640x480. Going to higher
resolutions will increase the color format conversion overhead. As can
be seen from these benchmarks, video output on the N800 takes a
significant amount of time compared with the time needed for decoding
(29,826s for decoding, 12,365s for video output).

I can write assembly-optimized code for the YV12 -> YUV420 conversion.
Is there any chance that such an optimization could also be integrated
into the xserver in one of the next firmware updates, if it really
provides a significant performance improvement? The N800 is almost able
to play VGA resolution videos properly; it only needs a few more
optimizations. Color format conversion performance for video output is
one of the important things that can be improved.

> > So for any performance optimizations experiments which result in
> > immediate video performance improvement, either direct framebuffer access
> > should be used again or it would be very nice if xserver could provide
> > direct access to framebuffer (video planes) in yuy2 and that custom
> > yuv420 format in one of the next firmware updates. The xserver itself
> > should not do any excess memory copy operations as they degrade
> > performance (and it does such copy for yuy2 at least).
>
> 'Direct framebuffer access'? As in, just hand you a pointer to a
> framebuffer somewhere and let you write straight to it? As this would
> require a firmware update anyway, I don't really see how this would
> improve matters too much, and I really don't want to write any more
> Maemo-specific extensions (I've been working very hard to kill XSP).

Direct framebuffer access would eliminate the need for an extra memcpy
while still allowing the OSD menu and subtitles, and would make
everything much easier (currently this is how MPlayer works on Nokia
770). You can compare the benchmark results above with direct rendering
enabled and disabled: it saves ~3 seconds of CPU time when playing the
Nokia_N800.avi video.

Direct rendering allows using Xv buffers and decoding video in place.
But unfortunately, as the data in these buffers is used as reference
frames for decoding the following frames, it must not be modified. This
makes implementing OSD and subtitles tricky. Having direct access to the
framebuffer eliminates the need for this direct rendering technique and
saves us from the complexities associated with it.
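
The figures quoted in this message can be cross-checked with a few
lines of C. This is only a sketch for redoing the arithmetic: the struct
and the function names (bench, vo_to_vc, frame_bytes) are made up for
illustration and are not part of mplayer or the xserver.

```c
#include <assert.h>
#include <stddef.h>

/* One "BENCHMARKs:" line from mplayer; all times are in seconds.
 * (Hypothetical helper type, not an mplayer structure.) */
struct bench {
    double vc;   /* VC column: video decoding */
    double vo;   /* VO column: video output */
    double a;    /* A column: audio */
    double sys;  /* Sys column, as printed by mplayer */
};

/* Video output time as a fraction of video decoding time. */
double vo_to_vc(const struct bench *b)
{
    assert(b->vc > 0.0);
    return b->vo / b->vc;
}

/* Bytes needed for one frame at the given number of bits per pixel
 * (16 for packed YUY2, 12 for a 4:2:0 format). */
size_t frame_bytes(size_t width, size_t height, size_t bits_per_pixel)
{
    return width * height * bits_per_pixel / 8;
}
```

With the direct rendering run on the N800 (VC 29,826s, VO 12,365s),
vo_to_vc() gives roughly 0.41, so video output costs about 41% of the
decoding time. For a 640x480 frame, frame_bytes() gives 614400 bytes at
16bpp and 460800 bytes at 12bpp.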
> > Also I'm curious about that yuv420 format. From the comments in your
> > code, it looks like it is different from what is described in Epson docs.
> > That seems a bit weird.
>
> Which Epson docs?

The one mentioned by Frantisek. Well, it was just that the comment for
the 'omapCopyPlanarDataYUV420' function was wrong and misleading, never
mind :-) Now everything is clear.
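
For readers who have not seen this kind of conversion, the byte
reordering discussed earlier in this message can be sketched in plain C
for the YV12 -> YUY2 case. This is an illustration only, not the
xserver, MPlayer or Nokia 770 code: the function name and parameters are
assumptions, and the custom YUV420 layout used on the N800 is not shown
because its exact byte order is not given here. The inner loop has no
branch besides the loop condition and emits four bytes (Y0 U Y1 V) per
pair of pixels.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Convert planar 4:2:0 data (separate Y, U and V planes, chroma
 * subsampled 2x2, as in YV12) to packed YUY2 (byte order Y0 U Y1 V).
 * Hypothetical helper for illustration; strides are in bytes.
 */
void yv12_to_yuy2(uint8_t *dst, size_t dst_stride,
                  const uint8_t *y, size_t y_stride,
                  const uint8_t *u, const uint8_t *v, size_t c_stride,
                  size_t width, size_t height)
{
    /* 4:2:0 subsampling needs even dimensions. */
    assert(width % 2 == 0 && height % 2 == 0);

    for (size_t row = 0; row < height; row++) {
        const uint8_t *yp = y + row * y_stride;
        /* Two luma rows share one chroma row. */
        const uint8_t *up = u + (row / 2) * c_stride;
        const uint8_t *vp = v + (row / 2) * c_stride;
        uint8_t *out = dst + row * dst_stride;

        for (size_t i = 0; i < width / 2; i++) {
            /* Assemble one pixel pair and store it with one 4-byte
             * copy; compilers usually turn this into a single 32-bit
             * store, and it avoids alignment and endianness traps. */
            uint8_t px[4] = { yp[2 * i], up[i], yp[2 * i + 1], vp[i] };
            memcpy(out + 4 * i, px, 4);
        }
    }
}
```

A hand-written ARM assembly version would do the same reordering but
with control over load/store scheduling and loop unrolling, which is
where the gains described above would have to come from.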