[maemo-developers] N800 & Video playback
From: Daniel Stone daniel.stone at nokia.com
Date: Mon Apr 30 17:49:24 EEST 2007
- Previous message: N800 & Video playback
- Next message: recompiling x server
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Hi,

On Mon, Apr 30, 2007 at 02:27:49PM +0300, ext Siarhei Siamashka wrote:
> On Friday 27 April 2007 04:43, Daniel Stone wrote:
> > I don't think Tornado supports YUV420, but I can check in the specs
> > tomorrow. My better C version basically does two macroblocks at a time,
> > ensuring all 32-bit writes (which _really_ helps over 16-bit writes,
> > believe me). This eliminates the branch, since your surface is
> > guaranteed to be word-aligned, so if you do all 32-bit writes, you can
> > just drop the branch as you know every write will be aligned.
> >
> > This will be really fast.
>
> Optimized YV12 -> YUV420 convertor is done. The sources can be found here:
> https://garage.maemo.org/plugins/scmsvn/viewcvs.php/trunk/libswscale_nokia770/?root=mplayer
>
> Take a look at 'arm_colorconv.h' and 'arm_colorconv.S' files. Also there is a
> test program ('test_colorconv') which can ensure that everything works
> correctly and fast:
>
> ~ $ ./test_colorconv
> [results follow]
>
> ARMv6 optimized YV12->YUV420 convertor is about 2.5x faster
> than current code used in N800 xserver. So it should provide a nice
> improvement for video :)

Indeed. Unfortunately this is slightly misleading in that it only shows
the raw write speed. RFBI can't deal with the sorts of speeds that your
hyper-optimised version is pumping out, e.g. So it's mainly just about
cutting the latency into the critical path to low enough that it makes
no difference.

> I doubt that your better C version can beat it or even get any close.

Of course not.

> There are two important optimizations in this code:
> 1. Cache prefetch with PLD instruction (added in '_armv5' version) which
> boosts performance to 70 megapixels per second. Inner loop is unrolled
> to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so
> such unrolling is convenient). This is the most important improvement.
> You can try using __builtin_prefetch() from C code to do the same
> optimization.

Ah, sounds useful.
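
A minimal sketch of the prefetch idea from point 1, assuming GCC or a
compatible compiler that provides __builtin_prefetch(). The names
copy_with_prefetch and CACHE_LINE are made up for illustration, and the
loop is a plain byte copy rather than the converter in arm_colorconv.S;
it only shows where the prefetch hint would sit in a loop that handles
one 32-byte cache line per iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 32 /* bytes; the ARM cache line size quoted above */

/* Hypothetical helper: copy n bytes, one cache line per iteration,
 * hinting the next source line to the cache before it is needed. */
static void copy_with_prefetch(uint8_t *dst, const uint8_t *src, size_t n)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);

    for (; i + CACHE_LINE <= n; i += CACHE_LINE) {
        /* Only form the address if the next line is inside the buffer.
         * Arguments: address, 0 = read access, 0 = low temporal locality. */
        if (i + 2 * CACHE_LINE <= n)
            __builtin_prefetch(src + i + CACHE_LINE, 0, 0);
        memcpy(dst + i, src + i, CACHE_LINE);
    }

    /* Tail shorter than one cache line. */
    memcpy(dst + i, src + i, n - i);
}
```

Whether the hint helps at all depends on the core and the memory system,
so it would need measuring on the device.
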
From what Dan Amelang's been saying on xorg@, gcc should coalesce four
32-bit reads into one 128-bit read, but this sounds promising as well.

> 2. The use of ARMv6 instruction REV16 to do bytes swapping for high and low
> 16-bit register parts, this optimization was added in '_armv6' version and
> boosted performance even more to 85 megapixels per second. This
> optimization is highly unlikely, probably impossible, for the C version.

Sounds useful.

> I was a bit wrong about YUV420 format in my previous post.
>
> Suppose we have planar YV12 image with the following data.
> Y plane: Y1 Y2 Y3 Y4 ...
> U plane: U1 __ U2 __ ...
>
> Normal YUV420 (according to pictures in Epson docs) would be the following:
> U1 Y1 Y2 U2 Y3 Y4 ...
>
> But it appears (most likely because of 16-bit interface and some endian
> differences between ARM and Epson chip) that each pair of bytes is
> swapped and we actually get the following somewhat weird layout:
> Y1 U1 U2 Y2 Y4 Y3 ...

Right, hence the comment in the code is correct. ;)

> As for the other possible Xv optimizations. You mentioned that fallback code
> is not important at all. But imagine 640x480 video playback in windowed
> mode. Decoding it will require quite a lot of resources, but additionally
> scaling it down using a slow fallback code will be a finishing blow. In
> addition, a solution (fast JIT accelerated YV12->YUY2 scaler) for this
> problem already exists. I can also modify this scaler to support
> YV12->YUV420 scaling. An interesting thing here is that this scaler
> could be also used by xserver to solve graphics bus bandwidth
> issues. Imagine that we have some high resolution video with high
> framerate which exceeds graphics bus capabilities. In this case
> this video can be downscaled in software using JIT scaler to lower
> resolution before sending data to LCD controller. What do you think?

IMO this is a policy issue, and X is 'mechanism, not policy'.
If you want to adapt the scaler, I'm more than happy to include it, but
I'm not about to start doing automatic scaling. IOW, 'ask a stupid
question, get a stupid answer'.

> That's fine. Now I'm waiting for further instructions :) Should I try to
> prepare a complete patch for xserver? I'm really interested in getting
> this optimization into xserver as it would help to play high resolution
> videos. If you have any extra questions about the code or anything
> else (for example I wonder what free license would be appropriate
> for it), don't hesitate to contact me.

If you wanted to prepare a complete patch for the server, that would be
great, as I don't have time to get to it right now (trying to finish off
the merge with upstream, among others). As for the license, just the
standard MIT boilerplate in hw/kdrive/omap/* is fine, but replace Nokia
Corporation/Daniel Stone with Siarhei Siamashka, obviously.

> I did not try to build xserver sources yet as I did not have enough time
> for that and xserver requires quite a number of build dependencies. Can
> you share some tips and tricks about maemo xserver development? Is it
> difficult to compile (do I need any extra build scripts, tools, or
> configuration options) and install on N800 (is it safe to upgrade
> xserver on N800 from .deb file)?

It's completely safe to upgrade from a deb if it's not broken. If you
set up a standard Maemo build environment and run apt-get source
xorg-server and apt-get build-dep xorg-server, it should work just fine,
in theory. I don't have any tips, per se. Once I get it all integrated
it'll be in git, but for now, the only public source is the packages.

> I also tried to use YUV420 on Nokia 770, but it did not work well. According
> to Epson, this format should be supported by hardware. Also there is a
> constant OMAPFB_COLOR_YUV420 defined in omapfb.h in Nokia 770 kernel
> sources. But actually using YUV420 was not very successful.
> Full screen update 800x480 in YUV420 seems to deadlock Nokia 770. Playback
> of centered 640x480 video in YUV420 format was a bit better, at least I
> could decipher what's on the screen. But anyway, it looked like an old
> broken TV :) Image was not fixed but floating up and down, there were
> mirrors, tearings, some color distortion, etc. After video playback
> finished, the screen remained in inconsistent state with a striped garbage
> displayed on it. Starting video playback with YUY2 output fixed it. But
> anyway, looks like YUV420 is not supported properly in the framebuffer
> driver from the latest OS2006 kernel. That's bad, it could provide ~30%
> improvement in video output performance for Nokia 770. Maybe upgrading
> framebuffer driver can fix this issue (and add tearsync support).

SoSSI is relatively quick, so you won't see much of a bandwidth win from
using YUV420 over YUV422. Aside from that, I don't know, though.

Thanks again for working on this; glad to see someone cares enough to
help sort it out. :)

Cheers,
Daniel
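
For reference, a rough C sketch of two things discussed in this message:
the byte-swapped YUV420 line layout (Y1 U1 U2 Y2 Y4 Y3 ...) and the
per-frame data size of YUV420 against YUY2. All function names here are
hypothetical and none of this is taken from arm_colorconv.S. It assumes
each line carries `width` luma samples plus `width / 2` chroma samples,
as in the quoted U-plane example; how V lines are interleaved is not
stated in the message and is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable expression for the result ARMv6 REV16 produces: swap the two
 * bytes inside each 16-bit half of a 32-bit word. */
static uint32_t swap_bytes_in_halfwords(uint32_t x)
{
    return ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
}

/* Hypothetical helper: pack one line into the byte-swapped order quoted
 * above. y holds width luma samples, c holds width / 2 chroma samples,
 * and dst receives width * 3 / 2 bytes. */
static void pack_line_swapped(uint8_t *dst, const uint8_t *y,
                              const uint8_t *c, size_t width)
{
    assert(width % 4 == 0); /* one group = 4 luma + 2 chroma samples */

    for (size_t i = 0; i < width; i += 4, dst += 6) {
        /* "Normal" order would be C1 Y1 Y2 C2 Y3 Y4;
         * here each pair of bytes is swapped. */
        dst[0] = y[i];
        dst[1] = c[i / 2];
        dst[2] = c[i / 2 + 1];
        dst[3] = y[i + 1];
        dst[4] = y[i + 3];
        dst[5] = y[i + 2];
    }
}

/* Bytes per frame: 12 bits per pixel for YUV420, 16 for YUY2 (YUV422). */
static size_t yuv420_frame_bytes(size_t w, size_t h) { return w * h * 3 / 2; }
static size_t yuy2_frame_bytes(size_t w, size_t h) { return w * h * 2; }
```

For an 800x480 frame this works out to 576000 bytes in YUV420 against
768000 bytes in YUY2, i.e. 25% less data per frame before any bus
overhead is taken into account.
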