[maemo-developers] N800 & Video playback

From: Daniel Stone daniel.stone at nokia.com
Date: Tue Apr 24 12:36:46 EEST 2007

On Tue, Apr 24, 2007 at 09:46:52AM +0300, ext Siarhei Siamashka wrote:
> On Friday 20 April 2007 10:39, you wrote:
> > There's one optimisation that could be done for the YUV420 conversion
> > (the custom planar format that Hailstorm takes), which removes a branch,
> > ensures 32-bit writes always (instead of one 32-bit and one 16-bit per
> > pixel), and unrolls a loop by half.  Might be interesting to see what
> > effect this has, but I think it'll still be rather small.
> My main performance concern is exactly about this 'omapCopyPlanarDataYUV420'
> function. My experience from Nokia 770 video output code optimization shows
> that optimization effect can be really huge (it was 1.5x improvement on Nokia
> 770 for unscaled YV12 -> YUY2 conversion going from a simple loop in C to
> optimized assembly code, I provided a link to the relevant code in my previous
> post). But the N800 code can probably be improved more, because it now contains
> an unnecessary branch in the inner loop, and branches are expensive on
> long-pipeline CPUs. Such color format conversion performance should be
> comparable to that of memcpy if done right (it is about half memcpy speed on
> Nokia 770 for unscaled YV12 -> YUY2 conversion).
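
For reference, the "simple loop in C" baseline for unscaled YV12 -> YUY2 can be sketched roughly as below. This is not the 770 code referred to above, only an illustration of the conversion being discussed. The function name and parameters are made up for the example, and it assumes even width and height, a 32-bit aligned destination, and a little-endian CPU so that each word lands in memory as Y0 U Y1 V.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: unscaled YV12 (planar 4:2:0) to YUY2 (packed 4:2:2).
 * Every output word holds two pixels, Y0 U Y1 V in memory order on a
 * little-endian CPU. Each chroma row is reused for two luma rows.
 * Strides for y/u/v are in bytes; dst_stride_words is in 32-bit words. */
void yv12_to_yuy2(const uint8_t *y, const uint8_t *u, const uint8_t *v,
                  int y_stride, int c_stride,
                  uint32_t *dst, int dst_stride_words,
                  int width, int height)
{
    assert((width & 1) == 0 && (height & 1) == 0);

    for (int row = 0; row < height; row++) {
        const uint8_t *yp = y + (size_t)row * y_stride;
        const uint8_t *up = u + (size_t)(row / 2) * c_stride;
        const uint8_t *vp = v + (size_t)(row / 2) * c_stride;
        uint32_t *out = dst + (size_t)row * dst_stride_words;

        /* No branch in the inner loop, and one 32-bit store per pixel pair. */
        for (int i = 0; i < width / 2; i++) {
            out[i] = (uint32_t)yp[2 * i]
                   | ((uint32_t)up[i] << 8)
                   | ((uint32_t)yp[2 * i + 1] << 16)
                   | ((uint32_t)vp[i] << 24);
        }
    }
}
```

The inner loop reads four bytes and writes four bytes per iteration, which is why a well-tuned version could plausibly approach memcpy-like throughput.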

Right, the branch is a problem, and as I said, the branch can be avoided
and the writes optimised to be three 32-bit writes for two macroblocks,
instead of two 32-bit writes and two 16-bit writes.
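
To make the store pattern concrete, here is a rough sketch of the idea, not the actual omapCopyPlanarDataYUV420 code. It assumes each macroblock occupies 6 bytes in the output (one 32-bit plus one 16-bit write's worth) and that the bytes for each one have already been gathered in output order. The real Hailstorm byte layout is not reproduced here, and the function name and the pre-gathered source buffer are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: emit 6-byte units two at a time as three 32-bit
 * words. Handling units in pairs keeps every store 32 bits wide and
 * 32-bit aligned, so the loop needs no per-unit branch to choose
 * between a 32-bit and a 16-bit write. Words are built by shifting,
 * so memory order matches the source bytes on a little-endian CPU. */
void pack_unit_pairs(uint32_t *dst, const uint8_t *src, size_t n_units)
{
    assert((n_units & 1) == 0);   /* assumption: even number of units */

    for (size_t i = 0; i < n_units; i += 2) {
        const uint8_t *a = src + 6 * i;   /* first unit of the pair */
        const uint8_t *b = a + 6;         /* second unit of the pair */

        dst[0] = (uint32_t)a[0] | ((uint32_t)a[1] << 8)
               | ((uint32_t)a[2] << 16) | ((uint32_t)a[3] << 24);
        dst[1] = (uint32_t)a[4] | ((uint32_t)a[5] << 8)
               | ((uint32_t)b[0] << 16) | ((uint32_t)b[1] << 24);
        dst[2] = (uint32_t)b[2] | ((uint32_t)b[3] << 8)
               | ((uint32_t)b[4] << 16) | ((uint32_t)b[5] << 24);
        dst += 3;
    }
}
```

In a real conversion the bytes would come straight from the Y, U and V planes rather than from a pre-gathered buffer; the point here is only the pairing, which also halves the number of loop iterations.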

However, I don't think the lessons from the 770 are necessarily
_directly_ applicable to the N800: on the 770, our bottleneck is
decoding speed.  The bottleneck on the N800 is exactly the opposite:
video output.

> But only benchmarks can be a real proof, any premature speculations are
> useless and even harmful. Do you remember the times when nobody from 
> Nokia believed that ARM core could be good for video decoding on 770? ;-)

Actually, I don't, since I've always mainly worked on the N800. ;)  But
still, if there's dedicated hardware we can use to remove load from the
ARM and let it get on with tasks, and it can perform to an adequate
level, there's no reason to avoid it.

> So Nokia 770, having slower CPU, slower memory and using less efficient 
> output format (16bpp vs. 12bpp), still requires less time for video output
> than N800 (7,998s vs. 12,365s). Graphics bus performance is unrelated here 
> as it is asynchronous operation and it is fast enough. Surely N800 also has
> some extra overhead because of interprocess communication with xserver, but
> looks like YV12 -> YUV420 conversion is quite a bottleneck here too.

Bear in mind that, unless you explicitly disable it (the Xv attribute is
something like XV_OMAP_VSYNC), the X server _will_ flush all pending
writes before the next frame is put through.  Else you get tearing,
because you can be halfway through an update, and writing the next frame
to the framebuffer, so which frame is being picked up, changes halfway

Try forcing XV_OMAP_VSYNC (or whatever it is) to 0, and comparing the

> I can make an assembly optimized code for YV12 -> YUV420 conversion. Is there
> any chance that such optimization could be also integrated into xserver in one
> of the next firmware updates if it really provides a significant performance
> improvement?

Yeah, if there's measurable benefit, I'll include it.

> N800 is almost able to play VGA resolution videos properly, it only needs a
> bit more optimizations. Color format conversion performance for video output
> is one of the important things that can be improved.

I don't believe it's on the critical path.  The optimisation I mentioned
before will bring us up to the point where any improvement that we can
make in that conversion will be eclipsed by the time taken to send it
over the bus, I believe.  But I can't prove that.

> > Which Epson docs?
> The one mentioned by Frantisek. Well, it was just the comment
> for the 'omapCopyPlanarDataYUV420' function that was wrong and misleading,
> never mind :-) Now everything is clear.

Hmm, is it?  Because, unless I was _really_ tired at the time I wrote it
(which is entirely possible), that's what the code does, and it works,
so ...
