[maemo-developers] Improving Cairo performance on the N800

Sun Jan 21 02:26:18 EET 2007

(resending this now that the mailing list is back up)

On 1/16/07, Zeeshan Ali <zeenix at gstreamer.net> wrote:
> Hello!
>
> > Now, the recently announced Nokia N800 is different from the 770 in
> > various ways that are interesting for Cairo performance. I've got my
> > eye on the ARMv6 SIMD instructions and the PowerVR MBX accelerator.
>
>    Yeah! me too. The combined power of these two can make it possible
> to optimize a lot of nice free software out there for the N800 device.
>  However! while former is fully documented and the documentation is
> available for general public, it doesn't have a lot to offer. ARMv6
> SIMD only operate on 32-bit words and hence i find it unlikely that it
> can be used to optimize double fp emulation in contrast to the intel
> wireless MMX, which provides a big bunch of 128-bit (CORRECTME: or
> was it 64-bit?) SIMD instructions. OTOH, these few SIMD instructions
> can still be used to optimize a lot of code but would it be a good
> idea for cairo if you need to convert the operand values to ints and
> the result(s) back to float?

No int <-> float conversion necessary. At this level, cairo uses ints
exclusively. To clarify, the part of cairo I'm thinking could use the
ARM SIMD is the pixman library which is almost an exact client-side
mirror (copy, really) of the fb section of the X server. It's the part
that implements the Porter-Duff operators in software. Floats are long
out of the picture at this point.
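
To illustrate the kind of integer-only work involved, here is a minimal
sketch of the Porter-Duff OVER operator on premultiplied ARGB32 pixels.
It is not taken from pixman; the function names (mul_un8, over_argb32)
are hypothetical and the per-channel loop is written for clarity rather
than speed.

```c
#include <stdint.h>

/* Multiply two 8-bit values treated as fractions of 255, with rounding.
   Uses only integer adds and shifts (no division, no floats). */
static uint8_t mul_un8(uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* Porter-Duff OVER for premultiplied ARGB32 pixels:
   result = src + dst * (1 - src_alpha), computed per 8-bit channel. */
static uint32_t over_argb32(uint32_t src, uint32_t dst)
{
    uint8_t inv_alpha = (uint8_t)(255 - (src >> 24));
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;
        uint32_t c = s + mul_un8((uint8_t)d, inv_alpha);
        if (c > 0xff)          /* clamp in case the input was not */
            c = 0xff;          /* properly premultiplied          */
        result |= c << shift;
    }
    return result;
}
```

For example, compositing half-transparent red (0x80800000) over opaque
blue (0xff0000ff) with this sketch gives 0xff80007f.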

This misunderstanding is common due to widespread confusion about
what role floating point plays in cairo's internals. Most floats that
arrive via an API call are converted into an integer type (e.g.
fixed-point) early on. Cairo uses integer arithmetic for most of its
internal computation. With that clarification, it should be no
surprise that many of the recent FP optimizations in cairo were just a
matter of speeding up conversions from floating point to an integer
type.
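
As a sketch of what such a conversion looks like, the following assumes
a 16.16 fixed-point format (16 integer bits, 16 fractional bits). The
type and function names are hypothetical and this is not cairo's actual
conversion code; it only shows that once the value is converted, the
remaining arithmetic stays in integers.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: value = raw / 65536. */
typedef int32_t fixed_16_16;

/* Convert a double to 16.16, rounding half away from zero.
   The caller must keep d within roughly [-32768, 32768). */
static fixed_16_16 fixed_from_double(double d)
{
    double scaled = d * 65536.0;
    return (fixed_16_16)(scaled + (scaled < 0 ? -0.5 : 0.5));
}

/* Multiply two 16.16 values using a 64-bit intermediate.
   Assumes >> on a negative int64_t is an arithmetic shift, which
   holds on common compilers but is implementation-defined in C. */
static fixed_16_16 fixed_mul(fixed_16_16 a, fixed_16_16 b)
{
    return (fixed_16_16)(((int64_t)a * b) >> 16);
}
```

With this layout, 1.5 is stored as 0x18000, and multiplying it by 2.0
(0x20000) yields 0x30000, i.e. 3.0, without touching the FPU.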

Anyway, I think the 32-bit ARM SIMD could get us some speedup
similar to how the existing MMX/SSE code has helped on x86 (for the
curious, see fbmmx.c in cairo or xserver). And since the MMX/SSE
code hasn't needed to drop down to raw assembly to get a nice
speedup (it uses intrinsics), your ARM SIMD intrinsics code is much
appreciated.
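
To make the idea concrete, here is a portable C sketch of the kind of
operation a 32-bit packed instruction performs: an unsigned saturating
add on four 8-bit lanes, which is the final step of OVER on a 32-bit
pixel. The function name sat_add_4x8 is hypothetical. On ARMv6 the loop
would presumably collapse into a single UQADD8 instruction; that mapping
is an assumption here and has not been checked against pixman or
measured on the N800.

```c
#include <stdint.h>

/* Portable emulation of a packed unsigned saturating add:
   each of the four 8-bit lanes of a and b is added independently
   and clamped to 0xff instead of wrapping around. */
static uint32_t sat_add_4x8(uint32_t a, uint32_t b)
{
    uint32_t result = 0;

    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t sum = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        if (sum > 0xff)
            sum = 0xff;
        result |= sum << shift;
    }
    return result;
}
```

A SIMD version would process all four channels of one pixel per
instruction, which is where the hoped-for speedup comes from.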

Dan Amelang