[maemo-developers] Fast 16bpp alpha blending (was: Improving Cairo performance on the N800)

From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Tue Jan 30 23:54:35 EET 2007
On Thursday 18 January 2007 13:46, Gustavo Sverzut Barbieri wrote:

> > By the way, free software is really poorly optimized for ARM right now.
> > For example, SDL is not optimized for ARM, xserver is probably not
> > optimized as well, a lot of performance critical parts of code in various
> > software are still only implemented in C for ARM while they have x86
> > assembly optimizations long ago. Considering that Internet Tablets might
> > have a tight competition with x86 UMPC devices in the near future, ARM
> > powered devices are at some disadvantage now. Is this something that we
> > should try to change? :-)
> Yes. Since at INdT we use a lot of SDL, GTK and in future Evas, we are
> interested in optimizing this.
> One thing that can be optimized is 16bpp operations. Moving SDL
> surfaces to be optimized, packing 16bpp RGB into one plane and 1
> byte-Alpha in another plane, we could use multiple store (stm) and
> improve things a bit.
> If we could achieve ~24fps blitting fullscreen 16bpp+Alpha, it would
> rock! :-) Right now we do 18fps, but we still need that function with
> separated planes + stm. I'll ask Lauro to send them as soon as we get
> it working.
> Anyone willing to help evas port to work with 16bpp+Alpha internally?
> Evas is a great canvas, can interoperate with Glib main loop easily
> and provides high level utilities, like text layout (pango-like),
> gradients, the concept of objects to animate and is scriptable really
> easy (with optimizations!).

Regarding 16bpp alpha blending, I did some optimization (not involving
assembly yet) for the maemo build of ufo2000 [1]. The code currently in use
is based on, and extends, the RLE sprites from the Allegro game programming
library [2]. The sources can be found here:

The goal was to support drawing isometric tiles with an alpha channel (for
fire, smoke, explosions, window glass, ...) and adjustable brightness (for
lighting effects in night mission simulation).

The code works as an Allegro add-on library and allows loading sprites from
PNG files. It automatically detects the presence or absence of an alpha
channel and converts each image into the optimal format: one that allows fast
blending for images with an alpha channel, and a compact one for images
without. When blitting a sprite, a brightness value ranging from 0 to 255 is
passed as an additional argument. The code uses C++ templates to support all
the possible variants of bit depth and blending type (maybe not a very good
idea for code intended for later submission into a C library :) ).

The trick used to speed up alpha blending is to store each pixel in a special
32-bit representation, with R, G, B and the alpha channel arranged so that
all three color channels can be processed at once.

So imagine that we have a 16-bit pixel in RGB565 format and an alpha channel
value. We convert it into this 32-bit preprocessed data as follows:

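#include <stdint.h>
/* Pack an RGB565 pixel and an 8-bit alpha into the 32-bit blending format. */
uint32_t convert_pixel(uint16_t rgb565, int alpha)
{
    uint32_t n = (alpha + 1) / 8, x = rgb565; /* n: alpha scaled to 0..32 */
    x = (x | (x << 16)) & 0x7E0F81F;          /* G in bits 21-26, R 11-15, B 0-4 */
    return x | (n << 5);                      /* alpha goes into bits 5-10 */
}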
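
To make the packed layout easier to check, here is a minimal sketch that
undoes the conversion. The helpers unpack_rgb565 and unpack_alpha are
hypothetical names used only for illustration and are not part of the ufo2000
sources; convert_pixel is repeated so that the sketch compiles on its own.

```c
#include <stdint.h>

/* Same packing as convert_pixel above, repeated so this compiles alone. */
static uint32_t convert_pixel(uint16_t rgb565, int alpha)
{
    uint32_t n = (uint32_t)(alpha + 1) / 8, x = rgb565;
    x = (x | (x << 16)) & 0x7E0F81F;
    return x | (n << 5);
}

/* Hypothetical helper: recover the RGB565 value from the packed form. */
static uint16_t unpack_rgb565(uint32_t packed)
{
    uint32_t x = packed & 0x7E0F81F;   /* drop the alpha bits */
    return (uint16_t)(x | (x >> 16));  /* fold green back between R and B */
}

/* Hypothetical helper: recover the 0..32 blend factor from bits 5-10. */
static uint32_t unpack_alpha(uint32_t packed)
{
    return (packed >> 5) & 0x3F;
}
```

For instance, an opaque white pixel (rgb565 = 0xFFFF, alpha = 255) packs to
0x07E0FC1F, and unpacking it gives back 0xFFFF with a blend factor of 32.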

Now if we need to do alpha blending (with some buffer in memory), we do the
following (d - destination pixel data buffer, s - buffer with preprocessed
32-bit pixel data, w - number of pixels to blend, n - brightness level (0 - 

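/* Blend w preprocessed pixels from s onto d, darkening the source by n/32.
   Needs <stdint.h>; n must stay within 0..32 or the packed multiply overflows. */
uint16_t *draw_alpha_dark_line16(uint16_t *d, uint32_t *s, int w, uint32_t n)
{
    while (--w >= 0) {
        uint32_t x = *s++;
        uint32_t y = (uint16_t)*d;
        uint32_t result = (x >> 5) & 0x3F;           /* blend factor, 0..32 */
        x = ((x & 0x7E0F81F) * n / 32) & 0x7E0F81F;  /* apply brightness */
        y = (y | (y << 16)) & 0x7E0F81F;             /* spread destination */
        result = ((x - y) * result / 32 + y) & 0x7E0F81F;
        *d++ = (result | (result >> 16));            /* fold back to RGB565 */
    }
    return d;
}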
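
For comparison, the same arithmetic can be written one channel at a time. The
sketch below is only a reference for checking results, not code from ufo2000;
blend_channel and blend_pixel_reference are hypothetical names. It assumes
that both the blend factor a and the brightness n are already scaled to 0..32.
As far as I can tell, it produces the same output as the packed loop for
those ranges, just with three multiplies per step instead of one.

```c
#include <stdint.h>

/* Blend one channel: sc = source channel, dc = destination channel,
   a = blend factor 0..32, n = brightness 0..32. */
static uint32_t blend_channel(uint32_t sc, uint32_t dc, uint32_t a, uint32_t n)
{
    sc = sc * n / 32;                      /* darken the source first */
    return (sc * a + dc * (32 - a)) / 32;  /* then mix it with the destination */
}

/* Hypothetical reference: blend one RGB565 source pixel over one RGB565
   destination pixel, channel by channel. */
static uint16_t blend_pixel_reference(uint16_t src, uint16_t dst,
                                      uint32_t a, uint32_t n)
{
    uint32_t r = blend_channel(src >> 11, dst >> 11, a, n);
    uint32_t g = blend_channel((src >> 5) & 0x3F, (dst >> 5) & 0x3F, a, n);
    uint32_t b = blend_channel(src & 0x1F, dst & 0x1F, a, n);
    return (uint16_t)((r << 11) | (g << 5) | b);
}
```

With a = 16 and n = 32, white over black comes out as 0x7BEF (R = 15, G = 31,
B = 15), which shows the truncation that costs a little precision.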

This code works quite fast (at the cost of some precision loss though). It 
is perfectly suitable for isometric tile based games and probably other
applications which only need lightning fast blending and do not need any 
extra operations with sprites (rotation for example). Removing brightness
level support makes the code even faster.
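
A variant without brightness support could look roughly like this. It is a
sketch derived from the loop above by dropping the multiply by n; the name
draw_alpha_line16 is hypothetical and may not match what the ufo2000 sources
actually use. The source buffer is assumed to hold pixels already packed by
convert_pixel.

```c
#include <stdint.h>

/* Hypothetical variant of draw_alpha_dark_line16 with the brightness
   multiply removed; s holds pixels packed as by convert_pixel. */
static uint16_t *draw_alpha_line16(uint16_t *d, const uint32_t *s, int w)
{
    while (--w >= 0) {
        uint32_t x = *s++;
        uint32_t y = *d;
        uint32_t a = (x >> 5) & 0x3F;      /* blend factor, 0..32 */
        x &= 0x7E0F81F;                    /* source color fields only */
        y = (y | (y << 16)) & 0x7E0F81F;   /* spread destination the same way */
        uint32_t result = ((x - y) * a / 32 + y) & 0x7E0F81F;
        *d++ = (uint16_t)(result | (result >> 16));
    }
    return d;
}
```

This saves one multiply, one shift and one mask per pixel compared with the
version that takes a brightness level.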

Using this code in ufo2000 allows it to keep a reasonably high framerate (more
than 10 fps) even in complicated scenes full of fire and smoke animation, for
example. I hope this information is useful for other maemo game
developers or anyone in need of fast 16bpp alpha blending code.

PS. Optimizing alpha blending using assembly can most likely improve 
performance even more :)

[1] http://ufo2000.sourceforge.net
[2] http://alleg.sourceforge.net
