[maemo-developers] Re: Fast 16bpp alpha blending (was: Improving Cairo performance on the N800)

From: Gustavo Sverzut Barbieri barbieri at gmail.com
Date: Wed Jan 31 00:11:33 EET 2007
On 1/30/07, Siarhei Siamashka <siarhei.siamashka at gmail.com> wrote:
> On Thursday 18 January 2007 13:46, Gustavo Sverzut Barbieri wrote:
>
> > > By the way, free software is really poorly optimized for ARM right now.
> > > For example, SDL is not optimized for ARM, xserver is probably not
> > > optimized as well, a lot of performance critical parts of code in various
> > > software are still only implemented in C for ARM while they have x86
> > > assembly optimizations long ago. Considering that Internet Tablets might
> > > have a tight competition with x86 UMPC devices in the near future, ARM
> > > powered devices are at some disadvantage now. Is this something that we
> > > should try to change? :-)
> >
> > Yes. Since at INdT we use a lot of SDL, GTK and in future Evas, we are
> > interested in optimizing this.
> >
> > One thing that can be optimized is 16bpp operations. Moving SDL
> > surfaces to be optimized, packing 16bpp RGB into one plane and 1
> > byte-Alpha in another plane, we could use multiple store (stm) and
> > improve things a bit.
> >
> > If we could achieve ~24fps blitting fullscreen 16bpp+Alpha, it would
> > rock! :-) Right now we do 18fps, but we still need that function with
> > separated planes + stm. I'll ask Lauro to send them as soon as we get
> > it working.
> >
> > Anyone willing to help port Evas to work with 16bpp+Alpha internally?
> > Evas is a great canvas: it interoperates easily with the Glib main
> > loop, provides high-level utilities such as text layout (pango-like)
> > and gradients, has the concept of objects to animate, and is really
> > easy to script (with optimizations!).
>
> Regarding 16bpp alpha blending, I did some optimization (not involving
> assembly yet) for maemo build of ufo2000 [1]. The code which is currently
> used, is based on and extends RLE sprites from Allegro game programming
> library [2]. The sources can be found here:
> http://ufo2000.svn.sourceforge.net/viewvc/ufo2000/trunk/src/fpasprite/
>
> The goal was to support drawing isometric tiles with an alpha channel
> (for fire, smoke, explosions, window glass, ...) and adjustable
> brightness (for lighting effects when simulating night missions).
>
> The code works as an Allegro addon library and allows loading sprites from
> PNG files. It automagically detects the presence or absence of an alpha
> channel and converts each image into the optimal format: one that allows
> fast blending (for images with an alpha channel) or a compact one (for
> images without an alpha channel). When blitting a sprite, a brightness
> ranging from 0 - 255 is passed as an additional argument. The code uses
> C++ templates to support all the possible variants of bit depth and
> blending type (maybe not a very good idea for code intended for later
> submission into a C library :) ).
>
> The trick used to speed up alpha blending was to store each pixel in a
> 32-bit representation with the R, G, B and alpha fields spread apart, so
> that all three color channels can be blended in one multiplication.
>
> So imagine that we have 16-bit pixel in RGB565 format and alpha channel.
> We convert it into this 32-bit preprocessed data according to the following
> algorithm:
>
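> #include <stdint.h>
>
> /* alpha is 0..255; the result holds an alpha level of 0..32 */
> uint32_t convert_pixel(uint16_t rgb565, int alpha)
> {
>     uint32_t n = (alpha + 1) / 8, x = rgb565;
>     x = (x | (x << 16)) & 0x7E0F81F;  /* G -> bits 21-26, R -> 11-15, B -> 0-4 */
>     return x | (n << 5);              /* alpha level -> unused bits 5-10 */
> }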
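
As an illustration of the bit layout this produces, here is a minimal sketch that packs a pixel and then takes it apart again. It is not taken from the ufo2000 sources: `pack_pixel` repeats the conversion quoted above so the block stands alone, and the names `unpack_rgb565` and `unpack_alpha_level` are hypothetical helpers introduced only for this example.

```c
#include <stdint.h>

/* Same packing as convert_pixel() in the quoted message. */
static uint32_t pack_pixel(uint16_t rgb565, int alpha)
{
    uint32_t n = (uint32_t)(alpha + 1) / 8;   /* 0..255 -> level 0..32 */
    uint32_t x = rgb565;
    x = (x | (x << 16)) & 0x7E0F81F;          /* G: bits 21-26, R: 11-15, B: 0-4 */
    return x | (n << 5);                      /* level stored in bits 5-10 */
}

/* Hypothetical helper: fold the spread color fields back into RGB565. */
static uint16_t unpack_rgb565(uint32_t p)
{
    p &= 0x7E0F81F;                           /* drop the alpha level */
    return (uint16_t)(p | (p >> 16));         /* move green back to bits 5-10 */
}

/* Hypothetical helper: recover the alpha level (0..32). */
static uint32_t unpack_alpha_level(uint32_t p)
{
    return (p >> 5) & 0x3F;
}
```

The gaps between the color fields are what leave room for the intermediate products when all three channels are scaled at once.
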
>
> Now if we need to do alpha blending (with some buffer in memory), we do the
> following (d - destination pixel data buffer, s - buffer with preprocessed
> 32-bit pixel data, w - number of pixels to blend, n - brightness level (0 -
> 32)):
>
> uint16_t *draw_alpha_dark_line16(uint16_t *d, uint32_t *s, int w, uint32_t n)
> {
>     while (--w >= 0) {
>         uint32_t x = *s++;
>         uint32_t y = *d;
>         uint32_t a = (x >> 5) & 0x3F;                /* alpha level, 0..32 */
>         uint32_t result;
>         x = ((x & 0x7E0F81F) * n / 32) & 0x7E0F81F;  /* darken source by n/32 */
>         y = (y | (y << 16)) & 0x7E0F81F;             /* spread destination fields */
>         result = ((x - y) * a / 32 + y) & 0x7E0F81F; /* blend all channels at once */
>         *d++ = (uint16_t)(result | (result >> 16));  /* fold back into RGB565 */
>     }
>     return d;
> }
>
> This code works quite fast (at the cost of some precision loss though). It
> is perfectly suitable for isometric tile based games and probably other
> applications which only need lightning fast blending and do not need any
> extra operations with sprites (rotation for example). Removing brightness
> level support makes the code even faster.
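
For reference, a minimal sketch of what the blend might look like with the brightness step removed. This is an assumption about the simplified variant, not code from ufo2000; the function name `draw_alpha_line16` is hypothetical, and the source buffer is assumed to hold pixels in the preprocessed 32-bit format described above.

```c
#include <stdint.h>

/* Blend w preprocessed pixels from s onto the RGB565 buffer d.
 * Hypothetical variant of the quoted routine without brightness scaling. */
static uint16_t *draw_alpha_line16(uint16_t *d, const uint32_t *s, int w)
{
    while (--w >= 0) {
        uint32_t x = *s++;
        uint32_t y = *d;
        uint32_t a = (x >> 5) & 0x3F;        /* alpha level, 0..32 */
        uint32_t r;
        x &= 0x7E0F81F;                      /* source color fields only */
        y = (y | (y << 16)) & 0x7E0F81F;     /* spread destination fields */
        r = ((x - y) * a / 32 + y) & 0x7E0F81F;
        *d++ = (uint16_t)(r | (r >> 16));    /* fold back into RGB565 */
    }
    return d;
}
```

Compared with the full routine, this saves one multiplication, one division and one mask per pixel. An alpha level of 32 copies the source pixel and a level of 0 leaves the destination unchanged.
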
>
> Using this code in ufo2000 allows it to keep reasonably high framerate (more
> than 10 fps) even on complicated scenes full of fire and smoke animation for
> example. I hope this information may be useful for other maemo game
> developers or anyone in need of fast 16bpp alpha blending code.
>
> PS. Optimizing alpha blending using assembly can most likely improve
> performance even more :)

We're doing almost the same for Canola, except that we keep the data as
16bpp + alpha in memory. We were considering moving to the 32-bit format
you describe: 3 bytes per pixel is not aligned, so we have to unpack it
every time. The cost would be a bit more memory.
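
A minimal sketch of such a conversion, assuming the 16bpp colors and the 8-bit alpha values live in two separate arrays. The function name `pack_planes` and its parameters are hypothetical and are not part of Canola or SDL; the per-pixel packing follows the convert_pixel() routine quoted above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: merge an RGB565 plane and an 8-bit alpha plane
 * into one array of preprocessed 32-bit pixels (4 bytes per pixel). */
static void pack_planes(uint32_t *out, const uint16_t *rgb,
                        const uint8_t *alpha, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t n = ((uint32_t)alpha[i] + 1) / 8;  /* 0..255 -> level 0..32 */
        uint32_t x = rgb[i];
        x = (x | (x << 16)) & 0x7E0F81F;            /* spread G, R, B fields */
        out[i] = x | (n << 5);                      /* level in bits 5-10 */
    }
}
```

Doing this once at load time moves the unpacking out of the blit loop, at 4 bytes per pixel instead of 3.
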

We plan to release this blit function, but it's not very useful right
now. We should try to make it available inside SDL_BlitSurface, but I
don't know whether that is extensible.

-- 
Gustavo Sverzut Barbieri
