[maemo-developers] N800 & Video playback

Wed May 2 07:51:03 EEST 2007

On 4/30/07, Daniel Stone <daniel.stone at nokia.com> wrote:
>
> > There are two important optimizations in this code:
> > 1. Cache prefetch with PLD instruction (added in '_armv5' version) which
> > boosts performance to 70 megapixels per second. Inner loop is unrolled
> > to process 32 pixels per iteration (cache line size is 32 bytes on ARM, so
> > such unrolling is convenient). This is the most important improvement.
> > You can try using __builtin_prefetch() from C code to do the same
> > optimization.
>
> Ah, sounds useful.  From what Dan Amelang's been saying on xorg@, gcc
> should coalesce four 32-bit reads into one 128-bit read, but this sounds
> promising as well.

To expand on this: I was referring to the fact that gcc is pretty smart
about using ldmia/stmia instructions to cluster sequential
reads/writes. I see that Siarhei is already using this technique in
his assembler code, so nothing new here.
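
For anyone who wants to try both ideas from C, here is a rough sketch (not
Siarhei's code). The name copy_words and the PREFETCH_AHEAD distance are
made up for illustration, and the distance is an untuned guess. The loop
moves 8 words (32 bytes, one cache line as quoted above) per iteration and
loads into locals before storing, which gives gcc the chance to cluster
the accesses. Whether it actually emits ldmia/stmia and pld depends on the
gcc version and flags, so check the output of "gcc -O2 -S".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value: how many words ahead to prefetch. */
#define PREFETCH_AHEAD 16

/* Illustrative helper (not from the thread): copy n_words 32-bit words. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t n_words)
{
    size_t i = 0;

    /* Main loop: 8 words = 32 bytes per iteration. */
    for (; i + 8 <= n_words; i += 8) {
        /* Read-only prefetch hint (rw = 0) with low temporal locality
         * (locality = 0). The bounds check keeps the address expression
         * inside the source buffer. */
        if (i + PREFETCH_AHEAD < n_words)
            __builtin_prefetch(&src[i + PREFETCH_AHEAD], 0, 0);

        /* All loads first, then all stores, so the compiler is free to
         * merge them into multi-register loads and stores. */
        uint32_t a = src[i],     b = src[i + 1];
        uint32_t c = src[i + 2], d = src[i + 3];
        uint32_t e = src[i + 4], f = src[i + 5];
        uint32_t g = src[i + 6], h = src[i + 7];

        dst[i]     = a; dst[i + 1] = b;
        dst[i + 2] = c; dst[i + 3] = d;
        dst[i + 4] = e; dst[i + 5] = f;
        dst[i + 6] = g; dst[i + 7] = h;
    }

    /* Tail: copy whatever is left one word at a time. */
    for (; i < n_words; i++)
        dst[i] = src[i];
}
```

Call it as copy_words(dst, src, n) with non-overlapping buffers. The
prefetch is only a hint, so the result is the same with or without it;
only the timing should differ.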

Dan