[maemo-developers] Optimized memory copying functions for Nokia770

From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Fri Mar 24 18:24:37 EET 2006
Simon Pickering wrote:

> I don't know what version flash image that c760 was running, but my c750
> produces better results. I compiled your test program in two versions - arm4
> and arm5 using the default toolchains built by bitbake + OpenEmbedded for
> the Zaurus sl-5500 and c7x0 machines respectively.

I'll ask for this information.

> Note that OpenZaurus uses significantly more up-to-date versions of
> libraries, etc. than the standard Sharp images, so this probably accounts
> for the difference in speed that you see below. If you want more info on the
> actual lib versions and patches then let me know.
> 
> I ran these two binaries on my Zaurus sl-5500 and Zaurus sl-c750 (both
> running the latest OpenZaurus 3.5.4 flash image), and on my Nokia-770. The
> arm5 binary ran fine on the arm4 arch sl-5500, obviously there were no arm5
> instructions included, but the times are slightly different - I don't know
> whether this is to do with background processes or something to do with the
> compiler (though I don't suppose the compiler really has much to do in this
> case.)
> 
> Results as follows:
>
> [results skipped]
>

Compiling for arm4 or arm5 should not make any difference, since none of
the benchmarked code is generated by the compiler anyway (the standard
functions live in the standard libraries, and the tested functions are
implemented in inline assembler). The slight timing differences are just
random deviations and should be ignored (+-1MB/s does not change the
overall picture much).
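
As a rough illustration of how such MB/s figures can be obtained (this is
not the actual test program, and the names fill_fn and
fill_throughput_mb_s are made up for this sketch), one can time repeated
calls through a function pointer, so the same harness works for the
standard memset and for any drop-in replacement:

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by memset and any drop-in replacement. */
typedef void *(*fill_fn)(void *, int, size_t);

/* Hypothetical helper: calls `fn` `iterations` times over `size` bytes
   and returns the throughput in MB/s, or 0.0 if the run was too short
   for clock() to measure. Note that clock() reports CPU time. */
double fill_throughput_mb_s(fill_fn fn, void *buf, size_t size, int iterations)
{
    /* Reading the pointer through a volatile object keeps the compiler
       from inlining or dropping the repeated calls. */
    fill_fn volatile call = fn;

    clock_t start = clock();
    for (int i = 0; i < iterations; i++)
        call(buf, i & 0xFF, size);
    clock_t end = clock();

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;
    return (double)size * iterations / (1024.0 * 1024.0) / seconds;
}
```

Calling it as fill_throughput_mb_s(memset, buf, size, n) several times in
a row gives a feeling for how much the numbers jitter between runs.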

From these results it looks like:

The optimized memset works well on all tested platforms, providing the
same or much better results. Memset performance is critical for clearing
bitmaps to some color and drawing rectangles, so optimizing it makes
sense.
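
To show where memset ends up in drawing code, here is a minimal sketch of
a rectangle fill. It assumes 16-bit pixels and a stride given in pixels;
fill_rect16 and its parameters are hypothetical names, not taken from any
real library. Since memset can only replicate a single byte, it is used
when both bytes of the color are equal (black or white, for example), and
a plain per-pixel loop is used otherwise:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fill a w x h rectangle at (x, y) with `color`
   in a bitmap of 16-bit pixels, `stride` pixels per scanline. */
void fill_rect16(uint16_t *pixels, size_t stride,
                 size_t x, size_t y, size_t w, size_t h, uint16_t color)
{
    /* memset is usable only when the high and low bytes match */
    int byte_repeating = (color >> 8) == (color & 0xFF);

    for (size_t row = 0; row < h; row++) {
        uint16_t *line = pixels + (y + row) * stride + x;
        if (byte_repeating) {
            memset(line, color & 0xFF, w * sizeof(uint16_t));
        } else {
            for (size_t i = 0; i < w; i++)
                line[i] = color;
        }
    }
}
```

Clearing a whole screen to black this way is just one memset call per
scanline, so its speed depends directly on the memset implementation.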

The same code for memcpy works well on Nokia 770 and StrongARM, but
XScale needs different optimizations. It seems that memory read speed is
what matters there, so prefetch (the PLD instruction) might improve
performance. I tried using prefetch when writing the optimized memcpy for
Nokia 770, but it did not have any effect at all. Prefetch also requires
armv5, so using it affects portability.
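
For illustration only, the prefetch idea can be sketched in portable C
with GCC's __builtin_prefetch, which the compiler is expected to turn
into PLD on targets that support it and into nothing elsewhere. This is
not the inline assembler code that was benchmarked; copy_with_prefetch is
a made-up name, and the 32-byte block size and 64-byte prefetch distance
are arbitrary assumptions that would need tuning on real hardware:

```c
#include <stddef.h>

/* Hypothetical sketch: copy in 32-byte blocks, hinting the CPU to
   start loading data 64 bytes ahead of the current read position. */
void *copy_with_prefetch(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= 32) {
#if defined(__GNUC__)
        /* Only prefetch addresses that are still inside the source. */
        if (n > 64)
            __builtin_prefetch(s + 64);
#endif
        for (size_t i = 0; i < 32; i++)
            d[i] = s[i];
        d += 32;
        s += 32;
        n -= 32;
    }

    /* Copy the remaining tail bytes one at a time. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }
    return dst;
}
```

Because the prefetch is only a hint, the function behaves the same with
or without it, which makes it easy to benchmark both variants.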

An interesting observation is the standard memset vs. memcpy performance
on StrongARM: in spite of doing more work, memcpy is even faster :)

It would be interesting to run some tests on PXA270 too.

