[maemo-developers] Optimized memory copying functions for Nokia770
From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Fri Mar 24 18:24:37 EET 2006
- Previous message: [maemo-developers] Optimized memory copying functions for Nokia770
- Next message: [maemo-developers] Optimized memory copying functions for Nokia 770
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Simon Pickering wrote:

> I don't know what version flash image that c760 was running, but my c750
> produces better results. I compiled your test program in two versions - arm4
> and arm5 using the default toolchains built by bitbake + OpenEmbedded for
> the Zaurus sl-5500 and c7x0 machines respectively.

I'll ask for this information.

> Note that OpenZaurus uses significantly more up-to-date versions of
> libraries, etc. than the standard Sharp images, so this probably accounts
> for the difference in speed that you see below. If you want more info on the
> actual lib versions and patches then let me know.
>
> I ran these two binaries on my Zaurus sl-5500 and Zaurus sl-c750 (both
> running the latest OpenZaurus 3.5.4 flash image), and on my Nokia-770. The
> arm5 binary ran fine on the arm4 arch sl-5500, obviously there were no arm5
> instructions included, but the times are slightly different - I don't know
> whether this is to do with background processes or something to do with the
> compiler (though I don't suppose the compiler really has much to do in this
> case.)
>
> Results as follows:
>
> [results skipped]
>

Compiling for arm4 or arm5 should not make any difference, because none of the benchmarked code is generated by the compiler: the standard functions live in the standard libraries, and the tested functions are implemented as inline assembler. Slight timing differences should be ignored; they are just random deviations (+-1MB/s does not change the overall picture much).

From these results it looks like:

Optimized memset works well on all tested platforms, giving the same or much better results. Memset performance is critical for clearing bitmaps to some color and for drawing rectangles, so optimizing it makes sense.

The same memcpy code works well on Nokia 770 and StrongARM, but XScale needs different optimizations. Memory reads seem to be the important factor there, so maybe prefetch (the PLD instruction) could improve performance.
I tried using prefetch when writing the optimized memcpy for Nokia 770, but it had no effect at all. Prefetch also requires armv5, so using it affects portability.

An interesting observation is the standard memset vs. memcpy performance on StrongARM: in spite of doing more work, memcpy is even faster :)

It would be interesting to run some tests on PXA270 too.
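
For anyone who wants to repeat the prefetch experiment without writing assembler, here is a minimal sketch in C. It is not the code from the test program discussed above. The names `copy_words_prefetch`, `throughput_mb_per_s` and `bench_copy` are made up for this illustration, the prefetch distance of 16 words is an arbitrary guess, and "MB" is assumed to mean 1024*1024 bytes. `__builtin_prefetch` is a GCC builtin that is expected to become a PLD instruction on ARM targets that support it and nothing on targets that do not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: word-by-word copy that issues a read prefetch
 * hint every 8 words (32 bytes). The distance of 16 words ahead is an
 * assumption and would need tuning per CPU. */
static void copy_words_prefetch(uint32_t *dst, const uint32_t *src,
                                size_t nwords)
{
    for (size_t i = 0; i < nwords; i++) {
        /* Only hint addresses that are still inside the source buffer. */
        if ((i & 7) == 0 && i + 16 < nwords)
            __builtin_prefetch(src + i + 16, 0, 0);
        dst[i] = src[i];
    }
}

/* Convert a byte count and elapsed time to MB/s (MB = 1024*1024 bytes,
 * an assumption about the units used in the posted results). */
static double throughput_mb_per_s(size_t bytes, double seconds)
{
    if (seconds <= 0.0)
        return 0.0; /* timer resolution too coarse to measure */
    return (double)bytes / (1024.0 * 1024.0) / seconds;
}

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Adapter so the libc memcpy fits the copy_fn signature. */
static void libc_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Adapter for the prefetching copy; assumes n is a multiple of 4 and
 * both buffers are 4-byte aligned. */
static void prefetch_copy(void *dst, const void *src, size_t n)
{
    copy_words_prefetch((uint32_t *)dst, (const uint32_t *)src, n / 4);
}

/* Run fn over the buffers `passes` times and return MB/s measured with
 * clock(), which counts CPU time and has coarse resolution. */
static double bench_copy(copy_fn fn, void *dst, const void *src,
                         size_t n, int passes)
{
    assert(passes > 0);
    clock_t start = clock();
    for (int i = 0; i < passes; i++)
        fn(dst, src, n);
    clock_t end = clock();
    return throughput_mb_per_s(n * (size_t)passes,
                               (double)(end - start) / CLOCKS_PER_SEC);
}
```

Calling `bench_copy(libc_copy, dst, src, size, passes)` and `bench_copy(prefetch_copy, dst, src, size, passes)` on buffers much larger than the data cache gives two numbers to compare. With optimization enabled the compiler may change or drop the copy loops, so the generated code should be checked before trusting the figures.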