[maemo-developers] Optimized memory copying functions for Nokia 770
From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Tue Mar 14 19:49:53 EET 2006
Eero Tamminen wrote:
>> That makes the comparison with memcpy somewhat unfair, since you
>> are not actually providing replacement functions, so this would
>> only make a difference for -O3 type optimisation (where you trade
>> speed for size); it would be interesting to see what the
>> performance difference is if you add the C prologue and epilogue.
>
> One should also remember that inlining functions increases the code
> size. On trivial sized test programs this is not an issue, but in
> real programs it is, especially with the RAM and cache sizes that ARM
> has.

Sometimes inlining makes sense, sometimes it does not. In my case (blitting code for the allegro game programming library) it does. Just quoting myself:

> Also just improving glibc might not give the best results. Imagine a
> code for 16bpp bitmaps blitting. It contains a tight loop of copying
> pixels one line at a time. If we need to get the best performance
> possible, especially for small bitmaps with only a few horizontal
> pixels, extra overhead caused by a memcpy function call and also
> extra check for alignment (which is known to be 16-bit in this case)
> might make a noticeable difference. So directly inlining code from
> that 'memcpy16' macro will be better in this case.

By the way, I tried to search for asm optimized versions of memcpy for ARM platforms. I had not done that before, because I mistakenly assumed that the glibc memcpy/memset implementations were already optimized as much as possible. It appears that there is a fast memcpy implementation in uclibc, and there are many other implementations around as well. It seems I tried to reinvent the wheel. Too bad if spending two whole weekend days on this turns out to have been a waste of time :( Well, at least I did not try to steal someone else's code and 'copyright' it.

As I said before, my observations show that it is better to align writes on 16-byte boundaries, at least on the Nokia 770.
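To make the inlining and alignment points concrete, here is a minimal C sketch. It is not the posted 'memcpy16' macro and not ARM assembly: the name copy_pixels16 is hypothetical, and it only assumes that source and destination are 16-bit aligned pixel rows. It writes single pixels until the destination reaches a 16-byte boundary, then copies 16 bytes per iteration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical inline copy for one row of 16bpp pixels (illustration only).
 * Being 'static inline', it avoids the call overhead of a library memcpy,
 * and it never needs a byte-alignment check because pixels are 16-bit.
 */
static inline void copy_pixels16(uint16_t *dst, const uint16_t *src, size_t count)
{
    /* Head: copy single pixels until dst sits on a 16-byte boundary. */
    while (count > 0 && ((uintptr_t)dst & 15u) != 0) {
        *dst++ = *src++;
        count--;
    }

    /* Body: 8 pixels (16 bytes) per step, every write 16-byte aligned.
     * A fixed-size memcpy is typically expanded inline by the compiler. */
    while (count >= 8) {
        memcpy(dst, src, 16);
        dst += 8;
        src += 8;
        count -= 8;
    }

    /* Tail: remaining 0..7 pixels. */
    while (count > 0) {
        *dst++ = *src++;
        count--;
    }
}
```

For a short row, most of the work happens in the head and tail loops, which is exactly the case where skipping a function call and a generic alignment check could matter. Whether this layout actually beats a library memcpy on a given device has to be measured.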
The code I have posted is proof-of-concept code, and it shows that it is faster than the default memset/memcpy on the device. I'm going to compare my code with the uclibc implementation. If uclibc is in fact faster or has the same performance, I'll have to apologize for causing this mess and go away ashamed.

In any case, the performance of memcpy/memset on the default Nokia 770 image is far from optimal. Considering that the device is certainly not overpowered, improvements in this area would probably help. I just checked the GTK sources: memcpy is used in a lot of places, though I don't know how much it affects performance. Is it something worth investigating by Nokia developers?
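A comparison between two memcpy implementations could be timed with a small harness like the sketch below. The names copy_fn and seconds_per_copy are hypothetical, and clock() measures CPU time at coarse resolution, so this is only a rough way to compare candidates with many iterations, not a description of how the posted numbers were obtained.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Any function with the memcpy signature can be plugged in here. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/*
 * Hypothetical helper: average CPU seconds per call of 'fn' when copying
 * 'n' bytes 'iterations' times. Returns 0.0 if iterations is zero.
 */
static double seconds_per_copy(copy_fn fn, void *dst, const void *src,
                               size_t n, unsigned iterations)
{
    if (iterations == 0)
        return 0.0;

    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++)
        fn(dst, src, n);   /* called through a pointer, so it is not inlined */
    clock_t end = clock();

    return (double)(end - start) / CLOCKS_PER_SEC / iterations;
}
```

For example, seconds_per_copy(memcpy, dst, src, 4096, 100000) gives a baseline for the library routine, and the same call with another implementation of matching signature gives the number to compare against. Varying n and the alignment of dst would show where the implementations differ.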