[maemo-developers] Optimized memory copying functions for Nokia770
From: Simon Pickering S.G.Pickering at bath.ac.uk
Date: Thu Mar 23 12:24:28 EET 2006
> > Now I just really badly want to see the benchmark results from some
> > other cpu, preferably intel xscale :)
>
> Just got report from running my test on Sharp Zaurus SL-C760:
>
> --- running correctness tests ---
> all the correctness tests passed
> --- running performance tests (memory bandwidth benchmark) ---:
> memset() memory bandwidth: 80.35MB/s
> memset8() memory bandwidth: 83.55MB/s
> memcpy() memory bandwidth (perfectly aligned): 45.29MB/s
> memcpy16() memory bandwidth (perfectly aligned): 45.20MB/s
> memcpy() memory bandwidth (16-bit aligned): 43.15MB/s
> memcpy16() memory bandwidth (16-bit aligned): 38.27MB/s
> --- testing performance for random blocks (size 0-15 bytes) ---
> memset time: 0.960
> memset8 time: 0.880
> --- testing performance for random blocks (size 0-511 bytes) ---
> memset time: 3.840
> memset8 time: 3.670
>
> So memory copying functions on Zaurus are already optimal for
> this Zaurus and my implementation only causes performance
> degradation :)
>

I don't know which flash image version that c760 was running, but my c750 produces better results.

I compiled your test program in two versions, arm4 and arm5, using the default toolchains built by bitbake + OpenEmbedded for the Zaurus sl-5500 and c7x0 machines respectively. Note that OpenZaurus uses significantly more up-to-date versions of libraries, etc. than the standard Sharp images, so this probably accounts for the difference in speed that you see below. If you want more info on the actual lib versions and patches then let me know.

I ran these two binaries on my Zaurus sl-5500 and Zaurus sl-c750 (both running the latest OpenZaurus 3.5.4 flash image), and on my Nokia-770. The arm5 binary ran fine on the arm4 arch sl-5500, so evidently no arm5 instructions were included, but the times are slightly different - I don't know whether this is to do with background processes or something to do with the compiler (though I don't suppose the compiler really has much to do in this case).

Results as follows:

================================================================
================================================================
Sharp Zaurus sl-C750 (c7x0/Shepherd) XScale-PXA255 rev 6 (v5l), 400MHz
================================================================
================================================================

root@c7x0:/media/cf/other# ./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 182.36MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 80.04MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.49MB/s
memcpy() memory bandwidth (16-bit aligned): 73.07MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.02MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.820
memset8 time: 0.750
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.080
memset8 time: 2.060

================================================================

root@c7x0:/media/cf/other# ./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 183.96MB/s
memset8() memory bandwidth: 182.36MB/s
memcpy() memory bandwidth (perfectly aligned): 81.92MB/s
memcpy16() memory bandwidth (perfectly aligned): 34.89MB/s
memcpy() memory bandwidth (16-bit aligned): 74.63MB/s
memcpy16() memory bandwidth (16-bit aligned): 31.35MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.790
memset8 time: 0.720
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.060
memset8 time: 2.060

================================================================
================================================================
Sharp Zaurus sl-5500 (Collie) StrongARM-1110 rev 9 (v4l) 206MHz
================================================================
================================================================

root@collie:/media/cf/other# ./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 59.24MB/s
memcpy() memory bandwidth (16-bit aligned): 48.88MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.24MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.740
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.840
memset8 time: 3.090

================================================================

root@collie:/media/cf/other# ./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 35.67MB/s
memset8() memory bandwidth: 101.80MB/s
memcpy() memory bandwidth (perfectly aligned): 59.07MB/s
memcpy16() memory bandwidth (perfectly aligned): 58.91MB/s
memcpy() memory bandwidth (16-bit aligned): 49.00MB/s
memcpy16() memory bandwidth (16-bit aligned): 59.07MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.730
memset8 time: 0.540
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 7.850
memset8 time: 3.100

================================================================
================================================================
Nokia N770 ARM926EJ-Sid(wb) rev 3 (v5l) 200MHz OMAP1710 ?
================================================================
================================================================

./arm5-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 117.16MB/s
memset8() memory bandwidth: 262.14MB/s
memcpy() memory bandwidth (perfectly aligned): 102.30MB/s
memcpy16() memory bandwidth (perfectly aligned): 110.96MB/s
memcpy() memory bandwidth (16-bit aligned): 69.21MB/s
memcpy16() memory bandwidth (16-bit aligned): 99.39MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.190

================================================================

./arm4-fastmem-arm-test
--- running correctness tests ---
all the correctness tests passed
--- running performance tests (memory bandwidth benchmark) ---:
memset() memory bandwidth: 119.16MB/s
memset8() memory bandwidth: 265.46MB/s
memcpy() memory bandwidth (perfectly aligned): 100.82MB/s
memcpy16() memory bandwidth (perfectly aligned): 109.80MB/s
memcpy() memory bandwidth (16-bit aligned): 68.53MB/s
memcpy16() memory bandwidth (16-bit aligned): 98.46MB/s
--- testing performance for random blocks (size 0-15 bytes) ---
memset time: 0.400
memset8 time: 0.280
--- testing performance for random blocks (size 0-511 bytes) ---
memset time: 2.430
memset8 time: 1.170

================================================================

Cheers,

Simon
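For anyone who wants to compare the three devices at a glance, the bandwidth figures from the arm5 runs in this message can be turned into ratios with a few lines of Python. This is only a rough sketch: the names RESULTS, speedup and summarize are made up for illustration, and it assumes (from the quoted message) that memset8() and memcpy16() are the optimized variants being compared against the system memset() and memcpy(). A ratio above 1.0 means the optimized variant was faster in that run.

```python
# Bandwidth figures in MB/s, copied from the arm5 runs reported in this message.
# Each pair is (system function, optimized variant); the pairing is an assumption
# based on the quoted text, not something the test program states explicitly.
RESULTS = {
    "Zaurus sl-C750 (PXA255)": {
        "memset vs memset8": (182.36, 182.36),
        "memcpy vs memcpy16 (perfectly aligned)": (80.04, 34.49),
        "memcpy vs memcpy16 (16-bit aligned)": (73.07, 31.02),
    },
    "Zaurus sl-5500 (StrongARM-1110)": {
        "memset vs memset8": (35.67, 101.80),
        "memcpy vs memcpy16 (perfectly aligned)": (59.07, 59.24),
        "memcpy vs memcpy16 (16-bit aligned)": (48.88, 59.24),
    },
    "Nokia 770 (ARM926EJ-S)": {
        "memset vs memset8": (117.16, 262.14),
        "memcpy vs memcpy16 (perfectly aligned)": (102.30, 110.96),
        "memcpy vs memcpy16 (16-bit aligned)": (69.21, 99.39),
    },
}


def speedup(baseline, optimized):
    """Return how many times faster the optimized variant ran than the baseline."""
    return optimized / baseline


def summarize(results):
    """Map each device and test name to its speedup ratio, rounded to 2 decimals."""
    return {
        device: {
            test: round(speedup(base, opt), 2)
            for test, (base, opt) in tests.items()
        }
        for device, tests in results.items()
    }


if __name__ == "__main__":
    for device, tests in summarize(RESULTS).items():
        print(device)
        for test, ratio in tests.items():
            print(f"  {test}: {ratio:.2f}x")
```

The ratios only restate the single runs listed above, so small differences (a few percent) are within the run-to-run variation already visible between the arm4 and arm5 binaries.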