[maemo-developers] Toolchain upgrade? (Was: Instructions cache flush on ARM)
From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Wed May 2 23:00:55 EEST 2007
On Wednesday 02 May 2007 18:48, Eero Tamminen wrote:
> On x86 I prefer valgrind/cachegrind/callgrind/kcachegrind as
> that way one can browse the source code interactively with
> the profiling information. Getting to know how the source
> really works is sometimes more useful than knowing the exact
> bottleneckedness percentage of some function.

Sure, I also use valgrind/cachegrind/callgrind/kcachegrind quite often
in my work. It's a very nice toolset. But callgrind's statistics carry
no information about floating point math and integer divisions, so real
results on ARM may be very different.

Also, cache behaviour on the Nokia 770 arm926ej-s core is very different
from the cache on x86. The arm926ej-s does not allocate a cache line on
a write miss, while all the x86 CPUs do. This makes a very big
difference for code which does lots of writes to memory that is not
already in the cache. Cachegrind only simulates a write-allocate cache.
I created the following patch for simulating read-allocate behaviour in
callgrind (for more precise arm926ej-s simulation):
http://ufo2000.sourceforge.net/files/vg-read-allocate-cache-patch.diff

The arm1136jf-s core in the N800 does support a write-allocate cache,
though, so this patch is not needed when optimizing for the N800 :)

> > Did anybody try installing newer toolchains in scratchbox and use them
> > with maemo SDK? I just don't have much free time for these experiments
> > and don't want to break my installation of scratchbox which works now
> > (more or less acceptable)
>
> Installing new toolchains for Sbox shouldn't be a problem (if it's
> already available for it) and you can make a new Sbox target for each
> toolchain you want to test.

Thanks, I'll try that. In my preliminary tests, mplayer becomes a few
percent faster at mpeg4 decoding when switching to gcc 4.1.1 (I tested
a build compiled with a cross-compiler outside scratchbox, with no
audio/video output except for SDL, so not really useful for end users,
but fine for benchmarking with gprof).

> > Building packages with new toolchain would probably need to have
> > libstdc++ linked statically for C++ applications to work on 770/N800, but
> > otherwise everything should be fine.
>
> Actually, you cannot really build static binaries with Glibc.
> It links some stuff always dynamically (nss for example).
> I don't know whether this is a problem in practice though.

I'm not going to link statically with glibc, only with libstdc++ (the
standard C++ library). There are a few known tricks to make gcc link
libstdc++ statically while linking all the other libraries dynamically.
One of them is to create a symlink to libstdc++.a in some empty
directory and pass this directory with the -L option on the gcc command
line. When gcc starts linking, it will be fooled into using the static
libstdc++ library. But I guess just removing libstdc++.so in scratchbox
will do the job. After that, the compiler should in theory create
binaries which run with no problems on the device, even for C++
applications.

> > http://arm.com/documentation/ARMProcessor_Cores/index.html
> > 'ARM1136JF-S and ARM1136J-S r1p1 Technical Reference Manual'
> > Chapter 4 'Unaligned and Mixed-Endian Data Access Support'
>
> Did you read the section on "ARMv6 unaligned data access restrictions"?
> Basically it doesn't work in all cases, the accesses are not atomic and
> have performance implications.

Did you also read the Intel docs? Unaligned access has some restrictions
on x86 as well. Do you have an example of a practical case where the
hardware unaligned support in ARM11 would work worse than on x86? The
compiler should do the job of aligning data for performance reasons (as
it does on x86 as well).
But if you happen to have some unaligned data in memory anyway, just
reading it with a minor, unavoidable performance penalty will be faster
than reading the data one byte at a time and combining it into a 32-bit
or 16-bit value (instruction timings can also be found in this
Technical Reference Manual). Enabling hardware unaligned access support
should make the explicit pointer conversion hacks that are sometimes
used in not very portable C code work just like they do on x86. Which
is a good thing in my opinion.

> > As ARM11 core used in N800 is little endian, does have floating point
> > unit and supports unaligned memory access in hardware (which only needs
> > to be enabled). It probably doesn't have any serious portability issues
> > to be aware of anymore and vast majority of software initially developed
> > for x86 should be easy to compile and run on it even without doing any
> > modifications.
>
> Compiler aligns everything correctly if your code is correct.
> I think non-aligned code is bug and performance issue.

In the real world such buggy code unfortunately exists. And it works
fine on x86, which is probably the most widely used platform for
software development.

> > Enabling unaligned memory support will make life much easier for
> > developers unfamiliar with ARM platform. The number of applications for
> > N800 should grow up, as less newbie developers will be turned away
> > frustrated by the alignment bugs they have never heard about before.
>
> Can you give examples of issues people have with this?

I don't know if it is a good example, but it took me some time to
figure out what the hell was going on when I first started trying to
port code to the Nokia 770 :) I did not start flooding the mailing list
or forums with requests for help but found a solution myself after some
time of searching for information.
I did know about endian issues on Macs before, but it was completely
unknown to me that there are architectures with strict alignment
requirements (except for SSE instructions on x86, which seemed more
like an exception than a rule to me). Maybe some people also found a
solution on their own, but maybe some have just given up without even
complaining and we lost some developers. The number of posts asking for
help is also nonzero:
http://www.internettablettalk.com/forums/showthread.php?t=2668

In addition, I remember having explained alignment issues to a few
people on the #maemo channel over all this time; they all came
complaining about applications working on x86 but crashing on ARM. So
in my opinion this problem really exists, even if it is not so
significant.