[maemo-developers] Performance of floating point instructions
From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Wed Mar 10 21:54:54 EET 2010
On Wednesday 10 March 2010, Laurent Desnogues wrote:
> On Wed, Mar 10, 2010 at 7:29 PM, Alberto Mardegan
> > So, it seems that there's a huge improvements when switching from doubles
> > to floats; although I wonder if it's because of the FPU or just because
> > the amount of data passed around is smaller.
> > On the other hand, the improvements obtained by enabling the fast FPU
> > mode is rather small -- but that might be due to the fact that the FPU
> > operations are not a major player in this piece of code.
>
> The "fast" mode only gains 1 or 2 cycles per FP instruction.
> The FPU on Cortex-A8 is not pipelined and the fast mode
> can't change that :-)

It's probably
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/ch16s07s01.html
vs.
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344j/BCGEIHDJ.html

I wonder why the compiler does not use real NEON instructions with -ffast-math
option, it should be quite useful even for scalar code.

something like:

    vld1.32 {d0[0]}, [r0]
    vadd.f32 d0, d0, d0
    vst1.32 {d0[0]}, [r0]

instead of:

    flds s0, [r0]
    fadds s0, s0, s0
    fsts s0, [r0]

for:

    *float_ptr = *float_ptr + *float_ptr;

At least NEON is pipelined and should be a lot faster on more complex code
examples where it can actually benefit from pipelining. On x86, SSE2 is used
quite nicely for floating point math.

-- 
Best regards,
Siarhei Siamashka
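
Until the compiler does this on its own, the NEON sequence above can be
requested by hand with the intrinsics from arm_neon.h. The following is only
a rough sketch: the function name double_in_place is made up for
illustration, and the plain C fallback is there just so the file still builds
on targets without NEON. Each intrinsic is expected to map to the instruction
named in its comment, but the exact code generated depends on the compiler
version and flags (for GCC on ARMv7, something like -mfpu=neon
-mfloat-abi=softfp).

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Hypothetical helper: computes *p = *p + *p using lane 0 of a NEON D register. */
static void double_in_place(float *p)
{
    float32x2_t v = vdup_n_f32(0.0f);  /* give both lanes a defined value  */
    v = vld1_lane_f32(p, v, 0);        /* vld1.32 {d0[0]}, [r0]            */
    v = vadd_f32(v, v);                /* vadd.f32 d0, d0, d0              */
    vst1_lane_f32(p, v, 0);            /* vst1.32 {d0[0]}, [r0]            */
}
#else
/* Fallback for targets without NEON: ordinary scalar code. */
static void double_in_place(float *p)
{
    *p = *p + *p;
}
#endif
```

Calling double_in_place(&x) with x = 1.5f should leave x == 3.0f on either
path. Note that NEON arithmetic flushes denormals to zero, so results can
differ from VFP for very small values.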