[maemo-developers] Performance of floating point instructions
From: Laurent GUERBY laurent at guerby.net
Date: Wed Mar 10 22:31:54 EET 2010
On Wed, 2010-03-10 at 21:54 +0200, Siarhei Siamashka wrote:
> I wonder why the compiler does not use real NEON instructions with -ffast-math
> option, it should be quite useful even for scalar code.
>
> something like:
>
>     vld1.32 {d0[0]}, [r0]
>     vadd.f32 d0, d0, d0
>     vst1.32 {d0[0]}, [r0]
>
> instead of:
>
>     flds s0, [r0]
>     fadds s0, s0, s0
>     fsts s0, [r0]
>
> for:
>
>     *float_ptr = *float_ptr + *float_ptr;
>
> At least NEON is pipelined and should be a lot faster on more complex code
> examples where it can actually benefit from pipelining. On x86, SSE2 is used
> quite nicely for floating point math.

Hi,

Please open a report on http://gcc.gnu.org/bugzilla with your test sources
and command line; at least the GCC developers will notice there's interest :).

GCC comes with some builtins for NEON; they're defined in arm_neon.h (see below).

Sincerely,

Laurent

typedef struct float32x2x2_t
{
  float32x2_t val[2];
} float32x2x2_t;
...
__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vpadd_f32 (float32x2_t __a, float32x2_t __b)
{
  return (float32x2_t)__builtin_neon_vpaddv2sf (__a, __b, 3);
}
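As a rough sketch (not something proposed in the thread), the arm_neon.h intrinsics can be called directly to get the NEON sequence from the question for the scalar `*float_ptr + *float_ptr` case, and to use `vpadd_f32`. The helper names `double_in_place` and `pairwise_add` are hypothetical. The portable scalar fallback for non-NEON hosts is an added assumption so the file also builds on x86. Whether a given GCC version actually emits `vld1.32`/`vadd.f32`/`vst1.32` for the NEON path depends on the compiler and on flags such as `-mfpu=neon`, so inspect the generated assembly rather than relying on it.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

/* Double *p on the NEON unit: load into lane 0 of a D register, add it to
   itself, then store lane 0 back (mirrors vld1.32 / vadd.f32 / vst1.32). */
static void double_in_place(float *p)
{
    float32x2_t v = vdup_n_f32(0.0f);   /* zero lane 1 so it is defined */
    v = vld1_lane_f32(p, v, 0);
    v = vadd_f32(v, v);
    vst1_lane_f32(p, v, 0);
}

/* Pairwise add via vpadd_f32: out = {a[0] + a[1], b[0] + b[1]}. */
static void pairwise_add(const float a[2], const float b[2], float out[2])
{
    float32x2_t r = vpadd_f32(vld1_f32(a), vld1_f32(b));
    vst1_f32(out, r);
}

#else
/* Assumed portable fallback for hosts without NEON: same results using
   plain scalar C, so the example compiles and runs anywhere. */
static void double_in_place(float *p)
{
    *p = *p + *p;
}

static void pairwise_add(const float a[2], const float b[2], float out[2])
{
    out[0] = a[0] + a[1];
    out[1] = b[0] + b[1];
}
#endif
```

For example, doubling a float holding 1.5 in place leaves 3.0, and pairwise-adding {1, 2} with {3, 4} yields {3, 7}. Keeping the value in lane 0 of a 64-bit D register is what lets the NEON pipeline handle scalar work, which is the point raised in the question.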