[maemo-developers] Performance of floating point instructions

Wed Mar 10 22:31:54 EET 2010

On Wed, 2010-03-10 at 21:54 +0200, Siarhei Siamashka wrote:
> I wonder why the compiler does not use real NEON instructions with -ffast-math 
> option, it should be quite useful even for scalar code.
> 
> something like:
> 
> vld1.32  {d0[0]}, [r0]
> vadd.f32 d0, d0, d0
> vst1.32  {d0[0]}, [r0]
> 
> instead of:
> 
> flds     s0, [r0]
> fadds    s0, s0, s0
> fsts     s0, [r0]
> 
> for:
> 
> *float_ptr = *float_ptr + *float_ptr;
> 
> At least NEON is pipelined and should be a lot faster on more complex code
> examples where it can actually benefit from pipelining. On x86, SSE2 is used
> quite nicely for floating point math.

Hi,

Please open a report at http://gcc.gnu.org/bugzilla with your test
sources and command line; at the very least, GCC developers will notice
that there is interest :).

GCC comes with some builtins for NEON; they are defined in arm_neon.h
(see the excerpt below).

Sincerely,

Laurent

typedef struct float32x2x2_t
{
  float32x2_t val[2];
} float32x2x2_t;

...

__extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
vpadd_f32 (float32x2_t __a, float32x2_t __b)
{
  return (float32x2_t)__builtin_neon_vpaddv2sf (__a, __b, 3);
}
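
As a rough sketch of how these intrinsics could be used from C, the
following writes the quoted *float_ptr = *float_ptr + *float_ptr example
with single-lane NEON loads and stores, and uses vpadd_f32 to sum a pair
of floats. The function names double_in_place and sum_pair and the
HAVE_NEON macro are made up for illustration; the scalar fallback is an
assumption added so the file also builds on targets without NEON. Whether
the compiler emits exactly the vld1/vadd/vst1 sequence from the quoted
mail depends on the GCC version and options.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1   /* hypothetical helper macro, not part of arm_neon.h */
#else
#define HAVE_NEON 0
#endif

/* Hypothetical helper: *p = *p + *p, done in lane 0 of a 64-bit NEON register. */
static void double_in_place(float *p)
{
#if HAVE_NEON
    float32x2_t v = vdup_n_f32(0.0f);   /* give the unused lane a defined value */
    v = vld1_lane_f32(p, v, 0);         /* roughly: vld1.32 {d0[0]}, [r0] */
    v = vadd_f32(v, v);                 /* roughly: vadd.f32 d0, d0, d0   */
    vst1_lane_f32(p, v, 0);             /* roughly: vst1.32 {d0[0]}, [r0] */
#else
    *p = *p + *p;                       /* scalar fallback without NEON */
#endif
}

/* Hypothetical helper: returns xy[0] + xy[1] using the pairwise add. */
static float sum_pair(const float xy[2])
{
#if HAVE_NEON
    float32x2_t v = vld1_f32(xy);       /* load both floats */
    float32x2_t s = vpadd_f32(v, v);    /* each lane now holds xy[0] + xy[1] */
    return vget_lane_f32(s, 0);
#else
    return xy[0] + xy[1];
#endif
}
```

On a 32-bit ARM target this would typically be built with something like
-mfpu=neon -mfloat-abi=softfp so that arm_neon.h is usable; the exact
flags depend on the toolchain.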