[maemo-developers] Performance of floating point instructions

From: Siarhei Siamashka siarhei.siamashka at gmail.com
Date: Wed Mar 10 22:33:08 EET 2010
On Wednesday 10 March 2010, Laurent Desnogues wrote:
> On Wed, Mar 10, 2010 at 8:54 PM, Siarhei Siamashka
> <siarhei.siamashka at gmail.com> wrote:
> [...]
>
> > I wonder why the compiler does not use real NEON instructions with
> > -ffast-math option, it should be quite useful even for scalar code.
> >
> > something like:
> >
> > vld1.32  {d0[0]}, [r0]
> > vadd.f32 d0, d0, d0
> > vst1.32  {d0[0]}, [r0]
> >
> > instead of:
> >
> > flds     s0, [r0]
> > fadds    s0, s0, s0
> > fsts     s0, [r0]
> >
> > for:
> >
> > *float_ptr = *float_ptr + *float_ptr;
> >
> > At least NEON is pipelined and should be a lot faster on more complex
> > code examples where it can actually benefit from pipelining. On x86, SSE2
> > is used quite nicely for floating point math.
>
> Even if fast-math is known to break some rules, it only
> breaks C rules IIRC. 

If that's the case, some other option would be handy. Or even a new custom
data type like float_neon (or any other name). It is probably even possible
to do this in C++ with operator overloading.
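
A rough sketch of what such a type might look like, assuming a toolchain that
ships arm_neon.h and is invoked with NEON enabled (e.g. -mfpu=neon on 32-bit
ARM). The name float_neon and the helper double_in_place are made up for
illustration and are not an existing gcc feature; when NEON is unavailable,
the sketch falls back to plain float so that it still compiles. Whether the
compiler actually keeps the value in a NEON register between operations is
not guaranteed.

```cpp
// Hypothetical scalar wrapper that routes float math through NEON
// intrinsics when available. Only lane 0 of the 64-bit vector is meaningful.
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

struct float_neon {
    float32x2_t v;
    float_neon(float x = 0.0f) : v(vdup_n_f32(x)) {}    // broadcast scalar
    explicit float_neon(float32x2_t raw) : v(raw) {}
    float value() const { return vget_lane_f32(v, 0); } // extract lane 0
};

inline float_neon operator+(float_neon a, float_neon b) {
    return float_neon(vadd_f32(a.v, b.v));
}
inline float_neon operator*(float_neon a, float_neon b) {
    return float_neon(vmul_f32(a.v, b.v));
}
#else
// Fallback for targets without NEON: same interface, ordinary float math.
struct float_neon {
    float v;
    float_neon(float x = 0.0f) : v(x) {}
    float value() const { return v; }
};

inline float_neon operator+(float_neon a, float_neon b) {
    return float_neon(a.v + b.v);
}
inline float_neon operator*(float_neon a, float_neon b) {
    return float_neon(a.v * b.v);
}
#endif

// Equivalent of: *float_ptr = *float_ptr + *float_ptr;
inline void double_in_place(float *float_ptr) {
    float_neon x(*float_ptr);
    *float_ptr = (x + x).value();
}
```

Calling double_in_place(&f) should then behave like the C statement quoted
above. Note that results from NEON can differ from VFP for denormals, since
NEON flushes them to zero.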

> OTOH, NEON FP has no support 
> for NaN and other nice things from IEEE754.
>
> Anyway you're perhaps looking for -mfpu=neon, no?

I lost my faith in gcc long ago :) So I'm not really looking for anything.

-- 
Best regards,
Siarhei Siamashka