neon

MMX instructions for iPhone

Hi, does the iPhone's ARMv6 processor support MMX instructions? ...

iPhone detecting processor model / NEON support

Hello, I'm looking for a way to differentiate at runtime between devices equipped with the new ARM processor (such as iPhone 3GS and some iPods 3G) and devices equipped with the old ARM processors. I know I can use uname() to determine the device model, but as only some of the iPod touches 3G received a boost in their ARM processor, thi...

Fast 4x4 Matrix Multiplication in C

I am trying to find an optimized C or assembler implementation of a function that multiplies two 4x4 matrices with each other. The platform is an ARMv6- or ARMv7-based iPhone or iPod. Currently, I am using a fairly standard approach - just a little loop-unrolled. #define O(y,x) (y + (x<<2)) static inline void Matrix4x4MultiplyBy4x4 (flo...
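For the ARMv7 devices, one possible shape for such a routine is sketched below. It assumes the column-major layout implied by the O(y,x) macro in the excerpt (row y, column x at index y + 4*x); the name mat4_mul is hypothetical, and the scalar branch is only there so the code also builds where NEON is unavailable. This is a sketch under those assumptions, not a tuned implementation.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out = a * b for column-major 4x4 float matrices,
 * where element (row y, column x) lives at index y + 4*x.
 * out must not alias a or b. */
void mat4_mul(float *out, const float *a, const float *b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Keep the four columns of a in registers. */
    float32x4_t a0 = vld1q_f32(a);
    float32x4_t a1 = vld1q_f32(a + 4);
    float32x4_t a2 = vld1q_f32(a + 8);
    float32x4_t a3 = vld1q_f32(a + 12);
    for (int x = 0; x < 4; ++x) {
        const float *bc = b + 4 * x;               /* column x of b */
        /* Result column x is a linear combination of the columns of a. */
        float32x4_t col = vmulq_n_f32(a0, bc[0]);
        col = vmlaq_n_f32(col, a1, bc[1]);
        col = vmlaq_n_f32(col, a2, bc[2]);
        col = vmlaq_n_f32(col, a3, bc[3]);
        vst1q_f32(out + 4 * x, col);
    }
#else
    /* Scalar fallback with the same layout. */
    for (int x = 0; x < 4; ++x) {
        for (int y = 0; y < 4; ++y) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k)
                sum += a[y + 4 * k] * b[k + 4 * x];
            out[y + 4 * x] = sum;
        }
    }
#endif
}
```

Whether this beats a hand-unrolled scalar version depends on the core and compiler, so it is worth measuring on the target device.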

Fast sine/cosine for ARMv7+NEON: looking for testers…

Could somebody with access to an iPhone 3GS or a Pandora please test the following assembly routine I just wrote? It is supposed to compute sines and cosines really really fast on the NEON vector FPU. I know it compiles fine, but without adequate hardware I can't test it. If you could just compute a few sines and cosines and compare the...

Common SIMD techniques

Hi! Where can I find information about common SIMD tricks? I have an instruction set and know how to write non-tricky SIMD code, but I know SIMD is now much more powerful. It can express complex conditional logic as branchless code. For example (ARMv6), the following sequence of instructions sets each byte of Rd equal to the unsigned minimum of t...

How to enable NEON instructions in Xcode

Hi, I want to use NEON SIMD instructions on the iPhone. I heard we have to put the flags "-mfloat-abi=softfp -mfpu=neon" in the "Other C Flags" field of the Target inspector, but when building I get "error: unrecognized command line option "-mfpu=neon"". Is there anything else special that has to be done to allow this flag? (I have Xcode...

How do I reorder vector data using ARM Neon intrinsics?

This is specifically related to ARM NEON SIMD coding. I am using ARM NEON intrinsics for a certain module in a video decoder. I have vectorized data as follows: there are four 32-bit elements in a NEON register - say, Q0 - which is 128 bits wide. 3B 3A 1B 1A There are another four 32-bit elements in another NEON register, say Q1 ...

Neon toolkit and Gate Web Service

I am trying to run any of the services from the Gate web service in Neon 2.3. Even Annie, which runs so well in Gate, doesn't run; or rather, it keeps processing indefinitely, for something that should take no more than a couple of seconds. I run the wizard, set the input directory, leave the file pattern as default and set a folder and name for the outp...

How to initialize const float32x4x4_t (ARM NEON intrinsic, GCC) ?

I can initialize float32x4_t like this: const float32x4_t zero = { 0.0f, 0.0f, 0.0f, 0.0f }; But this code produces the error "Incompatible types in initializer": const float32x4x4_t one = { 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, 1.0f, }; float32x4x4_t is a 4x4 matrix bui...

Is there a good reference for ARM NEON intrinsics?

The ARM reference manual doesn't go into much detail on the individual instructions ( http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0348b/BABIIBBG.html ). Is there something that's a little more detailed? ...

No XOR GCC intrinsics for ARM NEON

Hi, I could not find any intrinsics for a simple XOR operation. See: http://gcc.gnu.org/onlinedocs/gcc/ARM-NEON-Intrinsics.html Is there really no way to use NEON instructions for this? ...
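For what it's worth, the bitwise exclusive-or intrinsics are named after the VEOR instruction (veor_* and veorq_*), which makes them easy to miss when searching for "xor". A minimal sketch, in which xor_bytes is a hypothetical helper name and the scalar loop handles the tail as well as non-NEON builds:

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] ^ b[i] for n bytes. */
void xor_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 16 bytes per iteration using the VEOR-based intrinsic. */
    for (; i + 16 <= n; i += 16) {
        uint8x16_t va = vld1q_u8(a + i);
        uint8x16_t vb = vld1q_u8(b + i);
        vst1q_u8(dst + i, veorq_u8(va, vb));
    }
#endif
    /* Remaining bytes (or everything, when NEON is unavailable). */
    for (; i < n; ++i)
        dst[i] = (uint8_t)(a[i] ^ b[i]);
}
```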

How to use the multiply and accumulate intrinsics in ARM Cortex-A8?

Hi guys, how do I use the multiply-accumulate intrinsics provided by GCC? float32x4_t vmlaq_f32 (float32x4_t , float32x4_t , float32x4_t); Can anyone explain which three parameters I have to pass to this function? I mean the source and destination registers, and what the function returns. Help!!! ...
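As a rough illustration of the semantics: vmlaq_f32(a, b, c) returns a + b * c computed per lane, so the first argument is the accumulator and the return value is the updated accumulator. The sketch below wraps that in a hypothetical helper, mla_f32, with a scalar loop for the tail and for builds without NEON.

```c
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += x[i] * y[i] for n floats. */
void mla_f32(float *acc, const float *x, const float *y, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (; i + 4 <= n; i += 4) {
        float32x4_t a = vld1q_f32(acc + i);   /* accumulator lanes */
        float32x4_t b = vld1q_f32(x + i);
        float32x4_t c = vld1q_f32(y + i);
        a = vmlaq_f32(a, b, c);               /* a + b * c, lane by lane */
        vst1q_f32(acc + i, a);
    }
#endif
    /* Tail elements, or the whole array when NEON is unavailable. */
    for (; i < n; ++i)
        acc[i] += x[i] * y[i];
}
```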

Compiler errors while building a project which uses Eigen, the C++ template library for linear algebra...

Hi guys, in my project I'm making use of the Eigen C++ library for linear algebra, and ONLY when I turn on the vectorization flags (-mfpu=neon -mfloat-abi=softfp) for ARM NEON do I get compiler errors. I'm not able to understand what's going wrong. Do I need to enable any preprocessor directives for ARM NEON in the Eigen library? main.c #inc...

CodeSourcery giving compilation error: missing bits/c++config.h

Hi guys, in my project I'm making use of the Eigen C++ library for linear algebra. ONLY when I turn on the vectorization flags (-mfpu=neon -mfloat-abi=softfp) for ARM NEON do I get a compiler error - c++config.h: no such file or directory. I'm not able to understand what's going wrong. What is this bits/c++config.h? What should I do to fix thi...

Is 3x3 matrix inverse possible using SIMD instructions?

Hi guys, I'm using an ARM Cortex-A8 based processor and I have several places where I calculate 3x3 matrix inverses. As the Cortex-A8 has a NEON SIMD unit, I'm interested in using this co-processor for the 3x3 matrix inverse. I saw several 4x4 implementations (Intel SSE and freevec) but nowhere did I see a 3x...

How to merge elements of 2 rows using NEON SIMD?

I have a matrix A = [a1 a2 a3 a4; b1 b2 b3 b4; c1 c2 c3 c4; d1 d2 d3 d4]. I have 2 rows with me: float32x2_t a = a1 a2 and float32x2_t b = b1 b2. From these, how can I get float32x4_t result = b1 a1 b2 a2? Is there any single NEON SIMD instruction which can merge these two rows? Or how can I achieve this in as few steps as p...
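One way this could be done is a zip followed by a combine. The sketch below assumes that "b1 a1 b2 a2" lists lanes starting from lane 0; if the question meant the opposite order, the arguments would need swapping. interleave2 is a hypothetical helper name, and the scalar branch just spells out the same result.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out = { b[0], a[0], b[1], a[1] }. */
void interleave2(const float a[2], const float b[2], float out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x2_t va = vld1_f32(a);
    float32x2_t vb = vld1_f32(b);
    /* vzip_f32(vb, va): val[0] = {b0, a0}, val[1] = {b1, a1}. */
    float32x2x2_t z = vzip_f32(vb, va);
    /* Join the two 64-bit halves into one 128-bit vector. */
    vst1q_f32(out, vcombine_f32(z.val[0], z.val[1]));
#else
    out[0] = b[0];
    out[1] = a[0];
    out[2] = b[1];
    out[3] = a[1];
#endif
}
```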

ARM NEON: How to load 8bit uint8_t as uint32_t?

Hi, my image processing project works with grayscale images. I have an ARM Cortex-A8 processor platform and I want to make use of NEON. I have a grayscale image (consider the example below) and in my algorithm, I have to add only the columns. How can I load four 8-bit pixel values in parallel, which are uint8_t, as four uint32_t into...
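A possible approach is to load the four bytes as one 32-bit word and then widen twice with vmovl (u8 to u16, then u16 to u32). The sketch below assumes a little-endian target, which is the usual configuration on Cortex-A8 systems; load4_u8_as_u32 is a hypothetical helper name, and memcpy is used so that exactly four bytes are read.

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: widen four uint8_t pixels to four uint32_t values. */
void load4_u8_as_u32(const uint8_t *src, uint32_t out[4])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    uint32_t packed;
    memcpy(&packed, src, 4);                   /* read exactly 4 bytes */
    /* Assumes little-endian: src[0] ends up in byte lane 0. */
    uint8x8_t  b = vreinterpret_u8_u32(vdup_n_u32(packed));
    uint16x8_t h = vmovl_u8(b);                /* u8  -> u16 */
    uint32x4_t w = vmovl_u16(vget_low_u16(h)); /* u16 -> u32, low 4 lanes */
    vst1q_u32(out, w);
#else
    for (int i = 0; i < 4; ++i)
        out[i] = src[i];
#endif
}
```

If the only goal is summing columns, it may be cheaper to accumulate in 16-bit lanes with a widening add and widen to 32 bits less often, but that depends on the image height and value range.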

Failure to build with NDK on Cygwin, Windows 7 x64

I currently have a JNI application on the market and was looking to use the NEON instructions to further improve performance. I am able to compile and build the NEON sample and make some basic changes, but whenever I try to do something more complex the assembler crashes with the following output (I've stripped the file names): .0/ProjectT...

How to use NEON comparison (greater than or equal to) instruction?

How do I use the NEON comparison instructions in general? Here is a case where I want to use the greater-than-or-equal-to instruction. Currently I have: int x; ... ... ... if(x >= 0) { .... } In NEON, I would like to use x in the same way, except that x this time is a vector. int32x4_t x; ... ... ... if(vcgeq_s32(x, vdupq_n_s32(0))) // Wh...
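A vector comparison does not yield a single truth value: vcgeq_s32 returns a uint32x4_t mask with all bits set in each lane where the test holds and zero elsewhere. The usual pattern is therefore to replace the if with a per-lane select. A minimal sketch, where clamp_negative_to_zero is a hypothetical example of a branch body (out = x >= 0 ? x : 0):

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = x[i] >= 0 ? x[i] : 0. */
void clamp_negative_to_zero(const int32_t *x, int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int32x4_t zero = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4) {
        int32x4_t  v    = vld1q_s32(x + i);
        uint32x4_t mask = vcgeq_s32(v, zero);  /* all-ones where v >= 0 */
        /* Bitwise select: take v where the mask is set, zero elsewhere. */
        vst1q_s32(out + i, vbslq_s32(mask, v, zero));
    }
#endif
    /* Tail elements, or the whole array when NEON is unavailable. */
    for (; i < n; ++i)
        out[i] = x[i] >= 0 ? x[i] : 0;
}
```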

Unknown GCC error, while compiling for ARM NEON (Critical)

Hi guys, I have an ARM Cortex-A8 based target with NEON. I was optimizing my code by making use of NEON, but when I compile it I get this strange error and don't know how to fix it. I'm trying to compile the following code (PART 1) using Code Sourcery (PART 2) on my host, and I get this strange error (PART 3). Am I doing somethi...