simd

How much speed-up from converting 3D maths to SSE or other SIMD?

I am using 3D maths in my application extensively. How much speed-up can I achieve by converting my vector/matrix library to SSE, AltiVec or a similar SIMD code? ...

How to get GCC to use more than two SIMD registers when using intrinsics?

I am writing some code and trying to speed it up using SSE2/3 SIMD intrinsics. My code is of such a nature that I need to load some data into an XMM register and act on it many times. When I look at the generated assembler code, it seems that GCC keeps flushing the data back to memory, in order to reload something else in XMM0 an...

How to vectorize with gcc?

The v4 series of the gcc compiler can automatically vectorize loops using the SIMD units of some modern CPUs, such as the AMD Athlon or Intel Pentium/Core chips. How is this done? ...
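As a rough illustration of the kind of loop the auto-vectorizer can usually handle, here is a minimal sketch. The function name `scale_add` is hypothetical. With GCC, vectorization is enabled by `-O3` (or `-O2 -ftree-vectorize`), and newer releases can report what was vectorized with `-fopt-info-vec`; whether this particular loop is actually vectorized depends on the compiler version and target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
 * `restrict` promises the compiler that x and y do not overlap,
 * which removes one common obstacle to vectorization. */
void scale_add(float *restrict y, const float *restrict x, float a, size_t n)
{
    assert(x != NULL && y != NULL);
    /* Simple counted loop, unit stride, no calls, no early exit. */
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Loops with a simple trip count, unit-stride accesses and no cross-iteration dependencies are generally the easiest candidates.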

What compilers besides gcc can vectorize code?

GCC can vectorize loops automatically when certain options are specified and given the right conditions. Are there other compilers widely available that can do the same? ...

Practical use of automatic vectorization?

Has anyone taken advantage of the automatic vectorization that gcc can do? In the real world (as opposed to example code)? Does it take restructuring of existing code to take advantage? Are there a significant number of cases in any production code that can be vectorized this way? ...

SIMD on an Array of Doubles?

I'm doing some work where SIMD is required and I need to do operations on an array of doubles. Do any of the mainstream architectures support this? I've only seen floating point operations. Thanks in Advance, Stefan ...
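For what it is worth, SSE2 on x86 operates on pairs of doubles through the `__m128d` type. The sketch below scales an array in place; the helper name `scale_doubles` is hypothetical, and the code falls back to a plain scalar loop when the compiler does not define `__SSE2__`.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: multiply every element of data[0..n) by factor. */
void scale_doubles(double *data, size_t n, double factor)
{
    assert(data != NULL || n == 0);
    size_t i = 0;
#if defined(__SSE2__)
    const __m128d f = _mm_set1_pd(factor);      /* (factor, factor) */
    for (; i + 2 <= n; i += 2) {
        __m128d v = _mm_loadu_pd(data + i);     /* unaligned load of 2 doubles */
        _mm_storeu_pd(data + i, _mm_mul_pd(v, f));
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < n; ++i) {
        data[i] *= factor;
    }
}
```

Unaligned loads are used here so the sketch makes no assumption about how the array was allocated.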

Good portable SIMD library

Hi, can anyone recommend a portable SIMD library that provides a C/C++ API, works with both Intel and AMD extensions, and is compatible with Visual Studio and GCC? I'm looking to speed up things like scaling a 512x512 array of doubles, vector dot products, matrix multiplication, etc. So far the only one I found is: http://simdx86.sourceforge.net/ but as t...

float to integer conversion using iPhone's SIMD float unit

I am currently trying to optimize some DSP-related code with Shark and found that I am wasting a lot of time in a float-to-integer conversion: SInt16 nextInt = nextFloat * 32768.0f + 0.5f; As the iPhone seems to have an ARM11 FP co-processor, I am wondering if I can replace my code with the FTOSI instruction. There is some documentati...

Is it possible to execute MIMD with OpenCL framework?

Hello: Soon enough we will have the nVidia GTX 300, which will be able to execute multiple instructions on multiple data (MIMD). I wonder whether OpenCL can execute MIMD? ...

Fast Saturate and shift two Halfwords in ARM asm

I have two signed 16-bit values in a 32-bit word, and I need to shift them right (divide) by a constant value (it can be from 1 to 6) and saturate to a byte (0..0xFF). For example, 0x FFE1 00AA with shift=5 must become 0x 0000 0005; 0x 2345 1234 must become 0x 00FF 0091. I'm trying to saturate the values simultaneously, something like th...
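A portable scalar reference for the operation described can be handy for checking an optimized version. The sketch below is not the ARM assembly asked about; the function names are hypothetical, and it only encodes the stated rules (shift right by 1 to 6, clamp each halfword to 0..0xFF).

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of x to a 32-bit signed value. */
static int32_t sign_extend16(uint32_t x)
{
    int32_t v = (int32_t)(x & 0xFFFFu);
    return (v & 0x8000) ? v - 0x10000 : v;
}

/* Shift right, then clamp to 0..0xFF. Negative inputs clamp to 0
 * however the shift would round, so they are handled up front. */
static uint32_t shift_saturate_u8(int32_t v, int shift)
{
    if (v < 0) return 0;
    v >>= shift;
    return v > 0xFF ? 0xFFu : (uint32_t)v;
}

/* Apply the operation to both halfwords of a packed 32-bit word. */
uint32_t shift_saturate_pair(uint32_t packed, int shift)
{
    assert(shift >= 1 && shift <= 6);
    uint32_t hi = shift_saturate_u8(sign_extend16(packed >> 16), shift);
    uint32_t lo = shift_saturate_u8(sign_extend16(packed), shift);
    return (hi << 16) | lo;
}
```

Both examples from the question can be used as test vectors: `0xFFE100AA` and `0x23451234` with a shift of 5.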

Call a function lower in the script from a function higher in the script

Hello, I'm trying to come up with a way to make the computer do some work for me. I'm using SIMD (SSE2 & SSE3) to calculate the cross product, and I was wondering if it could go any faster. Currently I have the following: const int maskShuffleCross1 = _MM_SHUFFLE(3,0,2,1); // y z x const int maskShuffleCross2 = _MM_SHUFFLE(3,1,0,2); //...
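For reference, one common way to write a cross product with the two shuffle masks quoted above is sketched below. This is not the asker's code, which is cut off in the excerpt; the function name `cross3` and the four-float layout (x, y, z, w) are assumptions, and a scalar path is used when `__SSE__` is not defined.

```c
#include <assert.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics */
#endif

/* out = a x b; each vector holds (x, y, z, w) and w is ignored. */
void cross3(const float a[4], const float b[4], float out[4])
{
    assert(a && b && out);
#if defined(__SSE__)
    const __m128 va = _mm_loadu_ps(a);
    const __m128 vb = _mm_loadu_ps(b);
    /* _MM_SHUFFLE(3,0,2,1) selects (y, z, x, w);
     * _MM_SHUFFLE(3,1,0,2) selects (z, x, y, w). */
    const __m128 a_yzx = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 a_zxy = _mm_shuffle_ps(va, va, _MM_SHUFFLE(3, 1, 0, 2));
    const __m128 b_yzx = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 0, 2, 1));
    const __m128 b_zxy = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(3, 1, 0, 2));
    /* a.yzx * b.zxy - a.zxy * b.yzx; the w lane is aw*bw - aw*bw. */
    _mm_storeu_ps(out, _mm_sub_ps(_mm_mul_ps(a_yzx, b_zxy),
                                  _mm_mul_ps(a_zxy, b_yzx)));
#else
    /* Temporaries keep the result correct if out aliases a or b. */
    const float x = a[1] * b[2] - a[2] * b[1];
    const float y = a[2] * b[0] - a[0] * b[2];
    const float z = a[0] * b[1] - a[1] * b[0];
    out[0] = x; out[1] = y; out[2] = z; out[3] = 0.0f;
#endif
}
```

This form uses four shuffles; whether it is faster than alternatives depends on the CPU and on how the vectors are stored.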

Resources for (Manual and Automatic) Loop Vectorization

I see some resources for gcc, but not for Visual Studio. Anyone have a treasure trove of references, examples and tricks? ...

Getting started with SSE

Hello, I want to learn more about using SSE. What ways are there to learn, besides the obvious one of reading the Intel® 64 and IA-32 Architectures Software Developer's Manuals? Mainly I'm interested in working with the GCC X86 Built-in Functions. ...

SIMD programming languages

In the last couple of years, I've been doing a lot of SIMD programming and most of the time I've been relying on compiler intrinsic functions (such as the ones for SSE programming) or on programming assembly to get to the really nifty stuff. However, up until now I've hardly been able to find any programming language with built-in suppor...

SSE and hyper threading

Are SSE registers shared or duplicated between logical processors (hyper threading)? Can I expect the same kind of speedup from parallelization for an SSE-heavy program as for a normal program (Intel claims 30% for processors with hyper threading)? ...

Fast in-register sort of bytes?

Given a register of 4 bytes (or 16 for SIMD), there has to be an efficient way to sort the bytes in-register with a few instructions. Thanks in advance. ...

Mono.Simd Vector3 (floats) missing?

Heya, I'm trying to use Mono's SIMD to handle coordinates (X, Y, Z) in my project, but I only see support for Vector2 and Vector4 types. Has anyone run into this before, and are there any workarounds? Thanks in advance. ...

Loop versioning with GCC

I am working on auto-vectorization with GCC. I am not in a position to use intrinsics or attributes due to a customer requirement. (I cannot get user input to support vectorization.) If the alignment information of the array that can be vectorized is unknown, GCC invokes a pass for 'loop versioning'. Loop versioning will be performed when ...

How much more likely are hash collisions if I hash a bunch of hashes?

Say I'm using a hash to identify files, so I don't need it to be secure, I just need to minimize collisions. I was thinking that I could speed the hash up by running four hashes in parallel using SIMD and then hashing the final result. If the hash is designed to take a 512-bit block, I just step through the file taking 4x512 bit blocks a...

SSE2: How to reduce a _m128 to a word

Hello, what's the best way (SSE2) to reduce a _m128 (4 words a b c d) to one word? I want the low part of each of the _m128 components: int result = ( _m128.a & 0x000000ff ) << 24 | ( _m128.b & 0x000000ff ) << 16 | ( _m128.c & 0x000000ff ) << 8 | ( _m128.d & 0x000000ff ) << 0 Is there an intrinsic for that? than...
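One possible approach, sketched under assumptions: mask each 32-bit lane to its low byte, then narrow twice with the SSE2 pack instructions. The function names are hypothetical, and the sketch assumes `a` sits in the highest lane and `d` in the lowest (the argument order of `_mm_set_epi32`). A scalar version of the formula from the question serves as the fallback and as a debug cross-check.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Scalar form of the expression in the question. */
uint32_t pack_low_bytes_ref(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
    return ((a & 0xFFu) << 24) | ((b & 0xFFu) << 16) |
           ((c & 0xFFu) << 8)  |  (d & 0xFFu);
}

uint32_t pack_low_bytes(uint32_t a, uint32_t b, uint32_t c, uint32_t d)
{
#if defined(__SSE2__)
    /* Assumed layout: a is the highest 32-bit lane, d the lowest. */
    const __m128i v = _mm_set_epi32((int)a, (int)b, (int)c, (int)d);
    const __m128i m = _mm_and_si128(v, _mm_set1_epi32(0xFF));
    const __m128i w = _mm_packs_epi32(m, m);  /* 32-bit -> 16-bit lanes */
    const __m128i p = _mm_packus_epi16(w, w); /* 16-bit -> 8-bit lanes */
    const uint32_t r = (uint32_t)_mm_cvtsi128_si32(p); /* low 4 bytes */
    /* Debug-only cross-check against the scalar formula. */
    assert(r == pack_low_bytes_ref(a, b, c, d));
    return r;
#else
    return pack_low_bytes_ref(a, b, c, d);
#endif
}
```

Because every lane is at most 0xFF after masking, neither pack step actually saturates.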