ansaurus

Question

How much effort do you have to put in to get gains from using SSE?

Answer 1

+1 A:

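It is valuable if your use case is doing the same calculation over a large range of data. For example, say you calculate the square roots of many, many values. You can load 4 single-precision values into one SSE register and perform the operation once for all of them. In the best case this gives up to a 4x speedup for that operation; in practice loads, stores and leftover elements eat into it.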
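A minimal sketch of the square-root case, assuming the inputs are a contiguous array of `float` (the function name `sqrt_array_sse` is hypothetical). Four values go through `_mm_sqrt_ps` per iteration, and a scalar loop using `_mm_sqrt_ss` handles the 0 to 3 leftover elements. Unaligned loads are used so the caller does not have to align the arrays.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_sqrt_ps, ... */

/* Hypothetical helper: out[i] = sqrt(in[i]) for n floats. */
void sqrt_array_sse(const float *in, float *out, size_t n)
{
    size_t i = 0;
    /* Main loop: 4 square roots per SSE instruction. */
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(in + i);   /* load 4 floats (any alignment) */
        v = _mm_sqrt_ps(v);                /* 4 square roots at once */
        _mm_storeu_ps(out + i, v);         /* store 4 results */
    }
    /* Scalar tail: one element at a time in the low lane. */
    for (; i < n; ++i)
        _mm_store_ss(out + i, _mm_sqrt_ss(_mm_load_ss(in + i)));
}
```

With 7 inputs, the first 4 are handled by the packed loop and the last 3 by the tail, so the speedup only applies to the bulk of the data.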

There are also libraries that already have SSE optimizations built in. Don't reinvent the wheel.

Andrey 2010-04-12 16:25:33

Answer 2

+3 A:

In general you will need to take additional steps to get the best out of SSE (or any other SIMD architecture):

data needs to be 16-byte aligned (ideally)
data needs to be contiguous
you need enough data to make the SIMD operation worthwhile
you need to coalesce as many operations as you can to mitigate the costs of loads/stores
you need to be aware of the cache/memory hierarchy and its performance impact (e.g. use strip-mining/tiling)
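A minimal sketch of the first few points above, under stated assumptions: the data is a contiguous `float` array allocated on a 16-byte boundary with `_mm_malloc`, and the hypothetical operation is `y[i] = a * x[i] + b`. The multiply and the add are coalesced into one pass, so each group of four values is loaded once and stored once, rather than making one pass for the multiply and a second pass for the add. Cache tiling is not shown.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; also declares _mm_malloc/_mm_free */

/* Hypothetical helper: allocate n floats on a 16-byte boundary. */
float *alloc_floats_aligned(size_t n)
{
    return (float *)_mm_malloc(n * sizeof(float), 16);
}

/* Hypothetical helper: y[i] = a * x[i] + b for n floats.
   Assumes x and y are 16-byte aligned, contiguous and non-overlapping. */
void scale_add_sse(const float *x, float *y, size_t n, float a, float b)
{
    const __m128 va = _mm_set1_ps(a);   /* broadcast a to all 4 lanes */
    const __m128 vb = _mm_set1_ps(b);   /* broadcast b to all 4 lanes */
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_load_ps(x + i);          /* aligned load */
        v = _mm_add_ps(_mm_mul_ps(v, va), vb);  /* a*x + b while still in a register */
        _mm_store_ps(y + i, v);                 /* aligned store */
    }
    for (; i < n; ++i)                           /* scalar tail */
        y[i] = a * x[i] + b;
}
```

Memory from `_mm_malloc` must be released with `_mm_free`, not `free`.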

Paul R 2010-04-12 16:39:44

If we align data structures, do we no longer need to load values into registers? Or do we still need to, and alignment just speeds that part up?

John 2010-04-12 17:06:36

Your data needs to be 16-byte aligned in order to get the most efficient loads/stores between memory and SSE registers - SSE does support misaligned loads/stores but there is a significant performance penalty for using these on anything other than Core i7.
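One common way to get aligned loads from data whose alignment you don't control is to "peel" a few scalar iterations until the pointer reaches a 16-byte boundary, then switch to the aligned `_mm_load_ps` (`_mm_loadu_ps` works at any address but may be slower on the older CPUs mentioned above). A minimal sketch, assuming the task is summing a `float` array; the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Hypothetical helper: sum n floats starting at p (p need not be 16-byte aligned). */
float sum_floats_sse(const float *p, size_t n)
{
    float total = 0.0f;
    size_t i = 0;
    /* Peel: scalar adds until p + i is on a 16-byte boundary (at most 3 steps). */
    while (i < n && ((uintptr_t)(p + i) & 15) != 0)
        total += p[i++];
    /* Main loop: aligned loads are now safe. */
    __m128 acc = _mm_setzero_ps();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    /* Horizontal sum of the 4 accumulator lanes. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    total += lanes[0] + lanes[1] + lanes[2] + lanes[3];
    /* Scalar tail for the last 0-3 elements. */
    for (; i < n; ++i)
        total += p[i];
    return total;
}
```

Note that reordering the additions this way can change floating-point rounding compared with a plain left-to-right loop.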

Paul R 2010-04-13 06:23:26

Answer 3

A:

I tried Case One at work a couple of years ago and the performance gain was barely measurable. In the end I decided to skip it, since all the hassle of aligning every Point3D on a 16-byte boundary made it not worth it.
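To illustrate where that hassle comes from: the original Point3D type isn't shown, so the layout below is a hypothetical assumption. Padding x, y, z to four floats and aligning to 16 bytes lets one point fill exactly one SSE register, at the cost of 4 unused bytes per point. Every place that stores a Point3D must then honor that alignment.

```c
#include <xmmintrin.h>

/* Hypothetical layout: x, y, z plus one unused padding float,
   aligned so one point maps onto one 16-byte SSE register. */
typedef struct {
    _Alignas(16) float v[4];   /* x, y, z, padding */
} Point3D;

/* Add two points with a single packed SSE add.
   _mm_load_ps faults on a misaligned address, so heap storage for
   Point3D must also be 16-byte aligned (e.g. via _mm_malloc); plain
   malloc only guarantees max_align_t alignment, which may be less
   than 16 on some platforms. */
Point3D point_add(const Point3D *a, const Point3D *b)
{
    Point3D r;
    _mm_store_ps(r.v, _mm_add_ps(_mm_load_ps(a->v), _mm_load_ps(b->v)));
    return r;
}
```

A single 3D add is so little work that the load/store overhead dominates, which is consistent with the barely measurable gain described above.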

As you've correctly guessed, SSE is best suited to bulk operations, where it can give a good speedup. Before you go ahead and use the SSE intrinsics, check what code the compiler is already generating. I know from experience that Visual Studio, for instance, is quite good at generating SSE code on its own.
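A minimal sketch of the kind of plain loop worth checking first; the function name is hypothetical. It contains no intrinsics, and with optimization enabled (e.g. GCC/Clang `-O3`, MSVC `/O2`) compilers can often vectorize it by themselves. `restrict` (C99) promises the arrays don't overlap, which removes one common obstacle to vectorization. To see what was generated, look at the assembly (`gcc -O3 -S`, MSVC `/FA`) for packed instructions such as `mulps`/`addps`, or ask for a report (GCC `-fopt-info-vec`, Clang `-Rpass=loop-vectorize`).

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + b, written as a plain scalar loop.
   Whether it is vectorized depends on the compiler and flags used. */
void scale_add_scalar(const float *restrict x, float *restrict y,
                      size_t n, float a, float b)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + b;
}
```

If the compiler already emits packed SSE for loops like this, hand-written intrinsics may add little beyond maintenance cost.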

Andreas Brinck 2010-04-12 16:53:32

If you want help from the compiler then Intel's ICC will do a lot more auto-vectorization than Visual Studio.

Paul R 2010-04-13 06:24:09

Answer 4

A:

This Gamasutra article shows what it takes to make fast SSE-based code. It covers your "Case 1" in detail.

The source code is available from the author's homepage.

nsanders 2010-07-29 09:15:35

Case One

Case Two

In conclusion