ansaurus

Question

What is the maximum theoretical speed-up due to SSE for a simple binary subtraction?

Answer 1

+3 A:

It depends on the CPU, but the theoretical maximum won't exceed 4x for single-precision floats. I don't know of a CPU which can execute more than one SSE instruction per clock cycle, which means that it can compute at most 4 values per cycle.

Most CPUs can do at least one scalar floating-point instruction per cycle, so in this case you'd see a theoretical maximum speedup of 4x.

But you'll have to look up the specific instruction throughput for the CPU you're running on.

A practical speedup of 3x is pretty good though.
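To make the 4x figure concrete, here is a minimal sketch of the scalar loop next to an SSE version that subtracts four floats per instruction. The function names `sub_scalar` and `sub_sse` are made up for illustration, and the sketch assumes the element count is a multiple of 4 and that the compiler defines `__SSE__` when SSE is available (GCC and Clang do on x86-64); otherwise it falls back to the scalar loop.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h> /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_sub_ps, _mm_storeu_ps */
#endif

/* Scalar baseline: one subtraction per loop iteration. */
void sub_scalar(const float *a, const float *b, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        out[i] = a[i] - b[i];
}

/* SSE version: one _mm_sub_ps handles four floats.
   Assumption for brevity: n is a multiple of 4. */
void sub_sse(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 4 == 0);
#if defined(__SSE__)
    for (size_t i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i); /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_sub_ps(va, vb));
    }
#else
    sub_scalar(a, b, out, n); /* no SSE available: plain fallback */
#endif
}
```

The SSE loop runs a quarter as many iterations, which is where the 4x ceiling comes from. Loads and stores still cost time, so measured speedups are usually lower.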

jalf 2009-09-23 15:58:49

Answer 2

+2 A:

I think you'd probably have to interleave the inner loop somehow. A 3-component vector is processed in one go, but that's only 3 operations at once. To get to 4, you'd take 3 components from the first vector and 1 from the next, then 2 and 2, and so on. If you set up some kind of queue that loads and processes the data 4 components at a time and then separates the results afterwards, that might work.

Edit: You could unroll the inner loop to do 4 vectors per iteration (assuming the array size is always a multiple of 4). That would accomplish what I said above.
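A rough sketch of that unrolling, under some assumptions of mine: the 3-component vectors are stored back to back as x, y, z floats with no padding, the vector count is a multiple of 4, and `sub_vec3_unrolled` is just an illustrative name. Four vectors are 12 floats, which is exactly three 4-wide SSE subtractions.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Subtract nvec packed 3-component vectors, 4 vectors (12 floats) per iteration.
   Assumptions: layout is xyzxyz... with no padding, nvec is a multiple of 4. */
void sub_vec3_unrolled(const float *a, const float *b, float *out, size_t nvec)
{
    assert(nvec % 4 == 0);
    for (size_t v = 0; v < nvec; v += 4) {
        size_t i = 3 * v; /* index of the first float in this group of 4 vectors */
#if defined(__SSE__)
        /* Three full-width operations, no lane left idle. */
        _mm_storeu_ps(out + i,
                      _mm_sub_ps(_mm_loadu_ps(a + i), _mm_loadu_ps(b + i)));
        _mm_storeu_ps(out + i + 4,
                      _mm_sub_ps(_mm_loadu_ps(a + i + 4), _mm_loadu_ps(b + i + 4)));
        _mm_storeu_ps(out + i + 8,
                      _mm_sub_ps(_mm_loadu_ps(a + i + 8), _mm_loadu_ps(b + i + 8)));
#else
        for (size_t k = 0; k < 12; ++k) /* scalar fallback without SSE */
            out[i + k] = a[i + k] - b[i + k];
#endif
    }
}
```

Because subtraction is element-wise, the register boundaries do not need to line up with the vector boundaries, so no shuffling is needed to separate the results afterwards.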

Jon Seigel 2009-09-23 15:59:34

Answer 3

A:

Consider: How wide is a float? How wide is the SSEx instruction? The ratio should give you some kind of reasonable upper bound.
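Plugging in numbers, assuming a 32-bit single-precision float and the 128-bit XMM registers that SSE instructions operate on:

```latex
\[
\text{speed-up} \;\le\; \frac{\text{register width}}{\text{element width}}
  = \frac{128\ \text{bits}}{32\ \text{bits}} = 4
\]
```

With 64-bit doubles (SSE2) the same ratio gives 2.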

It's also worth noting that out-of-order pipelines play havoc with getting good estimates of speedup.

Paul Nathan 2009-09-23 16:00:07

Answer 4

A:

You should consider loop tiling: the way you are accessing values in the inner loop is probably causing a lot of thrashing in the L1 data cache. It's not too bad, because everything probably still fits in the L2 at 384 KB, but there is easily an order-of-magnitude difference between an L1 cache hit and an L2 cache hit, so this could make a big difference for you.
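The loop from the question is not reproduced here, so the following is only a sketch of the tiling idea on a stand-in loop of my own: for every `a[i]`, accumulate the differences `a[i] - b[j]` over all `j`. The function names and the `tile` parameter are hypothetical, and a good tile size has to be found by measuring on the target CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Untiled: each a[i] streams through all of b, so for a large b the
   early elements are evicted before the next i comes back to them. */
void diff_sum(const float *a, size_t na, const float *b, size_t nb, float *out)
{
    for (size_t i = 0; i < na; ++i) {
        float acc = 0.0f;
        for (size_t j = 0; j < nb; ++j)
            acc += a[i] - b[j];
        out[i] = acc;
    }
}

/* Tiled: walk b in blocks of `tile` floats and reuse each block for every
   a[i] while it is still cache-resident. `tile` is a tuning assumption. */
void diff_sum_tiled(const float *a, size_t na, const float *b, size_t nb,
                    float *out, size_t tile)
{
    assert(tile > 0);
    for (size_t i = 0; i < na; ++i)
        out[i] = 0.0f;
    for (size_t j0 = 0; j0 < nb; j0 += tile) {
        size_t j1 = (nb - j0 < tile) ? nb : j0 + tile; /* end of this block */
        for (size_t i = 0; i < na; ++i) {
            float acc = out[i]; /* continue the partial sum from earlier blocks */
            for (size_t j = j0; j < j1; ++j)
                acc += a[i] - b[j];
            out[i] = acc;
        }
    }
}
```

Both versions do the same arithmetic in the same order per `i`; only the order in which memory is touched changes.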

Jack Lloyd 2009-10-10 20:05:23
