I'm working on an old-school "image warp" filter. Essentially, I have a 2D array of pixels (ignoring for the present the question of whether they are color, grayscale, float, RGBA, etc.) and another 2D array of vectors (with floating-point components), with the image being at least as large as the vector array. In pseudo code, I want to do this:
FOR EACH PIXEL (x,y)
vec = vectors[x,y] // Get vector
val = get(img, x + vec.x, y + vec.y) // Get input at <x,y> + vec
output[x,y] = val // Write to output
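To make the memory layout concrete, here's the driver loop in C for a grayscale float image stored row-major (all the names are placeholders of mine; bilinear_get() is the sampler sketched further down):

typedef struct { float x, y; } Vec2;

/* The subpixel sampler, defined further down. */
float bilinear_get(const float *img, int w, int h, float x, float y);

/* img and out are w*h grayscale floats, row-major; vecs is a w*h array
   of per-pixel displacement vectors. */
void warp(const float *img, const Vec2 *vecs, float *out, int w, int h)
{
    for (int y = 0; y < h; ++y) {
        for (int x = 0; x < w; ++x) {
            Vec2 v = vecs[y * w + x];
            out[y * w + x] = bilinear_get(img, w, h, x + v.x, y + v.y);
        }
    }
}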
The catch is that get() needs to bilinearly sample the input image, because the vectors can refer to subpixel coordinates. But unlike bilinear sampling in, say, texture mapping, where we can work the interpolation math into a loop so it's all just adds, here the reads are from random locations. So get()'s definition looks something like this:
FUNCTION get(in,x,y)
ix = floor(x); iy = floor(y) // Integer upper-left coordinates
xf = x - ix; yf = y - iy // Fractional parts
a = in[ix,iy];   b = in[ix+1,iy]   // Four bordering pixel values
c = in[ix,iy+1]; d = in[ix+1,iy+1]
ab = lerp(a,b,xf) // Interpolate
cd = lerp(c,d,xf)
RETURN lerp(ab,cd,yf)
and lerp() is simply
FUNCTION lerp(a,b,x)
RETURN (1-x)*a + x*b
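In C that sampler might look like this (again grayscale floats, row-major; the clamp-to-edge border handling is my own assumption, since the pseudocode above doesn't specify what happens at the edges):

#include <math.h>

static inline float lerp(float a, float b, float t)
{
    return (1.0f - t) * a + t * b;
}

static inline int clampi(int v, int lo, int hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Bilinear sample at subpixel (x,y), clamping to the edge at the borders. */
float bilinear_get(const float *img, int w, int h, float x, float y)
{
    int ix = (int)floorf(x), iy = (int)floorf(y);
    float xf = x - ix, yf = y - iy;

    int x0 = clampi(ix,     0, w - 1), x1 = clampi(ix + 1, 0, w - 1);
    int y0 = clampi(iy,     0, h - 1), y1 = clampi(iy + 1, 0, h - 1);

    float a = img[y0 * w + x0], b = img[y0 * w + x1];   /* top row    */
    float c = img[y1 * w + x0], d = img[y1 * w + x1];   /* bottom row */

    return lerp(lerp(a, b, xf), lerp(c, d, xf), yf);
}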
Assuming that neither the input image nor the vector array is known in advance, what kind of high-level optimizations are possible? (Note: "Use the GPU" is cheating.) One thing I can think of is rearranging the interpolation math in get() so that we can cache the pixel reads and intermediate calculations for a given (ix,iy). That way, if successive accesses fall in the same 2x2 pixel quad, we can avoid some of the work.
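Something like this is what I have in mind (untested sketch; it relies on the algebraically equivalent form a + xf*(b-a) + yf*(c-a) + xf*yf*(a-b-c+d), and border handling is left out):

#include <math.h>

/* The four reads and the three differences depend only on (ix,iy), so
   they can be reused while consecutive lookups stay in the same quad.
   Assumes 0 <= x < w-1 and 0 <= y < h-1; initialize ix,iy to -1. */
typedef struct {
    int   ix, iy;            /* quad currently cached                */
    float a, dx, dy, dxy;    /* a, b-a, c-a, a-b-c+d for that quad   */
} QuadCache;

float bilinear_get_cached(const float *img, int w, float x, float y,
                          QuadCache *qc)
{
    int ix = (int)floorf(x), iy = (int)floorf(y);
    float xf = x - ix, yf = y - iy;

    if (ix != qc->ix || iy != qc->iy) {          /* miss: refill cache */
        const float *p = img + iy * w + ix;
        float a = p[0], b = p[1], c = p[w], d = p[w + 1];
        qc->ix = ix;      qc->iy  = iy;
        qc->a  = a;       qc->dx  = b - a;
        qc->dy = c - a;   qc->dxy = a - b - c + d;
    }
    return qc->a + xf * qc->dx + yf * qc->dy + xf * yf * qc->dxy;
}

Whether this wins depends on how often neighboring lookups land in the same quad, which for a smooth warp field should be reasonably often.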
If the vector array is known in advance, then we can rearrange it so that the coordinates passed to get() tend to be more local. This might help with cache locality as well, but at the expense of having the writes to output be all over the place. But then we can't do fancy things like scale the vectors on the fly, or even move the warp effect from its original precalculated location.
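One crude way to realize that reordering, assuming I precompute one work item per output pixel and sort by a coarse tile of the source coordinate (TILE and the struct layout are arbitrary choices of mine; a space-filling-curve order would be another option; source coordinates assumed non-negative):

#include <stdlib.h>

/* One work item per output pixel: where to write, where to sample.
   Sorting by source tile makes the reads coherent but scatters writes. */
typedef struct { int out_idx; float sx, sy; } WarpItem;

#define TILE 32

static int by_source_tile(const void *pa, const void *pb)
{
    const WarpItem *a = pa, *b = pb;
    int tya = (int)a->sy / TILE, tyb = (int)b->sy / TILE;
    if (tya != tyb) return tya - tyb;
    return (int)a->sx / TILE - (int)b->sx / TILE;
}

float bilinear_get(const float *img, int w, int h, float x, float y);

void warp_reordered(const float *img, float *out, int w, int h,
                    WarpItem *items, size_t n)
{
    /* The qsort really belongs in a one-time prep pass (the vector
       array is fixed); it's inlined here only to keep the sketch short. */
    qsort(items, n, sizeof *items, by_source_tile);
    for (size_t i = 0; i < n; ++i)
        out[items[i].out_idx] =
            bilinear_get(img, w, h, items[i].sx, items[i].sy);
}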
The only other possibility would be to use fixed-point vector components, perhaps with very limited fractional parts. E.g., if the vectors only have 2-bit fractional components, then there are only 16 subpixel regions that could be accessed. We could precompute the weights for these, and avoid much of the interpolation math altogether, but with a hit to quality.
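A sketch of that weight-table idea with 2 fractional bits, so a 4x4 table of precomputed weight quadruples (border handling omitted again, and the names are mine):

#define FRAC_BITS  2
#define FRAC_STEPS (1 << FRAC_BITS)     /* 4 fractional positions per axis */

/* 4*4 = 16 precomputed weight quadruples, one per (xf,yf) subpixel cell. */
static float wtab[FRAC_STEPS][FRAC_STEPS][4];

void init_weight_table(void)
{
    for (int fy = 0; fy < FRAC_STEPS; ++fy)
        for (int fx = 0; fx < FRAC_STEPS; ++fx) {
            float xf = fx / (float)FRAC_STEPS, yf = fy / (float)FRAC_STEPS;
            wtab[fy][fx][0] = (1 - xf) * (1 - yf);   /* weight of a */
            wtab[fy][fx][1] = xf       * (1 - yf);   /* weight of b */
            wtab[fy][fx][2] = (1 - xf) * yf;         /* weight of c */
            wtab[fy][fx][3] = xf       * yf;         /* weight of d */
        }
}

/* x_fp, y_fp are fixed-point coordinates with FRAC_BITS fractional bits
   (i.e. pixel coordinate * 4). */
float get_fixed(const float *img, int w, int x_fp, int y_fp)
{
    int ix = x_fp >> FRAC_BITS,        iy = y_fp >> FRAC_BITS;
    int fx = x_fp & (FRAC_STEPS - 1),  fy = y_fp & (FRAC_STEPS - 1);
    const float *wgt = wtab[fy][fx];
    const float *p   = img + iy * w + ix;
    return wgt[0] * p[0] + wgt[1] * p[1] + wgt[2] * p[w] + wgt[3] * p[w + 1];
}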
Any other ideas? I want to accumulate a few different methods before I implement them and see which is the best. If someone could point me to source code of a fast implementation, that would be great.