I am processing a series of points which all have the same Y value, but different X values. I go through the points by incrementing X by one. For example, I might have Y = 50 and X is the integers from -30 to 30. Part of my algorithm involves finding the distance to the origin from each point and then doing further processing.

After profiling, I've found that the sqrt call in the distance calculation is taking a significant amount of my time. Is there an iterative way to calculate the distance?

In other words:

I want to efficiently calculate: r[n] = sqrt(x[n]*x[n] + y*y). I can save information from the previous iteration. Each iteration changes by incrementing x, so x[n] = x[n-1] + 1. I cannot use sqrt or trig functions because they are too slow, except at the beginning of each scanline.

I can use approximations as long as they are good enough (less than 0.1% error) and the errors introduced are smooth (I can't bin to a pre-calculated table of approximations).

Additional information: x and y are always integers between -150 and 150

I'm going to try a couple ideas out tomorrow and mark the best answer based on which is fastest.

Results

I did some timings

  • Distance formula: 16 ms / iteration
  • Pete's interpolating solution: 8 ms / iteration
  • wrang-wrang's pre-calculation solution: 8 ms / iteration

I was hoping the test would decide between the two, because I like both answers. I'm going to go with Pete's because it uses less memory.

+2  A: 
Y=10000
Y2=Y*Y
for x=0..Y do            // after scaling, indices never exceed Y
  D[x]=sqrt(Y2+x*x)      // distances to points on the line y=Y

norm(x,y)=
  if (y==0) x
  else if (x>y) norm(y,x)
  else {
     s=Y/y               // s*(x,y) lands on the precomputed line y=Y
     D[round(x*s)]/s
  }

If your coordinates are smooth, then the idea can be extended with linear interpolation. For more precision, increase Y.

The idea is that s*(x,y) is on the line y=Y, which you've precomputed distances for. Get the distance, then divide it by s.

I assume you really do need the distance and not its square.

You may also be able to find a general sqrt implementation that sacrifices some accuracy for speed, but I have a hard time imagining that beating what the FPU can do.

By linear interpolation, I mean to change D[round(x)] to:

f=floor(x)
a=x-f
D[f]*(1-a)+D[f+1]*a
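A minimal runnable sketch of the idea above, with the linear interpolation folded in (Java; the class and method names, and the test range, are mine):

```java
public class ScaledSqrtTable {
    static final int Y = 10000;                   // table resolution; raise for more precision
    static final double[] D = new double[Y + 2];  // +2 so D[f + 1] is safe at the top end

    static {
        // Distances from the origin to integer points on the line y = Y.
        for (int x = 0; x < D.length; x++)
            D[x] = Math.sqrt((double) Y * Y + (double) x * x);
    }

    // norm(x, y) ~= sqrt(x*x + y*y), assuming x, y >= 0.
    static double norm(double x, double y) {
        if (y == 0) return x;
        if (x > y) return norm(y, x);   // ensure x <= y so the scaled x fits the table
        double s = Y / y;               // s*(x, y) lies on the line y = Y
        double xs = x * s;
        int f = (int) Math.floor(xs);   // linear interpolation between table entries
        double a = xs - f;
        return (D[f] * (1 - a) + D[f + 1] * a) / s;
    }

    public static void main(String[] args) {
        double worst = 0;
        for (int x = -30; x <= 30; x++) {
            double exact = Math.sqrt(x * x + 2500.0);
            double err = Math.abs(norm(Math.abs(x), 50) - exact) / exact;
            worst = Math.max(worst, err);
        }
        System.out.println("worst relative error: " + worst);  // well under 0.1%
    }
}
```

With Y = 10000 the table is about 80 KB of doubles; shrinking Y trades memory for precision.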
wrang-wrang
+1  A: 

This doesn't really answer your question, but may help...

The first questions I would ask would be:

  • "do I need the sqrt at all?".
  • "If not, how can I reduce the number of sqrts?"
  • then yours: "Can I replace the remaining sqrts with a clever calculation?"

So I'd start with:

  • Do you need the exact radius, or would radius-squared be acceptable? There are fast approximations to sqrt, but probably not accurate enough for your spec.
  • Can you process the image using mirrored quadrants or eighths? By processing all pixels at the same radius value in a batch, you can reduce the number of calculations by 8x.
  • Can you precalculate the radius values? You only need a table that is a quarter (or possibly an eighth) of the size of the image you are processing, and the table would only need to be precalculated once and then re-used for many runs of the algorithm.

So clever maths may not be the fastest solution.
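A sketch of the precalculation idea for this question's stated range (x and y are integers in [-150, 150]); the names and table layout are my own:

```java
public class QuarterTable {
    static final int N = 151;                     // covers |x|, |y| in 0..150
    static final double[] r = new double[N * N];  // ~180 KB of doubles, built once

    static {
        for (int y = 0; y < N; y++)
            for (int x = 0; x < N; x++)
                r[y * N + x] = Math.sqrt(x * x + y * y);
    }

    // Exact distance by symmetry: only one quadrant is stored.
    static double norm(int x, int y) {
        return r[Math.abs(y) * N + Math.abs(x)];
    }

    public static void main(String[] args) {
        System.out.println(norm(-30, 50));  // sqrt(3400) ~= 58.3095
    }
}
```

Unlike the interpolating approaches, the lookups here are exact, at the cost of the table's memory footprint.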

Jason Williams
A: 

Since sqrt(x) equals x^(1/2), would this be better?

r[n] = (x[n]*x[n] + y*y)^(1/2)
Phoexo
That's actually more work, unless the compiler is smart enough to turn that back into a sqrt(). Square roots are easier to calculate than the general case of the exponent.
jprete
A: 

Well, you can mirror around x=0 to start with (you need only compute n>=0, then duplicate those results for the corresponding n<0). After that, I'd take a look at using the derivative of sqrt(a^2+b^2) (or the corresponding sin) to take advantage of the constant dx.

If that's not accurate enough, may I point out that this is a pretty good job for SIMD, which will provide you with a reciprocal square root op on both SSE and VMX (and shader model 2).

Crashworks
+1  A: 

Well, you can always try optimizing your sqrt; the fastest one I've seen is the old Carmack Quake 3 inverse square root:

http://betterexplained.com/articles/understanding-quakes-fast-inverse-square-root/

That said, since sqrt is non-linear, you're not going to be able to do simple linear interpolation along your line to get your result. The best idea is to use a table lookup since that will give you blazing fast access to the data. And, since you appear to be iterating by whole integers, a table lookup should be exceedingly accurate.
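For reference, the bit trick from the linked article ports to Java via Float.floatToRawIntBits (a sketch; the magic constant and the single Newton step are as described in the article, and note that one step gives roughly 0.17% worst-case error, slightly outside the 0.1% spec):

```java
public class FastInvSqrt {
    // Fast approximate 1/sqrt(x) with one Newton-Raphson refinement step.
    static float invSqrt(float x) {
        int i = Float.floatToRawIntBits(x);
        i = 0x5f3759df - (i >> 1);             // magic constant from the article
        float y = Float.intBitsToFloat(i);
        return y * (1.5f - 0.5f * x * y * y);  // Newton-Raphson refinement
    }

    public static void main(String[] args) {
        // sqrt(x) = x * (1/sqrt(x)); sqrt(3400) ~= 58.3095
        System.out.println(3400f * invSqrt(3400f));
    }
}
```

A second Newton step brings the error down to a few parts per million, at the cost of another multiply chain.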

Ron Warholic
Carmack's fastsqrt doesn't beat the hardware any more, I've found from recent timings. It was really useful in the days of the crappy x87 FPU, but SSE introduced a scalar recip square root estimate opcode with 1/1 throughput ( RCPSS ).
Crashworks
Which isn't guaranteed to be used by the compiler. At any rate, it's worth a shot and if it's no faster it might be worth going to SSE. Then again, it might not.
Ron Warholic
The other problem with Carmack's technique is that modern pipelines are very deep and there's no direct wires between the float and int registers, so casting from int to float induces a big load-hit-store stall. You can get the compiler to use the SSE op by specifying /arch:sse2 to MSVC, or using an intrinsic.
Crashworks
It may be faster to use the x87 FPU, but the OP didn't state the architecture he was working on. I've used this method on 8-bit PICs to get MUCH faster sqrt calculations, and used this paper: http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf (also linked in the above article) to find the best initial guess for my application.
Mark
For the record, I'm using dual-core and quad-core Intel CPUs.
+4  A: 

Just to get a feel for it, for your range y = 50, x = 0 gives r = 50 and y = 50, x = +/- 30 gives r ~= 58.3. You want an approximation good to +/- 0.1%, or +/- 0.05 absolute. That's far less accuracy than most library sqrt implementations provide.

Two approximate approaches - you calculate r based on interpolating from the previous value, or use a few terms of a suitable series.

Interpolating from previous r

r = ( x^2 + y^2 )^(1/2)

dr/dx = 1/2 · 2x · ( x^2 + y^2 )^(-1/2) = x/r

    double r = 50;

    for ( int x = 0; x <= 30; ++x ) {

        double r_true = Math.sqrt ( 50*50 + x*x );

        System.out.printf ( "x: %d r_true: %f r_approx: %f error: %f%%\n", x, r_true, r, 100 * Math.abs ( r_true - r ) / r );

        // midpoint step integrating dr/dx = x/r over [x, x+1]
        r = r + ( x + 0.5 ) / r;
    }

Gives:

x: 0 r_true: 50.000000 r_approx: 50.000000 error: 0.000000%
x: 1 r_true: 50.009999 r_approx: 50.010000 error: 0.000002%
....
x: 29 r_true: 57.801384 r_approx: 57.825065 error: 0.040953%
x: 30 r_true: 58.309519 r_approx: 58.335225 error: 0.044065%

which seems to meet the requirement of 0.1% error, so I didn't bother coding the next one, as it would require quite a few more calculation steps.

Truncated Series

The Taylor series for sqrt ( 1 + x ) for x near zero is

sqrt ( 1 + x ) = 1 + (1/2) x - (1/8) x^2 + ... + ( -1/2 )^(n+1) x^n

Using r = y sqrt ( 1 + (x/y)^2 ), you're looking for a term t = ( -1/2 )^(n+1) 0.36^n with magnitude less than 0.001: log ( 0.002 ) > n log ( 0.18 ), or n > 3.6, so taking terms to x^4 should be OK.
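A sketch of the truncated-series approach in Java. I take terms through t^3, where t = (x/y)^2, which is my own choice of cutoff; the coefficients are the standard binomial series, and the error check uses the question's y = 50, x in [-30, 30] range:

```java
public class SeriesNorm {
    // r = y * sqrt(1 + t) with t = (x/y)^2, series truncated after the t^3 term.
    // Assumes |x| <= |y| and y != 0 (swap the arguments otherwise).
    static double norm(double x, double y) {
        double t = (x * x) / (y * y);
        double s = 1 + t / 2 - t * t / 8 + t * t * t / 16;
        return Math.abs(y) * s;
    }

    public static void main(String[] args) {
        double worst = 0;
        for (int x = -30; x <= 30; x++) {
            double exact = Math.sqrt(x * x + 2500.0);
            double err = Math.abs(norm(x, 50) - exact) / exact;
            worst = Math.max(worst, err);
        }
        System.out.println("worst relative error: " + worst);  // ~0.045% at x = +/-30
    }
}
```

No state is carried between iterations, so unlike the interpolation above there is no error accumulation along the scanline.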

Pete Kirkham
I think his example data set was given as a small subset of the full range (otherwise doing 60 sqrts shouldn't be a problem). In the event that his X grows much higher than the 30 you give the error will quickly become unacceptable. Perhaps a slightly modified version would be to recalculate r_true every 30 or 40 X steps to keep the error in check.
Ron Warholic
Ooh, expressing dr/dx as a function f(x,r_n-1) is a clever idea. That should definitely work well enough, at least to cut the number of sqrt()s down to one in 24 steps.
Crashworks
I have a large number (50,000/second) of fairly short (100-200 point) scan lines, so restarting the approximation at the beginning of each scan line would work well. I'm thinking of doing a positive side and a negative side, both starting at 0, so that the errors are symmetric.
I've found a small problem with the "Interpolating from previous r" solution. When Y is 0, the scanline will eventually hit the point where r is 0, and I will get a divide by 0 when I try to calculate r for x = 1. I wish I had done the positive-side/negative-side thing as I thought about before. That would eliminate this problem.
Adding a test for r < some epsilon and falling back to the sqrt version shouldn't be too much of a change.
Pete Kirkham
A: 

This is sort of related to a HAKMEM item:

ITEM 149 (Minsky): CIRCLE ALGORITHM

Here is an elegant way to draw almost circles on a point-plotting display:

NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X

This makes a very round ellipse centered at the origin with its size determined by the initial point. epsilon determines the angular velocity of the circulating point, and slightly affects the eccentricity. If epsilon is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.

The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.
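A quick numeric check of the recurrence (Java; epsilon and the starting point are arbitrary choices of mine):

```java
public class MinskyCircle {
    public static void main(String[] args) {
        double eps = 1.0 / 16;  // a power of two, so the multiplies could be shifts
        double x = 100, y = 0;  // size is set by the initial point
        double worst = 0;
        for (int i = 0; i < 10000; i++) {
            x = x - eps * y;
            y = y + eps * x;    // uses the NEW x, as the item stresses
            worst = Math.max(worst, Math.abs(Math.sqrt(x * x + y * y) - 100));
        }
        // The orbit is a slightly eccentric ellipse, not a perfect circle:
        System.out.println("max radius deviation: " + worst);  // stays below ~2
    }
}
```

The conserved quantity is x^2 - eps*x*y + y^2, which is why the orbit never drifts off the near-circle, but it also yields a distance only implicitly, so it answers a drawing problem rather than the OP's distance problem.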

Paul Reiners