ansaurus

Question

Looking for an efficient integer square root algorithm for ARM Thumb2

Answer 1

+4 A:

One common approach is bisection.

hi = number
lo = 0
mid = ( hi + lo ) / 2
mid2 = mid*mid
while( lo < hi-1 and mid2 != number ) {
    if( mid2 < number ) {
        lo = mid
    else
        hi = mid
    mid = ( hi + lo ) / 2
    mid2 = mid*mid

Something like that should work reasonably well. It makes log2(number) tests, doing log2(number) multiplies and divides. Since the divide is a divide by 2, you can replace it with a >>.

The terminating condition may not be spot on, so be sure to test a variety of integers to be sure that the division by 2 doesn't incorrectly oscillate between two even values; they would differ by more than 1.

S.Lott 2009-07-08 19:35:02

Answer 2

+3 A:

If exact accuracy isn't required, I have a fast approximation for you, that uses 260bytes of ram (you could halve that, but don't).

int ftbl[33]={0,1,1,2,2,4,5,8,11,16,22,32,45,64,90,128,181,256,362,512,724,1024,1448,2048,2896,4096,5792,8192,11585,16384,23170,32768,46340};
int ftbl2[32]={ 32768,33276,33776,34269,34755,35235,35708,36174,36635,37090,37540,37984,38423,38858,39287,39712,40132,40548,40960,41367,41771,42170,42566,42959,43347,43733,44115,44493,44869,45241,45611,45977};

int fisqrt(int val)
{
    int cnt=0;
    int t=val;
    while (t) {cnt++;t>>=1;}
    if (6>=cnt)    t=(val<<(6-cnt));
    else           t=(val>>(cnt-6));

    return (ftbl[cnt]*ftbl2[t&31])>>15;
}

Here's the code to generate the tables:

ftbl[0]=0;
for (int i=0;i<32;i++) ftbl[i+1]=sqrt(pow(2.0,i));
printf("int ftbl[33]={0");
for (int i=0;i<32;i++) printf(",%d",ftbl[i+1]);
printf("};\n");

for (int i=0;i<32;i++) ftbl2[i]=sqrt(1.0+i/32.0)*32768;
printf("int ftbl2[32]={");
for (int i=0;i<32;i++) printf("%c%d",(i)?',':' ',ftbl2[i]);
printf("};\n");

Over the range 1->2^20, the maximum error is 11, and over the range 1->2^30, it's about 256. You could use larger tables and minimise this. It's worth mentioning that the error will always be negative - i.e. when it's wrong, the value will be LESS than the correct value.

You might do well to follow this with a refining stage.

The idea is simple enough: (ab)^0.5 = a^0.b * b^0.5.

So, we take the input X = A*B where A=2^N and 1<=B<2 Then we have a lookuptable for sqrt(2^N), and a lookuptable for sqrt(1<=B<2). We store the lookuptable for sqrt(2^N) as integer, which might be a mistake (testing shows no ill effects), and we store the lookuptable for sqrt(1<=B<2) at 15bits of fixed-point.

We know that 1<=sqrt(2^N)<65536, so that's 16bit, and we know that we can really only multiply 16bitx15bit on an ARM, without fear of reprisal, so that's what we do.

In terms of implementation, the while(t) {cnt++;t>>=1;} is effectively a count-leading-bits instruction (CLB), so if your version of the chipset has that, you're winning! Also, the shift instruction would be easy to implement with a bidirectional shifter, if you have one? There's a Lg[N] algorithm for counting the highest set bit here.

In terms of magic numbers, for changing table sizes, THE magic number for ftbl2 is 32, though note that 6 (Lg[32]+1) is used for the shifting.

Dave Gamble 2009-07-08 21:19:36

FWIW, though I don't really recommend this, you can quarter your overall error, with some biasing, viz: int v1=fisqrt(val); v1+=fisqrt(val-v1*v1)/16; 16 is the power of two that works best, over the range 1->2^24.

Dave Gamble 2009-07-08 22:17:57

Answer 3

+2 A:

I have settled to something similar to the binay digit-by-digit algorithm described in this Wikipedia article.

Ber 2009-07-08 23:32:54

Answer 4

+3 A:

Integer Square Roots by Jack W. Crenshaw could be useful as another reference.

The C Snippets Archive also has an integer square root implementation. That one goes beyond just the integer result, and calculates extra fractional (fixed-point) bits of the answer.

I settled on the following code. It's essentially from the Wikipedia article on square-root computing methods. But it has been changed to use stdint.h types uint32_t etc. Strictly speaking the return type could be changed to uint16_t.

/**
 * \brief    Fast Square root algorithm
 *
 * Fractional parts of the answer are discarded. That is:
 *      - SquareRoot(3) --> 1
 *      - SquareRoot(4) --> 2
 *      - SquareRoot(5) --> 2
 *      - SquareRoot(8) --> 2
 *      - SquareRoot(9) --> 3
 *
 * \param[in] a_nInput - unsigned integer for which to find the square root
 *
 * \return Integer square root of the input value.
 */
uint32_t SquareRoot(uint32_t a_nInput)
{
    uint32_t op  = a_nInput;
    uint32_t res = 0;
    uint32_t one = 1uL << 30; // The second-to-top bit is set: use 1u << 14 for uint16_t type; use 1uL<<30 for uint32_t type


    // "one" starts at the highest power of four <= than the argument.
    while (one > op)
    {
        one >>= 2;
    }

    while (one != 0)
    {
        if (op >= res + one)
        {
            op = op - (res + one);
            res = res +  2 * one;
        }
        res >>= 1;
        one >>= 2;
    }
    return res;
}

The nice thing, I discovered, is that a fairly simple modification can return the "rounded" answer. I found this useful in a certain application for greater accuracy. Note that in this case, the return type must be uint32_t because the rounded square root of 2³² - 1 is 2¹⁶.

/**
 * \brief    Fast Square root algorithm, with rounding
 *
 * This does arithmetic rounding of the result. That is, if the real answer
 * would have a fractional part of 0.5 or greater, the result is rounded up to
 * the next integer.
 *      - SquareRootRounded(2) --> 1
 *      - SquareRootRounded(3) --> 2
 *      - SquareRootRounded(4) --> 2
 *      - SquareRootRounded(6) --> 2
 *      - SquareRootRounded(7) --> 3
 *      - SquareRootRounded(8) --> 3
 *      - SquareRootRounded(9) --> 3
 *
 * \param[in] a_nInput - unsigned integer for which to find the square root
 *
 * \return Integer square root of the input value.
 */
uint32_t SquareRootRounded(uint32_t a_nInput)
{
    uint32_t op  = a_nInput;
    uint32_t res = 0;
    uint32_t one = 1uL << 30; // The second-to-top bit is set: use 1u << 14 for uint16_t type; use 1uL<<30 for uint32_t type


    // "one" starts at the highest power of four <= than the argument.
    while (one > op)
    {
        one >>= 2;
    }

    while (one != 0)
    {
        if (op >= res + one)
        {
            op = op - (res + one);
            res = res +  2 * one;
        }
        res >>= 1;
        one >>= 2;
    }

    /* Do arithmetic rounding to nearest integer */
    if (op > res)
    {
        res++;
    }

    return res;
}

Craig McQueen 2009-07-09 00:12:53

Answer 5

+3 A:

Here are a whole bunch of them. I remember looking at them when I was thinking of doing Nintendo DS programming.

Nosredna 2009-07-09 00:16:04

This really is impressive code; total of 51 cycles to find the square root of a 32bit int. Unfortunately it is less efficient on a Thumb2 processor, since it uses some conditional insns which require more code. Still a good solution.

Ber 2009-07-09 08:57:01

Answer 6

+1 A:

It's not fast but it's small and simple:

int isqrt(int n)
{
  int b = 0;

  while(n >= 0)
  {
    n = n - b;
    b = b + 1;
    n = n - b;
  }

  return b - 1;
}

2009-08-27 20:54:19

ansaurus

tags:

views:

answers:

Looking for an efficient integer square root algorithm for ARM Thumb2

related questions