ansaurus

Question

Fastest way to determine if an integer's square root is an integer

Answer 1

+2 A:

Just wondering, why are you doing + 0.5? If the number n is a perfect square rounding will not matter anyway. Other than that your code looks fine.

Aistina 2008-11-17 13:47:23

I was afraid that it might give me something like 67.99999999999999999999998 for an answer. Although, looking at the documentation for Math.sqrt(), I believe you're right and the "+ 0.5" isn't necessary.

Kip 2008-11-17 13:54:30

Update- I've confirmed that the 0.5 isn't necessary by checking that (long)Math.sqrt(i*i) == i, for every integer i in the range 0..Integer.MAX_VALUE.

Kip 2008-11-18 15:23:07

Answer 2

A:

If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.

Celestial M Weasel 2008-11-17 13:48:29

There are 2^32 perfect squares in the range of a long. This table would be huge. Also, the advantage of computing the value over a memory access could be huge.

PeterAllenWebb 2008-11-17 14:54:16

Oh no there aren't, there are 2^16. 2^32 is 2^16 squared. There are 2^16.

Celestial M Weasel 2008-11-18 12:04:05

yes, but the range of a long is 64 bits, not 32 bits. sqrt(2^64)=2^32. (i'm ignoring the sign bit to make the math a little easier... there are actually (long)(2^31.5)=3037000499 perfect squares)

Kip 2008-12-02 19:14:57

Answer 3

+8 A:

I was thinking about the horrible times I've spent in Numerical Analysis course.

And then I remember, there was this function circling around the 'net from the Quake Source code:

float Q_rsqrt( float number )
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;  // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 ); // what the fuck?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

  #ifndef Q3_VM
  #ifdef __linux__
    assert( !isnan(y) ); // bk010122 - FPE?
  #endif
  #endif
  return y;
}

Which basically calculates a square root, using Newton's approximation function (cant remember the exact name).

It should be usable and might even be faster, it's from one of the phenomenal id software's game!

It's written in C++ but it should not be too hard to reuse the same technique in Java once you get the idea:

I originally found it at: http://www.codemaestro.com/reviews/9

Newton's method explained at wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

the * (long*) &y is basically a fast convert-to-long function so integer operations can be applied on the raw bytes.
the 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
the * (float*) &i converts the value back to floating point.
the y = y * ( threehalfs - ( x2 * y * y ) ) line bascially iterates the value over the function again.

The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it wasn't for you... then you could add as much iteration as you need.

This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replace it with a few fixed number of multiplication operations instead.

chakrit 2008-11-17 13:50:19

It should be noted that this returns 1/sqrt(number), not sqrt(number). I've done some testing, and this fails starting at n=410881: the John Carmack magic formula returns 642.00104, when the actual square root is 641.

Kip 2008-11-17 19:56:52

You could look at Chris Lomonts paper on fast inverse square roots: http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf It uses the same technique as here, but with a different magic number. The paper explains why the magic number was chosen.

Dan 2008-12-02 10:24:59

Also, http://www.beyond3d.com/content/articles/8/ and http://www.beyond3d.com/content/articles/15/ shed some light as to the origins of this method. It's often attributed to John Carmack, but it seems the original code was (possibly) written by Gary Tarolli, Greg Walsh and probably others.

Dan 2008-12-02 10:26:33

Answer 4

+8 A:

If you do a binary chop to try to find the "right" square root, you can fairly easily detect if the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

So having calculated n^2, the options are:

n^2 = target: done, return true
n^2 + 2n + 1 > target > n^2 : you're close, but it's not perfect: return false
n^2 - 2n + 1 < target < n^2 : ditto
target < n^2 - 2n + 1 : binary chop on a lower n
target > n^2 + 2n + 1 : binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologise for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to take in the whole range of integers, either (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.

Jon Skeet 2008-11-17 13:50:48

My money is on this kind of approach. Avoid calling sqrt() since it is calculating a full square root, and you only need the first few digits.

PeterAllenWebb 2008-11-17 14:44:37

On the other hand, if the floating point is being done in a dedicated FP unit, it may be using all kinds of fun tricks. I wouldn't like to bet on it without a benchmark :)(I may try it tonight though in C#, just to see...)

Jon Skeet 2008-11-17 14:49:38

Hardware sqrts are actually pretty fast these days.

Adam Rosenfield 2008-11-29 03:58:51

Answer 5

+10 A:

I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root, algorithm to solve the square root faster. You could probably easily test this for all possible 32 bit integers, and validate that you actually got correct results, as it's only an appoximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.

Kibbee 2008-11-17 13:51:45

+1 for the groovy Carmack reference!

Mitch Wheat 2008-11-17 14:30:39

I believe Carmack's trick is fairly pointless these days. The built-in sqrt instruction is a lot faster than it used to be, so you may be better off just performing a regular square root and testing if the result is an int. As always, benchmark it.

jalf 2008-11-17 15:22:49

doubles can always represent integers in their range exactly

luke 2008-11-17 18:45:30

This breaks starting at n=410881, the John Carmack magic formula returns 642.00104, when the actual square root is 641.

Kip 2008-11-17 19:55:13

I recently used Carmack's trick in a Java game and it was very effective, giving a speedup of about 40%, so it is still useful, at least in Java.

finnw 2010-01-07 11:57:49

@finnw - +40% in actual framerate or just in the sqrt function? I'm actually quite surprised about this, since most of the games I've played (and the one I've made in XNA) have been GPU-bound instead of CPU-bound anyway (though I know this can differ between use's machine, blah blah blah)

Robert Fraser 2010-05-06 10:27:45

@Robert Fraser Yes +40% in the overall frame rate. The game had a particle physics system which took up nearly all available CPU cycles, dominated by the square root function and the round-to-nearest-integer function (which I had also optimised using a similar bit twiddling hack.)

finnw 2010-05-06 10:48:51

Answer 6

A:

Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

Since we're talking integers here, the fasted would probably involve a collection where you can just make a lookup.

Joel Coehoorn 2008-11-17 13:57:20

wouldn't it be faster and cheaper to "take the square root in the normal fashion" and check if its an int?

nickf 2008-11-17 14:04:30

no - because the sqrt function returns floating point values

warren 2008-11-17 14:37:22

I suppose it depends on whether a mod op or a mult op is faster on your system.

Joel Coehoorn 2008-11-17 15:57:35

Answer 7

A:

Premature Optimization Is The Root Of All Evil

Today FPUs fast and support squarerooting. Also the multiplication is really fast. So you should really check if the this computation is a bottleneck and a newer version is faster. E.g. lookup tables were often used to accelerate trigonometric/sqrt computations. But we have now times, where not the computation is the bottleneck, but the io to the memory. That means that lut are only usefull if you speedup a long calculation (and not something the fpu could compute in one instruction in 100 cycles). If you have a really huge cache, and no other application data which could get into it, you maybe can avoid the mem load penalty, but often it is not really helpfull. Same goes for other approach: most of the iterative/fixpoint/series approaches take for sure longer than the normal fpu sqrt.

To suggest some points to optimize: Sadly you program java, there is very few control for the generated/executed asm code, but to give you some hints were you could look:

Check your CPU instruction manual which floating point operation the fpu supports: common restrictions are:

-only float support, no double

Here you should switch to float instead of double (I know no java std api function which does the sqrt on the float instead of double)

-no support of sqrt, but of 1/sqrt

In this case you should rewrite the code (I hope your java interpreter is smart enough to utilize this), e.g. as:

reci_sqrt = 1.0 / Math.sqrt(number); // gets calculated in one asm instruction
is_sqrt = Math.round(number * reci_sqrt  * reci_sqrt) == 1;

(In this example you would trade a division for a multiplication, if its really worth depends on your cpu))

flolo 2008-11-17 14:18:02

Uh. The Floating-Point Unit will give a Floating square root. It is not clear whether doing {float x = sqrt(n); return x==round(x);} is the best idea, given possible precision errors and considering the fact that you don't even want the square root; you just want to test whether n is a square.

ShreevatsaR 2008-11-17 14:32:18

you saw the comment about it being for Project Euler? while your point happens to be true, it's irrelevant in this case :)

warren 2008-11-17 14:34:40

this is not for production code, it's for hobby code that i want to run as fast as possible because i want it to run as fast as possible. this was mentioned in the question.

Kip 2008-11-17 14:38:13

"gets calculated in one asm instruction" really? Then why the following paper say that it is 4x slower than the fullfledged algorithm? http://www.math.purdue.edu/~clomont/Math/Papers/2003/InvSqrt.pdf

Jader Dias 2009-02-06 20:25:55

@Jader Dias: Yes really! I just read the (6 year old!) paper, they just suggest an algorithm for quick computing of inv root, but nothing where they even tried to utilize or bench asm invsqrt (via lib or direct). Look e.g. at http://www.intel.com/software/products/mkl/data/vml/functions/invsqrt.html

flolo 2009-02-09 13:51:48

Answer 8

+29 A:

You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and." Perfect squares can only end in 0, 1, 4, or 9 in base 16, So for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.

public final static boolean isPerfectSquare(long n)
{
    if (n < 0)
        return false;

    switch((int)(n & 0xF))
    {
    case 0: case 1: case 4: case 9:
        long tst = (long)Math.sqrt(n);
        return tst*tst == n;

    default:
        return false;
    }
}

When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again make the code twice as fast.

int isPerfectSquare(int n)
{
    int h = n & 0xF;  // h is the last hex "digit"
    if (h > 9)
        return 0;
    // Use lazy evaluation to jump out of the if statement as soon as possible
    if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
    {
        int t = (int) floor( sqrt((double) n) + 0.5 );
        return t*t == n;
    }
    return 0;
}

Eliminating the switch statement had little effect on the C# code.

John D. Cook 2008-11-17 14:27:17

that's pretty clever... wouldn't have thought of that

warren 2008-11-17 14:38:07

Nice point about the trailing bits. I would try to combine that test with some of the other remarks here.

PeterAllenWebb 2008-11-17 14:56:44

I benchmarked it for the first 100 million integers.. this approximately halves the time required.

Kip 2008-11-17 16:09:54

You could extend this beyond the last digit as well. For example, there are only 44 different values the least significant byte can have.

kenj0418 2010-02-08 04:41:52

Answer 9

+6 A:

It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

Another optimization that you can try: If the Digital Root of a number doesn't end in 1, 4, 7, or 9 the number is not a perfect square. This can be used as a quick way to eliminate 60% of your inputs before applying the slower square root algorithm.

Bill the Lizard 2008-11-17 14:58:24

The digital root is strictly computationally equivalent to modulo, so should be considered along with other modulo methods here, such as mod 16 and mod 255.

Christian Oudard 2010-01-08 16:48:00

Answer 10

A:

If speed is a concern, why not partition off the most commonly used set of inputs and their values to a lookup table and then do whatever optimized magic algorithm you have come up with for the exceptional cases?

Elijah 2008-11-17 23:29:58

The problem is that there is no "commonly used set of inputs"--usually I'm iterating through a list, so I won't use the same inputs twice.

Kip 2009-01-06 14:40:06

Answer 11

+5 A:

I want this function to work with all positive 64-bit signed integers

Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.

mrzl 2008-11-18 15:57:44

I've actually tested the answer on all perfect squares larger than 2^53, as well as all numbers from 5 below each perfect square to 5 above each perfect square, and I get the correct result. (the roundoff error is corrected when I round the sqrt answer to a long, then square that value and compare)

Kip 2009-01-06 14:33:38

Answer 12

A:

I would have to ask (maybe the answer is NO) couldn't you just enumerate the squares, rather than enumerate the numbers and ask if they are squares. (i.e. think outside the box)

Mike Dunlavey 2008-11-19 00:18:07

no, i think there would be 2^31-1 perfect squares to enumerate in the range 0..2^63-1 (the range of all nonnegative long values). this is far too many for a lookup table to be practical.

Kip 2008-11-19 13:44:37

What I meant was, like in my business (biostat), we're adjusting matrix M to maximize a data fit, but M has to have a well-defined square root. Instead we adjust another matrix C where M=C*C, and never have to worry about square root.

Mike Dunlavey 2008-11-19 18:24:00

Answer 13

+3 A:

It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n is the same as the remainder when n is divided by b^d, ie. in C notation n % pow(b, d).

This can be generalized to any modulus m, ie. n % m can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, ie. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, ie. 1.8% of remainders as possible squares. So depending on the cost of a modulus operation (ie. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

By the way if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days and fetching a machine word is going to be faster than fetching a single bit.

Hugh Allen 2008-11-29 03:52:36

Nice. Did you work this out algebraically or by trial and error? I can see why it's so effective - lots of collisions between perfect squares, e.g. 333^2 % 110880 == 3^2, 334^2 % 110880 == 26^2, 338^2 % 110880 == 58^2...

finnw 2010-01-07 12:28:07

IIRC it was brute force, but note that 110880 = 2^5 * 3^2 * 5 * 7 * 11, which gives 6*3*2*2*2 - 1 = 143 proper divisors.

Hugh Allen 2010-01-19 01:18:52

Answer 14

+1 A:

Just for the record, another approach is to use the prime decomposition. If every factor of the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.

A solution would then be like this:

boolean isPerfectSquare(long number)
{
    if (number < 0) return false;
    if (number < 2) return true;

    for (int i = 0; ; i++)
    {
        long square = squareTable[i];
        if (square > number) return false;
        while (number % square == 0)
        {
            number /= square;
        }
        if (number == 1) return true;
    }
}

I guess it's a bit cryptic. What it does is checking in every step that the square of a prime number divide the input number. If it does then it divides the number by the square as long as it is possible, to remove this square from the prime decomposition. If by this process, we came to 1, then the input number was a decomposition of square of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger squares, can divide it, so the number can not be a decomposition of squares of prime numbers.

Given nowadays' sqrt done in hardware and the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than solution with sqrt which won't work over 2^54, as says mrzl in his answer.

ckarmann 2008-12-02 10:00:22

Answer 15

+2 A:

For performance, you very often have to do some compromsies. Others have expressed various methods, however, you noted Carmack's hack was faster up to certain values of N. Then, you should check the "n" and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.

BobbyShaftoe 2008-12-05 05:36:38

good point! i can't believe that didn't cross my mind

Kip 2008-12-05 19:01:37

I've incorporated your suggestion into the solution too. Also, nice handle. :)

Kip 2008-12-05 19:56:34

No problem, and thanks :)

BobbyShaftoe 2008-12-05 20:39:59

Answer 16

+2 A:

You should get rid of the 2-power part of N right from the start.

2nd Edit The magical expression for m below should be

m = N - (N & (N-1));

and not as written

End of 2nd edit

m = N & (N-1); // the lawest bit of N
N /= m;
byte = N & 0x0F;
if ((m % 2) || (byte !=1 && byte !=9))
  return false;

1st Edit:

Minor improvement:

m = N & (N-1); // the lawest bit of N
N /= m;
if ((m % 2) || (N & 0x07 != 1))
  return false;

End of 1st edit

Now continue as usual. This way, by the time you get to the floating point part, you already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of whats left. I.e. you run the floating point part on 6% of the numbers.

David Lehavi 2009-01-01 22:12:38

Answer 17

+1 A:

This a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

def isexactsqrt(v)
    value = v.abs
    residue = value
    root = 0
    onebit = 1
    onebit <<= 8 while (onebit < residue)
    onebit >>= 2 while (onebit > residue)
    while (onebit > 0)
     x = root + onebit
     if (residue >= x) then
      residue -= x
      root = x + onebit
     end
     root >>= 1
     onebit >>= 2
    end
    return (residue == 0)
end

Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O - it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0:

#include <iostream>  

using namespace std;  
typedef unsigned long long int llint;

class ISqrt {     // Integer Square Root
    llint value;  // Integer whose square root is required
    llint root;   // Result: floor(sqrt(value))
    llint residue;  // Result: value-root*root
    llint onebit, x; // Working bit, working value

public:

    ISqrt(llint v = 2) { // Constructor
     Root(v);   // Take the root 
    };

    llint Root(llint r) { // Resets and calculates new square root
     value = r;   // Store input
     residue = value; // Initialise for subtracting down
     root = 0;   // Clear root accumulator

     onebit = 1;     // Calculate start value of counter
     onebit <<= (8*sizeof(llint)-2);   // Set up counter bit as greatest odd power of 2 
     while (onebit > residue) {onebit >>= 2; }; // Shift down until just < value

     while (onebit > 0) {
      x = root ^ onebit;   // Will check root+1bit (root bit corresponding to onebit is always zero)
      if (residue >= x) {   // Room to subtract?
       residue -= x;   // Yes - deduct from residue
       root = x + onebit;  // and step root
      };
      root >>= 1;
      onebit >>= 2;
     };
     return root;     
    };
    llint Residue() {   // Returns residue from last calculation
     return residue;     
    };
};

int main() {
    llint big, i, q, r, v, delta;
    big = 0; big = (big-1);   // Kludge for "big number"
    ISqrt b;       // Make q sqrt generator
    for ( i = big; i > 0 ; i /= 7 ) { // for several numbers
     q = b.Root(i);     // Get the square root
     r = b.Residue();    // Get the residue
     v = q*q+r;      // Recalc original value
     delta = v-i;     // And diff, hopefully 0
     cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
    };
    return 0;
};

Brent.Longborough 2009-01-01 22:27:43

The number of iterations looks O(ln n), where n is the bit-length of v, so I doubt this will save much for larger v. Floating point sqrt is slow, maybe 100-200 cycles, but integer math isn't free either. A dozen iterations with 15 cycles each, and it'd be a wash. Still, +1 for being interesting.

Tadmas 2009-01-02 00:24:34

Er, make that O(n). Oops.

Tadmas 2009-01-02 00:28:16

Actually, I believe the additions and subtractions can be done by XOR.

Brent.Longborough 2009-01-02 08:12:15

That was a daft comment - only the addition can be done by an XOR; the subtraction is arithmetic.

Brent.Longborough 2009-01-02 09:13:29

Now two daft comments. There's only one opportunity for XOR.

Brent.Longborough 2009-01-02 11:27:06

Is there really any substantive difference between the run time of XOR and addition anyway?

Tadmas 2009-01-02 22:59:34

@Tadmas: probably not enough to break the "optimise later" rule. (:-)

Brent.Longborough 2009-01-04 00:17:51

Answer 18

+116 A:

A. Rex 2009-01-08 16:32:42

Wow! I'll try to convert this to Java and do a comparison, as well as an accuracy check on the results. I'll let you know what I find.

Kip 2009-01-12 21:47:11

Testing on all values is impossible, but a test on suspect values (+/- 1 from very large perfect squares) has proven to be accurate. On a run of the first billion integers, this only took 34% of the time required by the original algorithm.

Kip 2009-01-16 20:13:52

Awesome! I'm glad it worked out for you. One thing that's different in C(++) and Java is that Java checks bounds on array lookups, so I thought you might take a hit there.

A. Rex 2009-01-16 20:50:31

Seems that this only returns true/false, but it will be difficult to return the square root itself?

Dimitre Novatchev 2009-03-27 23:44:55

Wow, this is beautiful. I'd seen Hensel lifting before (computing roots of polynomials modulo a prime) but I hadn't even realised the lemma could be carefully lowered all the way for computing square roots of numbers; this is... uplifting :)

ShreevatsaR 2009-05-25 02:36:58

@Dimitre Novatchev: No, it would not be too difficult. If the number is a perfect square, then its square root is `r` times some power of 2, which can be determined when dividing out by powers of 4.

A. Rex 2009-12-03 23:50:42

Answer 19

A:

It ought to be possible to pack the 'cannot be a perfect square if the last X digits are N' much more efficiently than that! I'll use java 32 bit ints, and produce enough data to check the last 16 bits of the number - that's 2048 hexadecimal int values.

...

Ok. Either I have run into some number theory that is a little beyond me, or there is a bug in my code. In any case, here is the code:

public static void main(String[] args) {
    final int BITS = 16;

    BitSet foo = new BitSet();

    for(int i = 0; i< (1<<BITS); i++) {
        int sq = (i*i);
        sq = sq & ((1<<BITS)-1);
        foo.set(sq);
    }

    System.out.println("int[] mayBeASquare = {");

    for(int i = 0; i< 1<<(BITS-5); i++) {
        int kk = 0;
        for(int j = 0; j<32; j++) {
            if(foo.get((i << 5) | j)) {
                kk |= 1<<j;
            }
        }
        System.out.print("0x" + Integer.toHexString(kk) + ", ");
        if(i%8 == 7) System.out.println();
    }
    System.out.println("};");
}

and here are the results:

(ed: elided for poor performance in prettify.js; view revision history to see.)

paulmurray 2009-02-22 06:22:01

Answer 20

A:

Regarding the Carmac method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method -- Newton's, with a very good first guess.

Regarding your current best, I see two micro-optimizations:

move the check vs. 0 after the check using mod255
rearrange the dividing out powers of four to skip all the checks for the usual (75%) case.

I.e:

// Divide out powers of 4 using binary search

if((n & 0x3L) == 0) {
  n >>=2;

  if((n & 0xffffffffL) == 0)
    n >>= 32;
  if((n & 0xffffL) == 0)
      n >>= 16;
  if((n & 0xffL) == 0)
      n >>= 8;
  if((n & 0xfL) == 0)
      n >>= 4;
  if((n & 0x3L) == 0)
      n >>= 2;
}

Even better might be a simple

while ((n & 0x03L) == 0) n >>= 2;

Obviously, it would be interesting to know how many numbers get culled at each checkpoint -- I rather doubt the checks are truly independent, which makes things tricky.

Ben 2009-03-11 13:18:14

Answer 21

A:

The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which isn't used by Java I believe because it doesn't conform to IEEE.

So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, ie building in floating point library calls if necessary. Oh well.

In C++, I suspect all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find usefull, is a simple hack, an extension of the special case testing suggested by A. Rex. Use a single long value as a bit array, which isn't bounds checked. That way, you have 64 bit boolean lookup.

typedef unsigned long long UVLONG
UVLONG pp1,pp2;

void init2() {
  for (int i = 0; i < 64; i++) {
    for (int j = 0; j < 64; j++)
      if (isPerfectSquare(i * 64 + j)) {
    pp1 |= (1 << j);
    pp2 |= (1 << i);
    break;
      }
   }
   cout << "pp1=" << pp1 << "," << pp2 << "\n";  
}


inline bool isPerfectSquare5(UVLONG x) {
  return pp1 & (1 << (x & 0x3F)) ? isPerfectSquare(x) : false;
}

The routine isPerfectSquare5 runs in about 1/3 the time on my core2 duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther on that road.

Certainly, rather than having a separate test for negative, you could check the high 6 bits the same way.

Note that all I'm doing is eliminating possible squares, but when I have a potential case I have to call the original, inlined isPerfectSquare.

The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++, I'm using unsigned long long, so since you're signed, you'd have to use the >>> operator.

There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.

2009-03-11 13:25:05

Answer 22

+1 A:

Project Euler is mentioned in the tags and many of the problems in it require checking numbers >> 2^64. Most of the optimizations mentioned above don't work easily when you are working with an 80 byte buffer.

I used java BigInteger and a slightly modified version of Newton's method, one that works better with integers. The problem was that exact squares n^2 converged to (n-1) instead of n because n^2-1 = (n-1)(n+1) and the final error was just one step below the final divisor and the algorithm terminated. It was easy to fix by adding one to the original argument before computing the error. (Add two for cube roots, etc.)

One nice attribute of this algorithm is that you can immediately tell if the number is a perfect square - the final error (not correction) in Newton's method will be zero. A simple modification also lets you quickly calculate floor(sqrt(x)) instead of the closest integer. This is handy with several Euler problems.

bgiles 2009-03-25 15:23:05

Answer 23

A:

I like the idea to use an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.

Just replace your:

if(n < 410881L){...}

code with this one:

if (n < 11043908100L) {
    //John Carmack hack, converted to Java.
    // See: http://www.codemaestro.com/reviews/9
    int i;
    float x2, y;

    x2 = n * 0.5F;
    y = n;
    i = Float.floatToRawIntBits(y);
    //using the magic number from 
    //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
    //since it more accurate
    i = 0x5f375a86 - (i >> 1);
    y = Float.intBitsToFloat(i);
    y = y * (1.5F - (x2 * y * y));
    y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate

    sqrt = Math.round(1.0F / y);
} else {
    //Carmack hack gives incorrect answer for n >= 11043908100.
    sqrt = (long) Math.sqrt(n);
}

2009-05-25 01:22:28

Answer 24

A:

"I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer)."

The answers are impressive, but I failed to see a simple check :

check whether the first number on the right of the long it a member of the set (0,1,4,5,6,9) . If it is not, then it cannot possibly be a 'perfect square' .

eg.

4567 - cannot be a perfect square.

2009-09-24 09:46:57

wow... didn't realise it was such an old topic.

2009-09-24 09:55:15

Kip 2009-09-24 13:42:31

Kip 2009-09-24 13:44:52

ah, I stand corrected :)

2009-09-25 06:25:47

Perhaps the following will be faster?Change:if( n < 0 || ((n if( n == 0 ) return true;into: if( n == 0 ) return true; if( n < 0 || !( ( n

2009-09-25 09:47:42

My above comment is incorrect (in many ways), plz ignore.

2009-09-25 10:46:57

Answer 25

A:

This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.

Mod-256 test
Inexact mod-3465 test (avoids integer division at the cost of some false positives)
Floating-point square root, round and compare with input value

I also experimented with these modifications but they did not help performance:

Additional mod-255 test
Dividing the input value by powers of 4
Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)

public class SquareTester {

    public static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        } else {
            switch ((byte) n) {
            case -128: case -127: case -124: case -119: case -112:
            case -111: case -103: case  -95: case  -92: case  -87:
            case  -79: case  -71: case  -64: case  -63: case  -60:
            case  -55: case  -47: case  -39: case  -31: case  -28:
            case  -23: case  -15: case   -7: case    0: case    1:
            case    4: case    9: case   16: case   17: case   25:
            case   33: case   36: case   41: case   49: case   57:
            case   64: case   65: case   68: case   73: case   81:
            case   89: case   97: case  100: case  105: case  113:
            case  121:
                long i = (n * INV3465) >>> 52;
                if (! good3465[(int) i]) {
                    return false;
                } else {
                    long r = round(Math.sqrt(n));
                    return r*r == n; 
                }
            default:
                return false;
            }
        }
    }

    private static int round(double x) {
        return (int) Double.doubleToRawLongBits(x + (double) (1L << 52));
    }

    /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */
    private static final long INV3465 = 0x8ffed161732e78b9L;

    private static final boolean[] good3465 =
        new boolean[0x1000];

    static {
        for (int r = 0; r < 3465; ++ r) {
            int i = (int) ((r * r * INV3465) >>> 52);
            good3465[i] = good3465[i+1] = true;
        }
    }

}

finnw 2010-05-06 13:29:10

ansaurus

tags:

views:

answers:

Fastest way to determine if an integer's square root is an integer

related questions