Optimization notes
Both the hashset and radix sort algorithms can be optimized by noting three facts:
- Odd and even values can be handled separately
- Calculating an integer square root is a very fast operation (typically consists of 3-5 divides and a few adds)
- Cache locality is important for both of these algorithms
The optimized algorithms below will typically perform 5x faster and use less than half the RAM of the unoptimized case. In some cases where the data size is similar to the L2/L3 cache size they may perform 100x faster or more.
Optimized algorithm based on radix sort
Data structure is five lists of integers: IN, Aodd, Bodd, Aeven, Beven
The A and B lists use half the integer size of IN. (eg if IN = 64bits, A & B = 32bits)
- Scan list IN to find the largest odd and even numbers MAXodd and MAXeven
- Let LIMITodd = floor(sqrt(MAXodd))
- Let LIMITeven = floor(sqrt(MAXeven))
- For each number in list IN: a. Compute the square root if positive. If exact, add the square root to list Aodd/Aeven. b. If the number is >=0 and <= LIMITodd/LIMITeven, add it to list Bodd/Beven
- Radix sort list Aodd and Bodd using just log2(LIMITodd) bits
- Linear scan Aodd and Bodd for a match
- Radix sort list Aeven and Beven using just log2(LIMITeven) bits
- Linear scan Aeven and Beven for a match
If either linear scan finds a match, return that match immediately.
The reason this is much faster than the straightforward radix sort algorithm is that:
- The arrays being sorted typicaly have less than 1/4 the numbers of values and need only half the number of bits per integer, so a total of less than 1/8 the RAM in use in a given sort which is good for the cache.
- The radix sort is done on much fewer bits leading to fewer passes, so even if it does exceed your L1 or L2 cache you read RAM fewer times, and you read much less RAM
- The linear scan is typically much faster because the A list contains only exact square roots and the B list only contains small values
Optimized algorithm based on hashset
Data structure is list of integers IN, plus two hashsets A and B
The A and B sets use half the integer size of IN
- Scan list IN to find the largest odd and even numbers MAXodd and MAXeven
- Let LIMITodd = floor(sqrt(MAXodd))
- Let LIMITeven = floor(sqrt(MAXeven))
- For each odd number in list IN: a. Compute the square root if positive. If exact, check if square root exists in B & return if true otherwise add it to A. b. If the number is >=0 and <= LIMITodd/LIMITeven, check if it exists in A & return if true otherwise add it to B.
- Clear A and B and repeat step 4 for even numbers
The reason this is faster than the straightforward hashset algorithm is that:
- The hashset is typically 1/8 the amount of RAM leading to much better cache performance
- Only exact squares and small numbers have hashset entries, so much less time is spent hashing and adding/removing values
There is an additional small optimization available here: A and B can be a single hashset, which stores bit flag with each entry to say whether the integer is in A or B (it can't be in both because then the algorithm would have terminated).