As part of my bachelors degree we (a group of four) spend half a semester studying a program called Fhourstones,
which is an integer benchmark that solves positions in the game of connect-4, as played on a vertical 7x6 board.
By default, it uses a 64Mb
transposition table with the twobig
replacement strategy. Positions are
represented as 64-bit bitboards, and
the hash function is computed using a
single 64-bit modulo operation, giving
64-bit machines a slight edge. The
alpha-beta searcher sorts moves
dynamically based on the history
heuristic. A move causing a cutoff is
rewarded as many points as moves
previously tried, each of which gets a
-1 penalty, thus preserving total weight and avoiding renormalization
(uniform penalties were found to work
much better than depth dependent
ones).
Although the initial assignment was only to study the techniques used and present this to our classmates, we set out to see if we could extend the program to solve 8x8 boards as well (just over the current limit). This proved troublesome as the code was highly optimized, storing a single board position inside a (Java) long.
// bitmask corresponds to board as follows in 7x6 case:
// . . . . . . . TOP
// 5 12 19 26 33 40 47
// 4 11 18 25 32 39 46
// 3 10 17 24 31 38 45
// 2 9 16 23 30 37 44
// 1 8 15 22 29 36 43
// 0 7 14 21 28 35 42 BOTTOM
For red stones on the board the numbered bit would be set to 1, for black stones to 0. By then taking the 'skyline' of the board and setting each bit above it to 1 (like a layer of snow over rooftops) the entire board position for boards up to 8x7 can be stored using 64 bits. The entire program relied on this encoding to efficiently test for a win using 8 shift/ands and 4 comparisons.
Obviously we needed something bigger if we wanted to store larger boards, but because Java does not provide a primitive 128 bit data type, we had to look at other ways to go about storing the board positions. However, any solution we tried proved to be anywhere from 25 to 100 times slower than the original, the 'best' one being BigInteger.
As the projected runtime for the original code for the larger board was already well over a day, this increase in time required was unacceptable. Determined not to let the project fail, and somehow beat the odds at finding a faster solution, I set out to create my own number monstrosity, called
IntLong
As the name implies is was an int concatenated with a long, designed to provide us with just enough bits to be able to store the 8 x 8 + 8 = 72 bits needed. Next came the problem of making these two numbers work together to represent one larger number, and implementing all the bit operations used throughout the code base. We used the int to store the higher bits, and the long to store the lower bit, with a overflow from long to int on the 60th bit. This made it trivially easy to implement the and, or and xor operations. The shift left and shift right operations were easy as long as you were careful to move the right bits over using ands and shifts. We worked around not having to implement addition, subtractions, mulitplication and division by replacing them <<, >>, & and |, which worked surprisingly well as these were only few and far between.
The real problem came when I was trying to figure out how to implement the modulo operation in a fast and accurate way, as this was used for hashing the board positions, which would happen a few million times in our application.
After two weeks of trying every possible algorithm and finding them all too slow, I was damn near ready to give up. Then it hit me that we were only ever calculating the modulo by a known large primitive number, as part of our transposition table hashing mechanism. So with that in mind I implemented the following function:
public int modulo(int divider, int shifts) {
IntLong temp = new IntLong(this.high, 0).shiftRight(shifts);
if (0 < temp.high)
throw new IllegalArgumentException("IntLong.modulo(int divider): argument too small, high still " + Integer.toString(temp.high, 2));
temp = new IntLong(0, temp.low % divider).shiftLeft(shifts);
temp = temp.or(new IntLong(0, this.low % divider));
return (int) (temp.low % divider);
}
It takes the divider and the number of bits in the divider as arguments, and performs the following:
- take the higher bits in the int, and shift them shifts times to the right, ignoring the lower bits in the long
- divide that by the divider and shift the remainder back to the left shifts times
- take the lower bits from the long, divide them by the divider and keep the remainder
- overlay the bits from two and three (which by now should not overlap) into one long
- return the remainder from one final division by divider and you should have the correct result
It took me a very long time before I could finally convince myself that the end result was indeed correct, even after all my Unit tests passed flying colors. In the end it turned out to be 2.5 times faster than the BigInteger implementation (still 10 times slower than the original), but it was just enough to run our calculations in a realistic time frame.
It's the ugliest hack I've ever had to implement, but I'm also still proud about having somehow beaten the odds and saved the project. :)