views:

463

answers:

5

In the XKCD comic 195 a design for a map of the Internet address space is suggested using a Hilbert curve so that items from a similar IP adresses will be clustered together.

Given an IP address, how would I calculate its 2D coordinates (in the range zero to one) on such a map?

+4  A: 

Essentially you would decompose the number, using pairs of bits, MSB to LSB. The pair of bits tells you if the location is in the Upper Left (0) Lower Left (1) Lower Right (2) or Upper Right (3) quadrant, at a scale that gets finer as you shift through the number.

Additionally, you need to track an "orientation". This is the winding that is used at the scale you are at; the initial winding is as above (UL, LL, LR, UR), and depending on which quadrant you end up in, the winding at the next scale down is (rotated -90, 0, 0, +90) from your current winding.

So you could accumulate offsets :

suppose I start at 0,0, and the first pair gives me a 2, I shift offsets to 0.5, 0.5. The winding in the lower right is the same as my initial one. The next pair reduces the scale, so my adjustments are going to be 0.25 in length.

This pair is a 3, so I translate only my x coordinate and I am at .75, .5. The winding is now rotated over and my next scale down will be (LR, LL, UL, UR). The scale is now .125, and so on and so on until I run out of bits in my address.

Mikeb
I don't think this is quite right. It works as long as you only get ones and twos. If you get a 0 or 3 then you need to rotate as well as scaling down; otherwise all future bit-pairs will send you astray.
Jason Orendorff
yeah, I was just working on fixing that. More complicated than it looked at first!
Mikeb
I guess you fixed this bug?
Martin
Yeah, but I think I like jk's explanation better than mine.
Mikeb
yours is simpler, and I understand it more intuitively - gets my marks :)
Martin
A: 

I expect that based on the wikipedia code for a Hilbert curve you could keep track of your current position (as an (x, y) coordinate) and return that position after n cells had been visited. Then the position scaled onto [0..1] would depend on how high and wide the Hilbert curve was going to be at completion.

from turtle import left, right, forward

size = 10

def hilbert(level, angle):
    if level:
        right(angle)
        hilbert(level - 1, -angle)
        forward(size)
        left(angle)
        hilbert(level - 1, angle)
        forward(size)
        hilbert(level - 1, angle)
        left(angle)
        forward(size)
        hilbert(level - 1, -angle)
        right(angle)

Admittedly, this would be a brute force solution rather than a closed form solution.

hughdbrown
+9  A: 

This is pretty easy, since the Hilbert curve is a fractal, that is, it is recursive. It works by bisecting each square horizontally and vertically, dividing it into four pieces. So you take two bits of the IP address at a time, starting from the left, and use those to determine the quadrant, then continue, using the next two bits, with that quadrant instead of the whole square, and so on until you have exhausted all the bits in the address.

The basic shape of the curve in each square is horseshoe-like:

 0 3
 1 2

where the numbers correspond to the top two bits and therefore determine the traversal order. In the xkcd map, this square is the traversal order at the highest level. Possibly rotated and/or reflected, this shape is present at each 2x2 square.

Determination of how the "horseshoe" is oriented in each of the subsquares is determined by one rule: the 0 corner of the 0 square is in the corner of the larger square. Thus, the subsquare corresponding to 0 above must be traversed in the order

 0 1
 3 2

and, looking at the whole previous square and showing four bits, we get the following shape for the next division of the square:

 00 01 32 33
 03 02 31 30
 10 13 20 23
 11 12 21 22

This is how the square always gets divided at the next level. Now, to continue, just focus on the latter two bits, orient this more detailed shape according to how the horseshoe shape of those bits is oriented, and continue with a similar division.

To determine the actual coordinates, each two bits determine one bit of binary precision in the real number coordinates. So, on the first level, the first bit after the binary point (assuming coordinates in the [0,1] range) in the x coordinate is 0 if the first two bits of the address have the value 0 or 1, and 1 otherwise. Similarly, the first bit in the y coordinate is 0 if the first two bits have the value 1 or 2. To determine whether to add a 0 or 1 bit to the coordinates, you need to check the orientation of the horseshoe at that level.

EDIT: I started working out the algorithm and it turns out that it's not that hard after all, so here's some pseudo-C. It's pseudo because I use a b suffix for binary constants and treat integers as arrays of bits, but changing it to proper C shouldn't be too hard.

In the code, pos is a 3-bit integer for the orientation. The first two bits are the x and y coordinates of 0 in the square and the third bit indicates whether 1 has the same x coordinate as 0. The initial value of pos is 011b, meaning that the coordinates of 0 are (0, 1) and 1 has the same x coordinate as 0. ad is the address, treated as an n-element array of 2-bit integers, and starting from the most significant bits.

double x = 0.0, y = 0.0;
double xinc, yinc;
pos = 011b;
for (int i = 0; i < n; i++) {
    switch (ad[i]) {
        case 0: xinc = pos[0]; yinc = pos[1]; pos[2] = ~pos[2]; break;
        case 1: xinc = pos[0] ^ ~pos[2]; yinc = pos[1] ^ pos[2]; break;
        case 2: xinc = ~pos[0]; yinc = ~pos[1]; break;
        case 3: xinc = pos[0] ^ pos[2]; yinc = pos[1] ^ ~pos[2];
            pos = ~pos; break;
    }
    x += xinc / (1 << (i+1)); y += yinc / (1 << (i+1));
}

I tested it with a couple of 8-bit prefixes and it placed them correctly according to the xkcd map, so I'm somewhat confident the code is correct.

jk
@jk: "So you take two bits of the IP address at a time, starting from the left, and use those to determine the quadrant, then continue with that quadrant instead of the whole square." So if I am reading you correctly, 255 would be in the upper right corner of a Hilbert curve that goes 0-3-2-1 (starting upper left, moving clockwise) and in the lower left corner of a Hilbert curve that goes 0-1-2-3 (starting upper left, moving clockwise)? That is not the way the internet map in the xkcd comic goes. Can you clarify your algorithm?
hughdbrown
The 0-3-2-1 curve corresponds to the two highest bits of the address and is the highest level view. The 0-1-2-3 curve is zoomed in to the 0-square of this view, and corresponds to the next two bits. This is how the xkcd map goes: for each 2 bits, you zoom in to the correct square and determine how the curve is oriented at that level.
jk
+2  A: 

I wrote an article about exactly this, with code to do the conversions, here.

Nick Johnson
Fascinating reading, it would get my "answer" tick except that of course your code only goes the one way (the wrong one) ;)
Martin
You can invert the transform fairly trivially - it's an exercise for the reader. ;) Nevertheless, if you want to do something to every IPv4 address, you probably want to iterate in address order anyway.
Nick Johnson
+1  A: 

I have done Nick's exercise. You can find it here: http://hilbert-curve.sourceforge.net.

epitaph