views:

770

answers:

5

I have a huge set of N-dimensional points (tens of millions; N is close to 100).

I need to map these points to a single dimension while preserving spatial locality. I want to use Hilbert space-filling curve to do it.

For each point I want to pick the closest point on the curve. The Hilbert value of the point (curve length from the start of curve to the picked point) is the single dimension value I seek.

Computation does not have to be instant, but I expect it to be no more than several hours on decent modern home PC hardware.

Any suggestions on implementation? Are there any libraries that would help me? (Language does not matter much.)

+2  A: 

I don't see how you can use a Hilbert curve in one dimension.

If you are interested in mapping points to a lower dimension while preserving distances (with minimum error) then you can look into "Multidimensional Scaling" algorithms.

Simulated annealing is one approach.

Edit: Thanks for the comment. I see what you meant by the Hilbert Curve approach now. However, this is a hard problem, and given N=100 and 10 million data points I don't think any approach will preserve locality well and run in a reasonable amount of time. I don't think kd-trees will work here.

If finding a total ordering is not important to you, then you can look into locality-based hashing and other approximate nearest neighbor schemes. Hierarchical multidimensional scaling with buckets of points to reduce the input size might give you a good ordering, but again it's doubtful in such a high dimension.

Imran
I intend to use Hilbert curve in N-dimensional space, not in one dimension.
Alexander Gladysh
+1  A: 

Another possibility would be to build a kd-tree on your data, and then to an in-order traversal of the tree to get the ordering. Constructing the kd-tree only requires you to have a good median-finding algorithm, of which there are many.

Adam Rosenfield
+3  A: 

It's not clear to me how this will do what you want. Consider this trival 3D case:

001 ------ 101
 |\         |\
 | \        | \
 |  011 ------ 111
 |   |      |   |
 |   |      |   |
000 -|---- 100  |
  \  |       \  |
   \ |        \ |
    010 ------ 110

which can be "Hilbertized" by the following path:

001 -----> 101
  \          \
   \          \
    011        111
     ^          |
     |          |
000  |     100  |
  \  |       \  |
   \ |        \ V
    010        110

into the 1D order:

000 -> 010 -> 011 -> 001 -> 101 -> 111 -> 110 -> 100

Here's the nasty bit. Consider the list of pairs and 1D distances below:

000 : 100 -> 7
010 : 110 -> 5
011 : 111 -> 3
001 : 101 -> 1

In all cases, the left- and right-hand values are the same 3D distance from each other (+/- 1 in the first position), which seem to imply similar "spatial locality". But linearizing by any choice of dimensional ordering (y, then z, then z, in the above example) breaks that locality.

Another way of saying this is that taking a starting point and ordering the remaining points by their distance from that starting point will provide significantly different results. Taking 000 as the start, for example:

1D ordering : distance    3D ordering : distance
----------------------    ----------------------
        010 : 1           001,010,100 : 1
                          011,101,110 : sqrt(2)
                              111     : sqrt(3)
        011 : 2
        001 : 3
        101 : 4
        111 : 5
        110 : 6
        100 : 7

This effect grows exponentially with the number of dimensions (assuming that each dimension has the same "size").

joel.neely
Well, if I get it right, locality is still preserved -- all points within the cube would belong to edges of that cube and nowhere else. The less cube volume is, the better locality is preserved. I think results should be close enough for me.
Alexander Gladysh
See also article "Analysis of the Clustering Properties of the Hilbert Space-Filling Curve": http://www.cs.cmu.edu/afs/cs.cmu.edu/user/christos/www/PUBLICATIONS/ieee-tkde-hilbert.ps.gz
Alexander Gladysh
Yes but as your "cubes" get smaller, the length of your Hilbert curve increases exponentially. I suspect that it will be very long before you get a good ordering on your points, because of complexity of the problem you are trying to solve.
Imran
@Alexander: The size of the cube isn't the issue. The orthogonal neighbors of a point still end up having radically different 1D distances. And my example was truly trivial. It's not the general case that all points are on the edge of the cube.
joel.neely
+2  A: 

Algorithm for mapping from n->1 and 1->n given here http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.37.4165

If you Google for "SFC module and Kademlia overlay" youl find a group who claim to use it as part of their system. If you view the source you can probably extract the relevant function.

If you get it working would you be so good as to post a link to the code here.

+1  A: 

I have written a 2D transformation of the hilbert curve. I will add a 3D transformation soon:

http://hilbert-curve.sourceforge.net

epitaph