views:

775

answers:

6

I'm writing a program that implements SCVT (Spherical Centroidal Voronoi Tesselation). I start with a set of points distributed over the unit sphere (I have an option for random points or an equal-area spiral). There will be from a several hundred to maybe 64K points.

I then need to produce probably several million random sample points, for each sample find the nearest point in the set, and use that to calculate a "weight" for that point. (This weigh may have to be looked up from another spherical set, but that set will stay static for any given run of the algorithm.)

Then I move the original points to the calculated points, and iterate the process, probably 10 or 20 times. This will give me the centers of the Voronoi tiles for subsequent use.

Later I will need to find a given point's nearest neighbor, to see what tile the user clicked on. This is trivially solved within the above problem, and doesn't need to be super-fast anyway. The part I need to be efficient is all those millions of nearest neighbors on the unit sphere. Any pointers?

Oh, I'm using x, y, z coordinates, but that's not set in stone. It just looks like it will simplify things. I'm also using C as I'm most familiar with it, but not wedded to that choice either. :)

I've considered using the spiral pattern for the sample points, as that gives me at least the last point's found neighbor as a good starting point for the next search. But if I do that, it looks like it would make any sort of tree search useless.

edit: [I'm sorry, I thought I was clear with the title and tags. I can generate random points easily. The issue is the nearest neighbor search. What's an efficient algorithm when all the points are on the unit sphere?]

A: 

Here is the article on neighbor search: http://en.wikipedia.org/wiki/Nearest_neighbor_search In my understanding you can use trivial algorithm of going through all Voronoi centers and calculate 3d distance between your point and center point.

distance_2 = (x - x_0)^2 + (y - y_0)^2 + (z - z_0)^2

where (x_0, y_0, z_0) is the point of interest (click) for you and {(x, y, z)} are Voronoi centers. The smallest distance will give you the nearest center.

Artem
This is basically the worst-case algorithm, O(n), requiring you to scan through all the points for each test point. For the user interface code, this would probably be acceptable. For the many millions of sampling points, an O(logn) method (or less) would be much preferred.
Jerry B
@Jerry B, still it could be worse. O(n) isn't bad.
Simucal
Well, that's O(n) for each search, making the whole algorithm O(n*m). That's approaching a trillion comparisons. Since scanning the whole list is the "naive" method, I don't see how it could be done worse without doing so on purpose.
Jerry B
Then I probably got you wrong. I thought that you had 64K points, after that selected some small amount of Voronoi centers (I assumed less than 100) and then for only one point (mouse click) you need to find a nearest group, i.e. nearest Voronoi center.
Artem
if your points are in a relational database, just add a where clause to limit the points selected to only those roughly close enough to be considered for the O(n) calculation, as proposed. I was looking for similar colours in a L*a*b* sphere like this and it was really fast.
Peter Perháč
Jerry B
MasterPeter: So far, the points are just in an array in memory. For *illions of samples against up to 64K points, that's still going to take a lot of time. This is to generate a map for a game, and I don't think people will want to wait minutes, let alone hours, to start playing. :)
Jerry B
A: 

Using a KD Trie is a good way to speed up the search. You can also get significantly better performance if you can tolerate some error. The ANN library will give you the result within an ε of your choosing.

Don Reba
No, ANN wouldn't work. To calculate the centroid with any accuracy requires exact NN. Does a KD Trie handle the points being on the unit sphere well? The Wiki page on KD Tries mentions ANN, but not NN.I may have to do an ANN, followed by a search of the nearby points for the exact NN.
Jerry B
The ε can be zero, if necessary. Although, then there is a wider choice of implementations available, and the ANN library is not necessarily the best one. It has the huge drawback of not being thread-safe.
Don Reba
A: 

OK. NEARPT3 http://www.ecse.rpi.edu/Homepages/wrf/Research/nearpt3/nearpt3.pdf algorithm could be helpful in your case. And it all depends on how many space you can afford to use for your N points. If it is O(N*logN) then there are algorithms like kD-tree based (http://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b-learn06-lec.pdf) which would work for O(logN) to find nearest point. In case of 64K point Nlog_2 N = about 10^6 which is easily can fit into memory of modern computer.

Artem
Some good reading there. One thing that struck me in the lecture notes was the comment that time tends to go exponentially by dimension. Thus, the kD-tree in 3 dimensions seems wasteful for what is essentially 2 dimensional data. Otherwise, it sounds nice.
Jerry B
Right. However, you want to be careful, because the nearest neighbour in spherical coordinates is not necesserily the nearest neighbour in Cartesian.
Don Reba
A: 

You may find that organising your points into a data structure called an Octree is useful for efficient search for nearby points. See http://en.wikipedia.org/wiki/Octree

David Plumpton
+1  A: 

Your points are uniformly distributed over the sphere. Therefore, it would make a lot of sense to convert them to spherical coordinates and discretize. Searching the 2D grid first would narrow down the choice of nearest neighbour to a small part of the sphere in constant time.

Don Reba
I've been struck by an idea to reduce it almost another dimension. If I arrange the points along a curve spiraling at constant distance from the previous winding, the search can be nearly linear. Probably only have to check 2 short ranges on the line...
Jerry B
A: 

I have devised a curve (I'm sure I'm not the 1st) that spirals along the sphere from pole to pole. It remains a constant distance from neighboring windings (if I did it right). For z [-1 at south pole to +1 at north pole]:

n= a constant defining a given spiral k=sqrt(n*pi)

r=sqrt(z^2) theta=k*asin(z) x=r*cos(theta) y=r*sin(theta)

It makes k/2 revolutions around the sphere, with each winding sqrt(4pi/n) from adjacent windings, while the slope dz/d(x,y) is 1/k.

Anyway, set k such that the inter-winding distance covers the largest tile on the sphere. For every point in the main set, calculate the theta of the nearest point on the curve, and index the list of points by those numbers. For a given test point, calculate it's (theta of the nearest point on the curve), and find that in the index. Search outward (in both directions) from there, to theta values that are as far away as your current nearest neighbor. After reaching that limit, if the distance to that neighbor is less than the distance from the test point to the next adjacent winding, you've found the nearest neighbor. If not, jump the theta value by 2pi and search that winding the same way.

Critique?

Jerry B