I have n=10000 10-dimensional vectors. For every vector v1 I want to find the vector v2 (v2 ≠ v1) that minimizes the angle between v1 and v2.

Is there a way to solve this problem faster than O(n^2)?

A: 

How about this: calculate the polar angle of each vector (O(n)), sort the array by angle (O(n log n)), and then walk through the sorted array (O(n)); the closest vector to element i is either i+1 or i-1, wrapping around at the ends.

Edit: As pointed out in comments, this works only in 2 dimensions.
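
Here is a minimal sketch of that 2-D version (illustrative only; it assumes NumPy, an (n, 2) array of nonzero vectors, and the function names are my own):

    import numpy as np

    def circular_gap(a, b):
        # Smallest absolute difference between two polar angles on the circle.
        d = abs(a - b) % (2 * np.pi)
        return min(d, 2 * np.pi - d)

    def angular_nearest_2d(vectors):
        # For each 2-D vector, return the index of the vector with the
        # smallest angle to it.  Input: an (n, 2) float array, no zero rows.
        angles = np.arctan2(vectors[:, 1], vectors[:, 0])    # O(n)
        order = np.argsort(angles)                            # O(n log n)
        sorted_angles = angles[order]
        n = len(vectors)
        nearest = np.empty(n, dtype=int)
        for rank, idx in enumerate(order):                    # O(n) walk
            # The best match is adjacent in sorted order; indices wrap
            # around because angles are circular.
            prev_rank, next_rank = (rank - 1) % n, (rank + 1) % n
            d_prev = circular_gap(sorted_angles[rank], sorted_angles[prev_rank])
            d_next = circular_gap(sorted_angles[rank], sorted_angles[next_rank])
            nearest[idx] = order[prev_rank] if d_prev <= d_next else order[next_rank]
        return nearest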

Kugel
Does this work for more than 2 dimensions?
WhirlWind
Not at all; in more than 2 dimensions the equivalent step is to replace each vector by its unit vector, which lowers the dimensionality by one but still leaves it too high for a single sorted order to work.
Tom Womack
+2  A: 

What you've got is essentially the closest-points-on-a-sphere problem (since the angle between vectors is roughly the distance between the points on the sphere that their unit-vectors lie on), so you could probably do some sort of binary (maybe ternary would be easier to avoid too many border issues) space partitioning decomposition.

But this would be unpleasant to code, and probably not much faster than the naive method for as few as 10,000 points (in particular, you start off by dividing the points among 3^10 = 59049 boxes, though most of those would be empty). A hundred million ten-element dot products ought to be well under one second.
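
For a sense of scale, the naive O(n^2) method is a single matrix product once the vectors are normalized (smallest angle = largest cosine). A rough NumPy sketch, with random data standing in for the real vectors:

    import numpy as np

    rng = np.random.default_rng(0)
    v = rng.standard_normal((10_000, 10))             # n = 10000, d = 10

    # Normalize rows so comparing angles reduces to comparing dot products.
    u = v / np.linalg.norm(v, axis=1, keepdims=True)

    cos = u @ u.T                                      # all 10^8 pairwise cosines
    np.fill_diagonal(cos, -np.inf)                     # exclude each vector itself
    nearest = cos.argmax(axis=1)                       # smallest-angle partner per row

The full cosine matrix is 10000 x 10000 doubles (roughly 800 MB), so process the rows in blocks if memory is tight; the arithmetic itself is exactly what BLAS is fast at.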

Tom Womack
+2  A: 

You can normalize all the vectors in O(n) time and parameterize them on the resulting 9-dimensional hypersphere. You can then use a spatial search structure in that (d-1)-dimensional space, such as a k-d tree, to speed up the nearest-neighbor queries. There are well-known methods and libraries for this (e.g. ANN, approximate nearest neighbor search).
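
A sketch of that idea using SciPy's exact k-d tree (an ANN library could be swapped in for larger n; the function name is mine). Rather than parameterizing the sphere, this builds the tree on the unit vectors themselves, which yields the same nearest neighbors because for normalized vectors the Euclidean nearest neighbor is also the angular nearest neighbor:

    import numpy as np
    from scipy.spatial import cKDTree

    def angular_nearest(vectors):
        # Index of the smallest-angle neighbor for each row of `vectors`.
        unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
        tree = cKDTree(unit)
        # k=2 because the nearest point to each query is the point itself.
        _, idx = tree.query(unit, k=2)
        return idx[:, 1]

Keep in mind that k-d trees lose much of their advantage as the dimension grows, so at d = 10 the speedup over brute force may be modest.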

Victor Liu
+1  A: 

Wow. Ten-dimensional vectors? What do you do with those?

I guess I would start by reducing each vector to unit length -- i.e., to points on a hypersphere -- then use some "nearest neighbor search" (NNS) algorithm such as a k-d tree (k-dimensional binary tree), R-tree, Best Bin First, etc.
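
The reason reducing to unit length is enough: for unit vectors u and v, ||u - v||^2 = 2 - 2*cos(theta), a monotone function of the angle between them, so a Euclidean nearest-neighbor search on the hypersphere answers the angular question exactly. A quick illustrative check with NumPy:

    import numpy as np

    rng = np.random.default_rng(1)
    u, v = rng.standard_normal(10), rng.standard_normal(10)
    u, v = u / np.linalg.norm(u), v / np.linalg.norm(v)

    # ||u - v||^2 = |u|^2 + |v|^2 - 2*u.v = 2 - 2*cos(theta) for unit vectors.
    assert np.isclose(np.sum((u - v) ** 2), 2 - 2 * (u @ v))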

It's probably impossible to solve this problem faster than O(n log n), because if you could, you could also solve the simpler "closest pair of points" problem faster than its known Ω(n log n) lower bound.

As Tom Womack pointed out, the brute-force O(n^2) algorithm will take less actual wall-clock time than these more sophisticated algorithms for "small" amounts of data, and n=10000 looks relatively small for 10 dimensions.

David Cary