I have a dataset of approximately 100,000 (X, Y) pairs representing points in 2D space. For each point, I want to find its k-nearest neighbors.
So, my question is - what data-structure / algorithm would be a suitable choice, assuming I want to absolutely minimise the overall running time?
I'm not looking for code - just a pointer towards a suitable approach. I'm a bit daunted by the range of choices that seem relevent - quad-trees, R-trees, kd-trees, etc.
I'm thinking the best approach is to build a data structure, then run some kind of k-Nearest Neighbor search for each point. However, since (a) I know the points in advance, and (b) I know I must run the search for every point exactly once, perhaps there is a better approach?
Some extra details:
- Since I want to minimise the entire running time, I don't care if the majority of time is spent on structure vs search.
- The (X, Y) pairs are fairly well spread out, so we can assume an almost uniform distribution.