Given 2N points in a 2D plane, you have to group them into N pairs such that the overall sum of the distances between the two points of each pair is the minimum possible value. The desired output is only that sum.

In other words, if a1, a2, ..., aN are the distances between the points of the first, second, ..., and Nth pair respectively, then (a1 + a2 + ... + aN) should be minimal.

Let us consider this test case: the 2*5 points are {20, 20}, {40, 20}, {10, 10}, {2, 2}, {240, 6}, {12, 12}, {100, 120}, {6, 48}, {12, 18}, {0, 0}.

The desired output is 237.

This is not my homework; I am curious about approaches other than brute force.
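For reference, here is the kind of exact baseline I want to beat: a dynamic program over bitmasks of unpaired points (my own sketch, still exponential, but it avoids enumerating the same pairing in different orders and pins down the expected output):

    import math
    from functools import lru_cache

    def min_matching_cost(points):
        # Exact minimum-cost pairing via DP over bitmasks of unpaired points.
        # Roughly O(2^(2N) * N); feasible up to about 2N = 20 points.
        n = len(points)
        dist = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points]
                for p in points]

        @lru_cache(maxsize=None)
        def best(mask):
            if mask == 0:
                return 0.0
            i = (mask & -mask).bit_length() - 1    # lowest-index unpaired point
            rest = mask & ~(1 << i)
            return min(dist[i][j] + best(rest & ~(1 << j))
                       for j in range(n) if rest >> j & 1)

        return best((1 << n) - 1)

    pts = [(20, 20), (40, 20), (10, 10), (2, 2), (240, 6),
           (12, 12), (100, 120), (6, 48), (12, 18), (0, 0)]
    print(round(min_matching_cost(pts)))           # prints 237 for this sample

Always fixing the lowest-index unpaired point as one endpoint of the next pair is what keeps the recursion from revisiting permutations of the same pairing.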

+5  A: 
Moron
Maybe not quite ... the task here is geometric, not graph-based. You have more freedom without the restriction that everything must be packed into a graph.
Hamish Grubijan
Could you please explain with some test cases?
Pie-Guy
Me? Well, 2N points in a 2D plane are the same as a complete graph on 2N vertices, and that is going to eat up a lot of memory!
Hamish Grubijan
@Hamish, why not?
Moron
@Hamish Grubijan: No, my question was for Moron. Thanks :)
Pie-Guy
@Moron: that article is hard stuff. The OP is looking for a solution for the special case where G is a complete planar graph. Do you know of any simpler algorithms for this special case?
Doc Brown
Perhaps the so called duality approach can help identify closest pairs sooner. Note that this is computational geometry stuff, not pure algorithms.
Hamish Grubijan
@Moron why not? Not sure which part you refer to, but basically an entire field of computational geometry emerged precisely because there are some things you can do better when you do not restrict yourself to a graph data structure. There is still a very intimate relationship, but graph-only approaches (where every point is a vertex and every line segment is an edge) sometimes aren't as good.
Hamish Grubijan
@Doc: Not off the top of my head. I would presume the triangle-inequality restriction is more important than planarity (and the graph is not planar, btw). I will update this later when I find something.
Moron
@Doc: I have edited my post to add a link to a (hopefully) faster algorithm. That paper does not seem to have enough details, though.
Moron
@moron Very interesting, but could you please outline the presented algorithm? -- that link seems more of an abstract than an answer to the question. Otherwise I'd no longer wonder why jet physics people use greedy algorithms (maybe with kernels) if CS can only be done in such opaque lingo.
honk
@honk: Perfect matching in graphs is a well-known problem; you should find plenty of literature on it. As to the (second) link I gave, I would have to read it myself and try to summarize it. It might be hard to do that, though, given the prerequisites needed to even understand that paper. I might make an attempt, but please don't count on it happening soon.
Moron
@Moron: I am not very familiar with minimum-weight perfect matching, and what I found from googling doesn't give me the basic knowledge to implement it for my problem. Could you help me in this regard?
Pie-Guy
@Moron: when I wrote 'simpler', I meant 'simpler to understand', not 'faster' :-)
Doc Brown
thanks for caring about accessibility ;)
honk
@Moron: wow, that summary is excellent and IMHO much better to understand than the publication.
Doc Brown
A: 

Your final sum will mostly be dominated by the largest addend. The simplest algorithm to exploit this observation could go as follows (I cannot prove it works):

  1. sort points descending by their nearest-neighbor distance
  2. form pair of first entry and its nearest neighbor
  3. remove pair from list
  4. if list not empty goto 1.

This should work very often. A rough sketch in code follows.
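Hypothetically, in Python (untested; the re-sorting of step 1 happens implicitly because the current maximum is recomputed each round):

    import math

    def greedy_pairing(points):
        # Repeatedly take the point whose nearest neighbour is farthest away
        # and pair it with that neighbour; remove both and repeat.
        def dist(a, b):
            return math.hypot(a[0] - b[0], a[1] - b[1])

        remaining = list(points)
        total = 0.0
        while remaining:
            # nearest neighbour (index and distance) of remaining[i]
            def nearest(i):
                j = min((k for k in range(len(remaining)) if k != i),
                        key=lambda k: dist(remaining[i], remaining[k]))
                return j, dist(remaining[i], remaining[j])

            i = max(range(len(remaining)), key=lambda m: nearest(m)[1])  # steps 1+2
            j, d = nearest(i)
            total += d
            for k in sorted((i, j), reverse=True):    # step 3: remove the pair
                remaining.pop(k)
        return total

On the sample from the question this happens to find the same ~237 pairing, but as said, I cannot prove it works in general.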

Since you are essentially looking for a clustering algorithm with clusters of size 2, this link or a search for clustering algorithms for jet reconstruction might be interesting. People in the experimental particle-physics community work on heuristic algorithms for problems like this.

honk
-1. As I wrote to Hamish, a greedy algorithm like yours underestimates the difficulty of the problem. And having had a short look at your links, they seem to point in a completely wrong direction.
Doc Brown
@Doc I didn't say this would always work, and btw Hamish's algorithm started with the closest pair. Maybe you have something in mind that always works (I would be interested).
honk
@honk: I think (besides the obvious 'brute force' solution) the links given by Moron contain valid algorithms. They are just hard to read for non-mathematicians and probably not easy to implement.
Doc Brown
@Doc Agree. You or Moron should really write up a schematic implementation of the algorithm for this 'simple' case, though, so we can all understand it. Looking at the Google results, you could easily become the number 1 hit for this topic ;)
honk
How do you "sort points by their nearest neighbour distance"? The pairwise nearest neighbour distance doesn't provide a total order for points.
Nick Johnson
@nick Agreed, for gridded points this might get it totally wrong. If the points are not on a grid, though, the difference between a partial and a total order shouldn't matter too much, *I'd guess*. I assume that there is a "loner point" dominating the sum (which need not hold at all). If the points are too evenly distributed, the sum I outlined might easily be dominated not by the first addend but by the *last*. But for a not-too-small number of not-too-evenly distributed points this should still give a not-too-bad guess. Should I remove this answer since it depends on too many *not too's*?
honk
No, my point is that you simply can't sort a list by a property that pairs of elements have unless that property provides a total order - and the Euclidean distance between elements isn't one. Perhaps you meant something else by "sort points descending by their nearest-neighbor distance"?
Nick Johnson
@nick sure you can: for every point find its NN, measure the distance; that's the number *this* point gets; then sort.
honk
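In code, that key is one number per point, so sorting is perfectly well-defined (a tiny sketch, pts being any list of (x, y) tuples):

    import math

    # one number per point: the distance to its nearest neighbour
    nn_key = [min(math.hypot(pts[i][0] - pts[j][0], pts[i][1] - pts[j][1])
                  for j in range(len(pts)) if j != i)
              for i in range(len(pts))]
    order = sorted(range(len(pts)), key=nn_key.__getitem__, reverse=True)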
A: 

After googling for a while, I found some other references to minimum-weight perfect-matching algorithms which might be easier to understand (at least, easier to a certain degree).

EDIT

Here I found a Python implementation of one of those algorithms. It has 837 lines of code (plus some additional unit tests), and I have not tried it out myself. But perhaps you can use it for your case.

Here is a link to an approximation algorithm for the problem. Of course, the style of the paper is mathematical, too, but IMHO it is much easier to understand than the paper of Cook and Rohe. And it states in its preface that it aims exactly at being easier to implement than Edmonds' original algorithm, since you don't need a linear-programming solver.

EDIT2:

After thinking about the problem for a while, IMHO it must be possible to set up an A* search to solve it. The search space here is the set of 'partial matchings' (partially paired point sets). As Moron already wrote in his comments, one can restrict the search to states in which no pairs with crossing connecting lines exist. The path-cost function g(x) (to use the terms from Wikipedia) is the sum of the distances of the already-paired points. The heuristic function h(x) can be any under-estimate of the remaining distances: for example, when 2M points are not yet paired, take the sum of the M smallest distances between those remaining points. A sketch follows below.
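Here is a rough sketch of how that could look (my naming; the 'no crossing pairs' pruning is left out for brevity):

    import heapq
    import math
    from itertools import combinations, count

    def astar_matching(points):
        # State: frozenset of indices of the still-unpaired points.
        # g = distance sum of the pairs formed so far.
        # h = sum of the M smallest pairwise distances among the remaining
        #     2M points; this never over-estimates any completion (admissible).
        d = [[math.hypot(p[0] - q[0], p[1] - q[1]) for q in points]
             for p in points]

        def h(state):
            if len(state) < 2:
                return 0.0
            ds = sorted(d[i][j] for i, j in combinations(state, 2))
            return sum(ds[:len(state) // 2])

        start = frozenset(range(len(points)))
        tie = count()              # tie-breaker, so the heap never compares sets
        heap = [(h(start), 0.0, next(tie), start)]
        best_g = {start: 0.0}
        while heap:
            f, g, _, state = heapq.heappop(heap)
            if not state:
                return g           # all points paired
            if g > best_g.get(state, float('inf')):
                continue           # stale heap entry
            i = min(state)         # always pair the lowest index next,
            for j in state - {i}:  # so symmetric states are never revisited
                nxt = state - {i, j}
                ng = g + d[i][j]
                if ng < best_g.get(nxt, float('inf')):
                    best_g[nxt] = ng
                    heapq.heappush(heap, (ng + h(nxt), ng, next(tie), nxt))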

That will probably not be as efficient as the algorithm Moron pointed to, but I suspect it will be much better than 'brute force', and much easier to implement.

Doc Brown
The first link you gave has approximation algorithms.
Moron
Oops, you are right, I overlooked that at first glance. I will edit my answer appropriately.
Doc Brown