ansaurus

Question

Answer 1

+3 A:

OK, if you consider your elements as vertices and your pairs as edges of a graph, your problem becomes a case of the well-known (and NP-complete) vertex cover problem. You can easily find an approximate solution, guaranteed to be within a factor of two of optimal, by choosing an arbitrary edge, selecting both of its vertices, removing every edge that this eliminates, and repeating until no edges remain. You can do incrementally better with more complicated approximation algorithms, but finding the optimal solution for a large graph is probably not feasible.

Here is a simple function to do this. (Note: R is not my native language, so this is probably hideously non-idiomatic; any suggestions for improvement would be appreciated.)

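good <- function(dat, result = NULL) {
  # Base case: no pairs left to cover, so return the accumulated vertices
  if (nrow(dat) == 0) {
    return(result)
  }
  # Pick a random remaining pair (edge) and take both of its endpoints
  sampr <- dat[sample.int(nrow(dat), 1), ]
  # Drop every pair that touches either endpoint, then recurse on the rest
  remaining <- subset(dat, row != sampr$row & row != sampr$col &
                           col != sampr$row & col != sampr$col)
  good(remaining, result = c(result, sampr$row, sampr$col))
}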

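I'd run this a number of times and keep the best result. It is also worth keeping track of the size of the worst (largest) result: the edges picked in a run share no vertices, and any cover must contain at least one endpoint of each, so half the size of any run's unpruned result is a lower bound on the optimal size. It might also be useful to postprocess the result to remove excess vertices.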
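The repeat-and-prune procedure described above could be sketched roughly as follows. This is only an illustration, not the code that produced the numbers in this answer: the function names (`cover_once`, `is_cover`, `prune_cover`, `best_cover`) and the `n_iter` parameter are hypothetical, and it assumes `dat` is a data frame with numeric or character columns `row` and `col`, as in the function above.

```r
# One randomized pass (iterative version of the idea above): pick a random
# remaining pair, take both endpoints, drop every pair touching them.
cover_once <- function(dat) {
  result <- c()
  while (nrow(dat) > 0) {
    pick <- dat[sample.int(nrow(dat), 1), ]
    ends <- c(pick$row, pick$col)
    result <- c(result, ends)
    dat <- dat[!(dat$row %in% ends | dat$col %in% ends), , drop = FALSE]
  }
  unique(result)
}

# TRUE if every pair has at least one endpoint in v
is_cover <- function(dat, v) all(dat$row %in% v | dat$col %in% v)

# Postprocessing: drop each vertex whose removal still leaves a valid cover
prune_cover <- function(dat, v) {
  for (x in v) {
    rest <- setdiff(v, x)
    if (is_cover(dat, rest)) v <- rest
  }
  v
}

# Run many passes; keep the smallest pruned cover and the largest raw size
best_cover <- function(dat, n_iter = 100) {
  best <- NULL
  largest <- 0
  for (i in seq_len(n_iter)) {
    raw <- cover_once(dat)
    largest <- max(largest, length(raw))
    v <- prune_cover(dat, raw)
    if (is.null(best) || length(v) < length(best)) best <- v
  }
  # Half of the largest unpruned result bounds the optimum from below
  list(cover = sort(best), lower_bound = ceiling(largest / 2))
}

# Tiny made-up example: a triangle 1-2-3 plus the pair 3-4
pairs <- data.frame(row = c(1, 1, 2, 3), col = c(2, 3, 3, 4))
res <- best_cover(pairs, n_iter = 50)
```

Here `res$cover` holds the smallest cover found and `res$lower_bound` the bound implied by the largest unpruned run. The pruning step only guarantees that no single vertex can be dropped, not that the cover is the smallest possible.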

Running 10000 iterations (and removing redundant vertices) gives the following 19 element solution to your sample problem.

7 37 45 48 91 121 128 132 175 205 212 216 259 279 289 300 343 373 384

We also know that the optimal solution must have at least 15 vertices.

deinst 2010-07-25 19:25:34

Thanks for the info about vertex cover. I would not have thought of it this way. Knowing that I probably won't be able to reliably find a perfectly optimal answer in large datasets is very useful. Your solution is very close to what I was going to implement. However, I was thinking about doing a "guided" search instead of a purely random one. For instance, the loop could start with numbers that appear most often in the data frame. That would probably save me some computation time and would likely get me an OK answer. Thanks a lot for your input deinst. I really appreciate it.

Vincent 2010-07-26 12:41:59

No problem. There are more complicated approximation methods using linear programming ideas.

deinst 2010-07-26 12:51:58

Cool! Taking the simpler approach of eliminating the elements that appear most frequently in this data frame, I get a 20-element solution: 7 37 44 45 91 121 128 129 175 205 212 213 259 289 296 297 343 373 380 381. Pretty close for such a simple rule. Although I do realize the difference might be larger in bigger samples (but the computation time for so many iterations would also be much larger...).

Vincent 2010-07-27 03:28:37
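For reference, the "most frequent element first" rule from the comments might look something like the sketch below. It is a guess at the approach, not the code that produced the 20-element result: `greedy_cover` is a hypothetical name, `dat` is assumed to have numeric or character columns `row` and `col`, and ties are broken by whichever element appears first. Unlike the random-edge method, this greedy rule does not carry the factor-of-two guarantee.

```r
# Greedy heuristic: repeatedly take the element that appears in the most
# remaining pairs, then drop every pair that contains it.
greedy_cover <- function(dat) {
  result <- c()
  while (nrow(dat) > 0) {
    ends <- c(dat$row, dat$col)
    vals <- unique(ends)
    # Count appearances of each distinct element among the remaining pairs
    counts <- tabulate(match(ends, vals), nbins = length(vals))
    top <- vals[which.max(counts)]  # first element with the highest count
    result <- c(result, top)
    dat <- dat[!(dat$row == top | dat$col == top), , drop = FALSE]
  }
  result
}

# Tiny made-up example: a triangle 1-2-3 plus the pair 3-4
pairs <- data.frame(row = c(1, 1, 2, 3), col = c(2, 3, 3, 4))
greedy <- greedy_cover(pairs)
```

Because it makes a single deterministic pass, it is much cheaper than thousands of random runs, which matches the trade-off noted in the comment above.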


Simple R puzzle (elimination of pairs)
