Hi everyone,
I have toyed with a number of ideas to do this, but so far have only come up with some rather inelegant solutions. I'm sure I could make it work, but the code would neither be pretty nor efficient. Here's the problem:
I have a series of integer pairs that are presented as rows in a two-column data frame. The goal is three-fold:
You need to "eliminate" all the rows in this data frame. To "eliminate" a row, you must select either one of the units from that pair and send/save it to a vector of "selected" elements.
You must find the smallest possible combination of "selected elements" that will eliminate all the pairs in the data frame.
The code must be computationally efficient because it will be applied to rather large datasets.
For instance, one would choose items "1" and "2" from the following list of pairs:
1 3
1 4
2 5
3 2
The data below can be used as a working example.
Thanks!
Vincent
Update for some context:
Hi Cipi and SiggyF.
I understand your concerns about this being homework, so in case you read this again, here's some context that I hope may dispell your doubts.
I am working with time-series cross sectional data in which N is much larger than T. I would like to use panel-corrected standard errors like those proposed in Beck & Katz (1995). The packages "pcse" is mostly able to do this just fine. When you have an unbalanced panel, it essentially creates a "rectangular" dataset (every time units has the full amount of observations) by filling in missing values for the omitted observations in every panel. Then, pcse computes a matrix Sigma.hat which is essentially the weighted average of the outer product of the residuals within time periods (think of it as averaging over an N X N X T array to bring it down to a N X N Sigma.hat).
The problem is that if any two units have zero contemporaneous observation, then the corresponding cell in Sigma.hat will be NA, and pcse won't be able to use it to get the sandwich estimator of the variance covariance matrix. In my example, the data frame numbers correspond to the index of the missing values in Sigma.hat. I want to trim down Sigma.hat automatically, to get an estimate of the VCOV that uses the most information possible, hence my desire to keep as many of the numbers in the data frame.
This is probably very unclear to anyone who hasn't looked into pcse, but I hope you get the gist of it.
Sorry to have given an impression of impropriety, but I assure you, this is legit.
test<-structure(list(row = c(27L, 44L, 45L, 111L, 128L, 129L, 195L, 212L, 213L, 279L, 296L, 297L, 363L, 380L, 381L, 7L, 91L, 175L, 259L, 343L, 44L, 45L, 70L, 128L, 129L, 154L, 212L, 213L, 238L, 296L, 297L, 322L, 380L, 381L, 406L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 44L, 45L, 128L, 129L, 212L, 213L, 296L, 297L, 380L, 381L, 37L, 121L, 205L, 289L, 373L, 27L, 44L, 45L, 111L, 128L, 129L, 195L, 212L, 213L, 279L, 296L, 297L, 363L, 380L, 381L, 7L, 91L, 175L, 259L, 343L, 44L, 45L, 70L, 128L, 129L, 154L, 212L, 213L, 238L, 296L, 297L, 322L, 380L, 381L, 406L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 44L, 45L, 128L, 129L, 212L, 213L, 296L, 297L, 380L, 381L, 37L, 121L, 205L, 289L, 373L, 27L, 44L, 45L, 111L, 128L, 129L, 195L, 212L, 213L, 279L, 296L, 297L, 363L, 380L, 381L, 7L, 91L, 175L, 259L, 343L, 44L, 45L, 70L, 128L, 129L, 154L, 212L, 213L, 238L, 296L, 297L, 322L, 380L, 381L, 406L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 44L, 45L, 128L, 129L, 212L, 213L, 296L, 297L, 380L, 381L, 37L, 121L, 205L, 289L, 373L, 27L, 44L, 45L, 111L, 128L, 129L, 195L, 212L, 213L, 279L, 296L, 297L, 363L, 380L, 381L, 7L, 91L, 175L, 259L, 343L, 44L, 45L, 70L, 128L, 129L, 154L, 212L, 213L, 238L, 296L, 297L, 322L, 380L, 381L, 406L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 44L, 45L, 128L, 129L, 212L, 213L, 296L, 297L, 380L, 381L, 37L, 121L, 205L, 289L, 373L, 27L, 44L, 45L, 111L, 128L, 129L, 195L, 212L, 213L, 279L, 296L, 297L, 363L, 380L, 381L, 7L, 91L, 175L, 259L, 343L, 44L, 45L, 70L, 128L, 129L, 154L, 212L, 213L, 238L, 296L, 297L, 322L, 380L, 381L, 406L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 7L, 37L, 48L, 91L, 121L, 132L, 175L, 205L, 216L, 259L, 289L, 300L, 343L, 373L, 384L, 44L, 45L, 128L, 129L, 212L, 213L, 296L, 297L, 380L, 381L, 37L, 121L, 205L, 289L, 373L), col = c(7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 27L, 27L, 27L, 27L, 27L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 37L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 44L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 45L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 48L, 70L, 70L, 70L, 70L, 70L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 91L, 111L, 111L, 111L, 111L, 111L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 121L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 128L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 129L, 132L, 132L, 132L, 132L, 132L, 132L, 132L, 132L, 132L, 132L, 154L, 154L, 154L, 154L, 154L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 175L, 195L, 195L, 195L, 195L, 195L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 205L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 212L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 213L, 216L, 216L, 216L, 216L, 216L, 216L, 216L, 216L, 216L, 216L, 238L, 238L, 238L, 238L, 238L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 259L, 279L, 279L, 279L, 279L, 279L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 289L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 296L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 297L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 300L, 322L, 322L, 322L, 322L, 322L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 343L, 363L, 363L, 363L, 363L, 363L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 373L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 380L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 381L, 384L, 384L, 384L, 384L, 384L, 384L, 384L, 384L, 384L, 384L, 406L, 406L, 406L, 406L, 406L)), .Names = c("row", "col" ), row.names = c(NA, -400L), class = "data.frame")