views: 128
answers: 1

I am attempting to find the elements that are not common across multiple vectors. That is, I want to know exactly which elements (not just their positions) are not shared by all of the vectors.

The best implementation I could come up with uses nested loops, which I realize is probably the least efficient approach; most notably, the execution is still running as I write this. Here is what I came up with (each *.id is a vector of Supreme Court case IDs stored as strings).

check.cases <- TRUE

if (check.cases) {
    # Put the vectors in a list; c() would flatten them into one long
    # character vector, so the loops would iterate over single IDs
    all.cases <- list(AMKennedy.id, AScalia.id, CThomas.id, DHSouter.id,
                      JGRoberts.id, JPStevens.id, RBGinsburg.id,
                      SAAlito.id, SGBreyer.id)
    bad.cases <- c()
    for (b in all.cases) {
        for (t in all.cases) {
            # IDs present in t but missing from b
            bad <- t[is.na(match(t, b))]
            bad.cases <- append(bad.cases, bad)
        }
    }
    bad.cases <- unique(bad.cases)
}

print(bad.cases)

There must be a more efficient way of doing this?
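For what it's worth, one common base-R pattern for exactly this, sketched here on made-up case IDs, combines Reduce with intersect and setdiff:

```r
# Toy stand-ins for the justices' case-ID vectors
id.list <- list(
  c("07-290", "07-474", "06-1195"),
  c("07-290", "07-474"),
  c("07-290", "07-474", "06-1195")
)

# IDs present in every vector
common <- Reduce(intersect, id.list)

# IDs that appear somewhere but not in all vectors
bad.cases <- setdiff(unique(unlist(id.list)), common)
# "06-1195"
```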

+3  A: 

Trying to find cases where all the Supreme Court justices weren't involved? Don't suppose that you have a small sample dataset that you could add?

A thought: stack the vectors on top of each other so that you have a long dataset like data.frame("justice", "case"). Then use Hadley's reshape package (the cast function) to count the number of justices per case. Any case with fewer than the total number of justices will be a "bad case".
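This stacking idea can be sketched with base R's table standing in for reshape's cast (the justice names and case IDs below are made up for illustration):

```r
# Toy data: two justices' case-ID vectors
AMKennedy.id <- c("07-290", "07-474")
AScalia.id   <- c("07-290")

# Stack into long format: one (justice, case) row per entry
d <- data.frame(
  justice = rep(c("AMKennedy", "AScalia"),
                times = c(length(AMKennedy.id), length(AScalia.id))),
  case    = c(AMKennedy.id, AScalia.id)
)

# Count justices per case; cases seen by fewer than all justices are "bad"
per.case  <- table(d$case)
bad.cases <- names(per.case)[per.case < 2]  # 2 = number of justices here
# "07-474"
```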

Shane
...or just combine them all into one vector (say with `unlist`) and do a count with `table`.
Jonathan Chang
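That suggestion, sketched on made-up IDs (this assumes each ID appears at most once within each vector; otherwise wrap each vector in unique() first):

```r
# Toy stand-ins for the justices' case-ID vectors
id.list <- list(
  c("07-290", "07-474", "06-1195"),
  c("07-290", "07-474"),
  c("07-290", "07-474", "06-1195")
)

# Count how many vectors each case ID appears in
counts <- table(unlist(id.list))

# IDs that do not appear in every vector
bad.cases <- names(counts)[counts < length(id.list)]
# "06-1195"
```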