That's x \ y using mathematical notation. Suppose

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3) 
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

How can I get a vector with ALL the values in x that are not in y? That is, the result should be:

2,1,1,3

There is a similar question here. However, none of the answers returns the result that I want.

A: 

How about this:

R> x[x!=y]
[1] 2 1 1 1 3
Warning message:
In x != y : longer object length is not a multiple of shorter object length
R>

This is a difficult problem, I think, as you are mixing values and positions. The easier solution relies on one of the 'set' functions in R:

R> setdiff(x,y)
[1] 2 3

but that uses only values and not position.

The problem with the answer I gave you is the implicit use of recycling and the warning it triggered: as your x is longer than your y, the first few values of y get reused. Recycling is considered "clean" only when the length of the longer vector is an integer multiple of the length of the shorter one. That is not the case here (24 versus 20), and hence I am not sure we can solve your problem all that cleanly.
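
To make the recycling explicit, here is a small sketch (not part of the original answer) that builds the recycled copy of y by hand with rep_len, using the x and y from the question. It only illustrates which values of y each element of x ends up being compared against.

```r
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

# Stretch y to the length of x the way recycling does:
# all 20 values of y, then its first 4 values again.
y_recycled <- rep_len(y, length(x))
tail(y_recycled, 4)   # 0 0 0 0

# Same positional comparison as x != y, but without the warning.
x[x != y_recycled]    # 2 1 1 1 3
```

The three trailing 1s of x (positions 21 to 23) are compared with recycled zeros, so all three are kept, which is one more 1 than the result asked for in the question.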

Dirk Eddelbuettel
and setdiff(x, y) is, indeed, the standard definition for x \ y ... since it's a set operation, it first finds unique values and then compares the two vectors.
William Doane
+3  A: 

If I understand the problem, you can use table to compute the difference in the number of elements in each set and then create a vector based on the difference of those counts (note that this won't necessarily give you the order you gave in your question).

> diffs <- table(x) - table(factor(y, levels=levels(factor(x))))
> rep(as.numeric(names(diffs)), ifelse(diffs < 0, 0, diffs))
[1] 1 1 2 3
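
The same count-based idea can be wrapped in a function. This is only a sketch: the name multiset_diff is made up for illustration and does not come from any package, and it assumes the values survive the conversion to factor levels unchanged (which holds for small whole numbers like these).

```r
# Hypothetical helper: for each distinct value of x, keep as many copies
# as x has beyond what y has (never fewer than zero).
multiset_diff <- function(x, y) {
  lv <- sort(unique(x))                     # distinct values of x
  cx <- table(factor(x, levels = lv))       # counts in x
  cy <- table(factor(y, levels = lv))       # counts in y over the same levels
  n  <- pmax(as.vector(cx) - as.vector(cy), 0L)
  rep(lv, times = n)
}

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
multiset_diff(x, y)   # 1 1 2 3
```

As with the two-line version, the result comes back sorted by value rather than in the original order of x.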
Jonathan Chang
Thanks, this one works indeed! Order does not matter. I am just curious to see how this can be achieved using the `sets` library (using a function like `set_complement`, perhaps) or another "one-liner". I can hardly believe there's no way to get this directly.
gd047
There would be if you were working with true sets. Sets don't have duplicates and order doesn't matter. All the set functions in R follow from that definition. What you actually have is the set X whose elements are {0, 1, 2, 3} and the set Y whose elements are {0, 1}. Thus X \ Y is {2, 3}. While the output you're looking for is well defined, it's NOT a set operation, so you're going to need to do a little work to get it. You can always wrap Jonathan's code in a function, if you must have a one-line solution.
William Doane
If sets don't accept duplicates, there are also multisets that do.
gd047
+4  A: 

Here is a solution using pmatch (this gives the "complement" as you require):

x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)
res <- x[is.na(pmatch(x,y))]

From pmatch documentation:

"If duplicates.ok is FALSE, values of table once matched are excluded from the search for subsequent matches."

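One caveat worth keeping in mind: pmatch compares the character form of its arguments and also accepts partial matches, so with other data a value such as 1 can be matched against 10. The sketch below shows that and, as one possible alternative (not from the original answer), labels every element with its running occurrence number before comparing. The helper name occurrence_key is hypothetical.

```r
x <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,2,1,1,1,3)
y <- c(0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1)

# Partial matching: "1" is a prefix of "10", so this counts as a match.
pmatch(1, 10)   # 1, not NA

# Hypothetical helper: label each element as "<value>#<k>",
# where k says it is the k-th occurrence of that value.
occurrence_key <- function(v) {
  paste(v, ave(seq_along(v), v, FUN = seq_along), sep = "#")
}

# Keep the elements of x whose k-th occurrence has no counterpart in y.
res <- x[!(occurrence_key(x) %in% occurrence_key(y))]
res   # 2 1 1 3
```

For the vectors in the question both approaches give the same result, in the original order of x.
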
teucer