Hello. I have a data frame with a column database$VAR, which contains 0's and 1's.
How can I redefine the data frame so that the rows where VAR is 1 are removed?
Thanks!
Try this:
R> df <- data.frame(VAR = c(0,1,0,1,1))
R> df[ -which(df[,"VAR"]==1), , drop=FALSE]
VAR
1 0
3 0
R>
We use which( booleanExpr ) to get the indices for which your condition holds, then put a minus sign in front of them to exclude those rows, and lastly use drop=FALSE to prevent our one-column data.frame from collapsing into a vector.
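One caveat with the negative-index approach: if no row matches the condition, which() returns an empty vector, and negating an empty index selects zero rows instead of all of them. Here is a small sketch of that edge case next to a logical-index version that avoids it. The names df0, kept, bad and good are made up for illustration.

```r
df <- data.frame(VAR = c(0, 1, 0, 1, 1))

# Normal case: which() finds rows 2, 4 and 5, and the minus sign drops them.
idx  <- which(df$VAR == 1)
kept <- df[-idx, , drop = FALSE]
print(kept)

# Edge case: a data frame with no 1's at all.
df0 <- data.frame(VAR = c(0, 0, 0))

# which() returns integer(0); -integer(0) is still integer(0),
# so this selects zero rows rather than keeping everything.
bad <- df0[-which(df0$VAR == 1), , drop = FALSE]
print(nrow(bad))   # 0

# A logical index does not have this problem.
good <- df0[df0$VAR != 1, , drop = FALSE]
print(nrow(good))  # 3
```

So if there is any chance the condition matches nothing, the logical form is the safer one.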
TMTOWTDI
Using subset:
df.new <- subset(df, VAR == 0)
EDIT:
David's solution seems to be the fastest on my machine. Subset seems to be the slowest. I won't even pretend to understand what's going on under the hood that accounts for these differences:
> df <- data.frame(y=rep(c(1,0), times=1000000))
>
> system.time(df[ -which(df[,"y"]==1), , drop=FALSE])
user system elapsed
0.16 0.05 0.23
> system.time(df[which(df$y == 0), ])
user system elapsed
0.03 0.01 0.06
> system.time(subset(df, y == 0))
user system elapsed
0.14 0.09 0.27
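One thing that may explain part of the gap in the timings above: the middle call has no drop=FALSE, so on a one-column data frame it returns a plain numeric vector rather than a data frame, and the three calls are not doing quite the same work. The sketch below checks what each form returns on a smaller input. The variable names are invented for illustration, and it makes no claim about how much of the timing difference this accounts for.

```r
df <- data.frame(y = rep(c(1, 0), times = 1000))

neg_which    <- df[-which(df[, "y"] == 1), , drop = FALSE]
pos_which    <- df[which(df$y == 0), ]                # no drop = FALSE
pos_which_df <- df[which(df$y == 0), , drop = FALSE]
by_subset    <- subset(df, y == 0)

# Which results are still data frames?
print(is.data.frame(neg_which))     # TRUE
print(is.data.frame(pos_which))     # FALSE: collapsed to a numeric vector
print(is.data.frame(pos_which_df))  # TRUE
print(is.data.frame(by_subset))     # TRUE

# The kept values themselves agree across all four.
print(all(neg_which$y == pos_which))
```

For a like-for-like timing, add drop=FALSE to the which() call as well.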
I'd upvote the answer using "subset" if I had the reputation for it :-) . You can also use a logical vector directly for subsetting -- no need for "which":
d <- data.frame(VAR = c(0,1,0,1,1))
d[d$VAR == 0, , drop=FALSE]
I'm surprised to find the logical version a little faster in at least one case. (I expected the "which" version might win due to R possibly preallocating the proper amount of storage for the result.)
> d <- data.frame(y=rep(c(1,0), times=1000000))
> system.time(d[which(d$y == 0), ])
user system elapsed
0.119 0.067 0.188
> system.time(d[d$y == 0, ])
user system elapsed
0.049 0.024 0.074
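Speed aside, the logical and "which" versions differ in how they treat missing values, which matters if VAR can contain NA. A minimal sketch, using a made-up data frame with one NA added:

```r
d <- data.frame(VAR = c(0, 1, NA, 0))

# Logical index: NA == 0 is NA, and an NA in a row index yields a row of NAs.
by_logical <- d[d$VAR == 0, , drop = FALSE]
print(nrow(by_logical))  # 3, one of which is an all-NA row

# which() only returns positions where the condition is TRUE, so NA is skipped.
by_which <- d[which(d$VAR == 0), , drop = FALSE]
print(nrow(by_which))    # 2

# subset() also treats NA in the condition as FALSE.
by_subset <- subset(d, VAR == 0)
print(nrow(by_subset))   # 2
```

If the column has no NA's, as in the question, all three give the same rows.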