ansaurus

Question

R: manipulating data.frames containing strings and booleans.

Answer 1

+4 A:

Here's some sample data:

df <- data.frame(a=c(FALSE, TRUE, FALSE), b=c(TRUE, FALSE, FALSE), c=c(FALSE, FALSE, TRUE))

You can use apply to run which over each row and use the result to index the column names:

names(df)[apply(df, 1, which)]
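apply() first converts the data frame to a logical matrix and then calls which() on each row. The result is a plain vector of column positions only when every row has exactly one TRUE. If rows differ in how many TRUEs they contain, apply() returns a list instead, and names(df)[...] stops with an "invalid subscript type" error. A minimal sketch of a variant that returns the matching names for each row as a list; the helper name true_cols is hypothetical and not from the thread:

```r
# Sample data from the answer: exactly one TRUE per row
df <- data.frame(a = c(FALSE, TRUE, FALSE),
                 b = c(TRUE, FALSE, FALSE),
                 c = c(FALSE, FALSE, TRUE))

# One TRUE per row, so apply() simplifies to an integer vector of positions
names(df)[apply(df, 1, which)]   # "b" "a" "c"

# Hypothetical helper: a list holding the TRUE column names of each row,
# so rows with zero or several TRUEs are handled as well
true_cols <- function(d) {
  m <- as.matrix(d)
  lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ]])
}

df2 <- data.frame(a = c(TRUE, FALSE), b = c(TRUE, FALSE))
true_cols(df2)   # list(c("a", "b"), character(0))
```

Returning a list trades the convenience of a flat character vector for behaviour that stays well defined whatever the number of TRUEs per row.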

Or without apply by using which directly:

idx <- which(as.matrix(df), arr.ind=T)
names(df)[idx[order(idx[,1]),"col"]]
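which(..., arr.ind = TRUE) returns a matrix with "row" and "col" columns giving the position of every TRUE, listed column by column rather than row by row. A short commented sketch on the sample data above, showing the intermediate index matrix and why the order() step is needed to get one name per row in row order:

```r
df <- data.frame(a = c(FALSE, TRUE, FALSE),
                 b = c(TRUE, FALSE, FALSE),
                 c = c(FALSE, FALSE, TRUE))

# Positions of every TRUE as (row, col) pairs, listed column by column:
# here (2, 1), (1, 2), (3, 3)
idx <- which(as.matrix(df), arr.ind = TRUE)

# Reorder the pairs by row, then map each column number to its name
row_order <- order(idx[, "row"])
names(df)[idx[row_order, "col"]]   # "b" "a" "c"
```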

Shane 2010-04-21 16:33:42

I'm getting old. You beat me by five minutes ;-)

Dirk Eddelbuettel 2010-04-21 16:40:53

See the comment under Dirk's solution! The second approach doesn't give the same result as the first.

Mike Dewar 2010-04-21 17:19:39

I corrected that.

Shane 2010-04-21 17:25:27

Answer 2

+3 A:

Use apply to sweep which across the rows, and use the resulting indices to look up the column names:

> df <- data.frame(a=c(TRUE,FALSE,FALSE),b=c(FALSE,FALSE,TRUE),
+                  c=c(FALSE,TRUE,FALSE))
> df
      a     b     c
1  TRUE FALSE FALSE
2 FALSE FALSE  TRUE
3 FALSE  TRUE FALSE
> colnames(df)[apply(df, 1, which)]
[1] "a" "c" "b"
>

Dirk Eddelbuettel 2010-04-21 16:39:25

Wow. Yet again we independently came up with essentially the same solution at the same time. Even the data!

Shane 2010-04-21 16:40:37

You win by five minutes, but I get a higher technical score for using TRUE/FALSE instead of the very naughty and discouraged T/F :)

Dirk Eddelbuettel 2010-04-21 16:51:43

Then who should get the green tick? (Thanks to you both, btw.)

Mike Dewar 2010-04-21 17:01:45

Clearly, I should get the green tick since I gave *two* solutions. :)

Shane 2010-04-21 17:05:55

Hmm. I think there's something wrong with your second solution, though! It doesn't handle multiple TRUEs in one column, whereas the shared solution handles this fine. Compare the outputs using `df <- data.frame(a=c(FALSE, TRUE, FALSE, TRUE), b=c(TRUE, FALSE, FALSE, FALSE), c=c(FALSE, FALSE, TRUE, FALSE))`: which would you expect to be the appropriate behaviour?
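To make the difference described in this comment concrete, here is the comment's test data run through the unordered and the row-ordered versions of the which() approach, and through apply. Column a has TRUEs in rows 2 and 4, so the column-by-column listing from which(..., arr.ind = TRUE) no longer lines up with the rows until it is sorted by row:

```r
df <- data.frame(a = c(FALSE, TRUE, FALSE, TRUE),
                 b = c(TRUE, FALSE, FALSE, FALSE),
                 c = c(FALSE, FALSE, TRUE, FALSE))

idx <- which(as.matrix(df), arr.ind = TRUE)

# Without sorting: names follow column order, not row order
unordered <- names(df)[idx[, "col"]]                      # "a" "a" "b" "c"

# Sorted by row: one name per row, matching apply()
by_row    <- names(df)[idx[order(idx[, "row"]), "col"]]   # "b" "a" "c" "a"
via_apply <- names(df)[apply(df, 1, which)]               # "b" "a" "c" "a"
```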

Mike Dewar 2010-04-21 17:12:43

Good catch. I just corrected that. Let us know which of those approaches *performs* better on your data set?

Shane 2010-04-21 17:22:29

The second solution, where you form an index set first, takes less than half the time of the simpler apply. I don't know why, though! I'd have hoped the simpler expression would be faster. This decides the tick!

Mike Dewar 2010-04-21 17:45:24

Great. `apply` is really nothing more than a loop (search Stack Overflow for other discussions of this); it could actually be slower than your `for` loop. You might also consider giving us each a vote, to reward Dirk's diligence in using the full TRUE/FALSE names.
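A rough timing sketch for comparing the approaches on your own machine. The data size, the seed, and the explicit for loop (a hypothetical stand-in for the asker's loop, which isn't shown in the thread) are all assumptions; absolute and relative timings depend on your data, machine and R version, so no numbers are claimed here. The checks at the end only confirm that all three give the same answer.

```r
set.seed(1)
n <- 1e5   # assumed number of rows
k <- 5     # assumed number of logical columns

# Hypothetical test data: exactly one TRUE per row, in a random column
hit <- sample.int(k, n, replace = TRUE)
m <- matrix(FALSE, nrow = n, ncol = k, dimnames = list(NULL, letters[1:k]))
m[cbind(seq_len(n), hit)] <- TRUE
big <- as.data.frame(m)

# 1. apply(): calls which() once per row
t_apply <- system.time(r_apply <- names(big)[apply(big, 1, which)])

# 2. which(arr.ind = TRUE): one vectorised call, then reorder by row
t_which <- system.time({
  idx <- which(as.matrix(big), arr.ind = TRUE)
  r_which <- names(big)[idx[order(idx[, "row"]), "col"]]
})

# 3. Explicit for loop (hypothetical stand-in for the asker's loop)
t_loop <- system.time({
  mm <- as.matrix(big)
  r_loop <- character(n)
  for (i in seq_len(n)) r_loop[i] <- colnames(mm)[mm[i, ]]
})

# Elapsed seconds for each approach
c(apply = t_apply[["elapsed"]],
  which = t_which[["elapsed"]],
  loop  = t_loop[["elapsed"]])
```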

Shane 2010-04-21 17:53:37

Both get a vote! Thanks as always for the awesome help!

Mike Dewar 2010-04-22 13:55:34
