Hi,

I have a dataset in R, which contains the results of a rapid diagnostic test. The test has a visible line if it is working properly (control line) and a visible line for each of the two parasite species it detects, if they are present in the patient sample.

The dataset contains a logical column for each test line, as follows: (database is called RDTbase)

   Control  Pf    Pv
1. TRUE     TRUE  FALSE
2. TRUE     FALSE TRUE
3. FALSE    FALSE FALSE
4. TRUE     TRUE  TRUE
5. TRUE     FALSE FALSE

I would like to add a new column which contains a single result for each rapid test. The results are designated according to the different logical conditions met by the three lines. For the example above the new column would look like this:

   Control  Pf     Pv     Result
1. TRUE     TRUE   FALSE  Pf
2. TRUE     FALSE  TRUE   Pv
3. FALSE    FALSE  FALSE  Invalid
4. TRUE     TRUE   TRUE   Mixed
5. TRUE     FALSE  FALSE  Negative

I am able to create the new column, but it takes a lot of coding and I think there has to be a much simpler (and shorter) way to do this.

Here is my current (long) method:

R.Pf <- RDTbase[which(Control == "TRUE" & Pf == "TRUE" & Pv == "FALSE"),]
R.Pv <- RDTbase[which(Control == "TRUE" & Pf == "FALSE" & Pv == "TRUE"),]
R.inv <- RDTbase[which(Control == "FALSE"),]
R.mix <- RDTbase[which(Control == "TRUE" & Pf == "TRUE" & Pv == "TRUE"),]
R.neg <- RDTbase[which(Control == "TRUE" & Pf == "FALSE" & Pv == "FALSE"),]

R.Pf$Result <- c("Pf")
R.Pv$Result <- c("Pv")
R.inv$Result <- c("Invalid")
R.mix$Result <- c("Mixed")
R.neg$Result <- c("Negative")

RDTbase2 <- rbind(R.Pf, R.Pv, R.inv, R.mix, R.neg)

Any ideas on how to simplify and shorten this code would be greatly appreciated, as I have to do this kind of thing to my databases a lot.

Many thanks, Amy

A: 

I would simply create another column of the data frame and assign to different subsets of it conditionally. You can also slim down the data frame indexing code.

RDTbase$Result = NA 
RDTbase <- within(RDTbase, Result[Control=="TRUE" & Pf=="TRUE" & Pv=="FALSE"] <- "Pf")
RDTbase <- within(RDTbase, Result[Control=="FALSE"] <- "Invalid")

etc.

"within" just saves a little typing.
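To make the "etc." concrete, here is a minimal sketch of the same approach covering all five outcomes. It assumes the three columns are stored as logicals (the question compares them to "TRUE", so they may actually be character) and rebuilds the sample data from the question so the snippet runs on its own.

```r
# Sample data from the question, assumed here to be logical columns.
RDTbase <- data.frame(
  Control = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  Pf      = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  Pv      = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Start with a missing result, then fill each subset in place.
RDTbase$Result <- NA_character_
RDTbase <- within(RDTbase, {
  Result[!Control]            <- "Invalid"
  Result[Control & Pf & !Pv]  <- "Pf"
  Result[Control & !Pf & Pv]  <- "Pv"
  Result[Control & Pf & Pv]   <- "Mixed"
  Result[Control & !Pf & !Pv] <- "Negative"
})
```

Any row that matches none of the conditions (for example one containing NA) keeps its NA result, which makes unexpected inputs easy to spot.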

Thanks xbalto - will try this too. So "within" is another way of subsetting?
Amy Mikhail
Ah - I get it now, "within" allows you to refer to a subset of the dataframe without having to take it out. After combining both your suggestions, my code is now 5 lines shorter - thanks again!
Amy Mikhail
A: 

First of all, it would be better to store these columns as logical vectors rather than character. Then you could write Control instead of Control == "TRUE" and !Control instead of Control == "FALSE", and your code would be shorter.

For your problem I would use nested ifelse calls:
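As a small illustration of that conversion, the sketch below assumes the columns arrived as text. The data frame `chr` and the variable `pf_only` are hypothetical names used only for this example.

```r
# Hypothetical data frame whose columns hold "TRUE"/"FALSE" as text.
chr <- data.frame(
  Control = c("TRUE", "TRUE", "FALSE"),
  Pf      = c("TRUE", "FALSE", "FALSE"),
  Pv      = c("FALSE", "TRUE", "FALSE"),
  stringsAsFactors = FALSE
)

# Convert every column to logical in one step, keeping the data frame shape.
chr[] <- lapply(chr, as.logical)

# Conditions can now use the columns directly, with ! for negation.
pf_only <- with(chr, Control & Pf & !Pv)
```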

RDTbase$Result <- with(RDTbase, ifelse(
  Control == "TRUE",
  ifelse(
    Pf == "TRUE",
    ifelse(Pv == "TRUE", "Mixed", "Pf"),    # when Control is TRUE, Pf is TRUE
    ifelse(Pv == "TRUE", "Pv", "Negative")  # when Control is TRUE, Pf is FALSE
  ),
  "Invalid" # when Control is FALSE
))

But I like magic tricks, so you could also do the following:

num_code <- with(RDTbase,
  as.numeric(as.logical(Control))
  + 2*as.numeric(as.logical(Pf))
  + 4*as.numeric(as.logical(Pv))
) # values are 0,1,2,...,7
# then look up the label for each code
RDTbase$Result <- c(
  "Invalid" , # 0 = F,F,F # Control, Pf, Pv
  "Negative", # 1 = T,F,F
  "Invalid" , # 2 = F,T,F
  "Pf"      , # 3 = T,T,F
  "Invalid" , # 4 = F,F,T
  "Pv"      , # 5 = T,F,T
  "Invalid" , # 6 = F,T,T
  "Mixed"     # 7 = T,T,T
)[num_code+1]

It's a nice trick when you need to decode several logical columns into a single character value.

Marek
Thanks Marek! Both are very useful tricks; I didn't know one could refer to logical vectors like that, and converting the logicals to numbers is neat. That will certainly help to make it more concise...
Amy Mikhail
A: 

Using transform makes this compact and elegant:

transform(a, Result = 
 ifelse(Control,
  ifelse(Pf, 
   ifelse(Pv, "Mixed", "Pf"),
   ifelse(Pv, "Pv", "Negative")),
  "Invalid"))

Yields

  Control    Pf    Pv   Result
1    TRUE  TRUE FALSE       Pf
2    TRUE FALSE  TRUE       Pv
3   FALSE FALSE FALSE  Invalid
4    TRUE  TRUE  TRUE    Mixed
5    TRUE FALSE FALSE Negative

Alternatively, building on Marek's version we can use logical vectors to calculate the index slightly more compactly:

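a$Result = apply(a[, c("Control", "Pf", "Pv")], 1,
  function(x) {
    # weights: Control = 4, Pf = 2, Pv = 1, so codes 0-3 mean no control line
    c(rep("Invalid", 4), "Negative", "Pv", "Pf", "Mixed")[1 + sum(c(4, 2, 1)[x])]
  })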
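
The same index can probably be computed without apply, since arithmetic on logical columns is already vectorised. The following is only a sketch of that variation, assuming logical columns; `result_labels` and `code` are names introduced for this example.

```r
# Sample data from the question, assumed to be logical columns.
a <- data.frame(
  Control = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  Pf      = c(TRUE, FALSE, FALSE, TRUE, FALSE),
  Pv      = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Labels for codes 0..7; codes 0-3 have no control line and are invalid.
result_labels <- c(rep("Invalid", 4), "Negative", "Pv", "Pf", "Mixed")

# Weighted sum with Control = 4, Pf = 2, Pv = 1 (logicals coerce to 0/1).
code <- with(a, 4 * Control + 2 * Pf + Pv)
a$Result <- result_labels[code + 1]
```

Because the whole column is processed at once, this avoids the row-by-row function call and the data frame to matrix conversion that apply performs.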
Alex Brown