I have an R data frame:

> tab1
  pat  t conc
1  P1  0  788
2  P1  5  720
3  P1 10  655
4  P2  0  644
5  P2  5  589
6  P2 10  544

I am trying to create a new column for conc as a percentage of conc at t=0 for each patient. As well as many other things, I have tried:

tab1$conct0 <- tab1$conc / tab1$conc[tab1$t == 0  & tab1$pat == tab1$pat]

But I am clearly miles off the correct code for "conc WHERE t == 0 AND pat == the pat of this particular row".

I am sure I could use a for loop or something, but I hoped there was something simpler.

Thanks

A: 

I would find the starting concentration for each patient with:

startConc <- tab1[tab1$t == 0,]

which gives (from your example data)

  pat t conc
1  P1 0  788
4  P2 0  644

After that you can use apply:

# apply() turns the data frame into a character matrix, hence as.numeric()
newconc <- apply(tab1, 1, function(x) as.numeric(x[3]) / startConc[startConc$pat == x[1], 3])

which gives you

[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
nico
Brilliant, that is exactly what I needed! Thanks
Nick
@Nico I think this isn't correct. Take a subset of `tab1` (e.g. `tab1 <- subset(tab1, t < 10)`) and check the results. The problem is with `tab1$pat == unique(tab1$pat)`: the `==` operator recycles the shorter vector, so (for the example data set) you compare `1,1,1,2,2,2` with `1,2,1,2,1,2`, and it works by accident. With a different vector it will fail.
Marek
@Marek: Thanks for spotting that, Marek. I corrected the code; now it should work.
nico
@nico This `%in%` part is always `TRUE`. I think `startConc<-tab1[tab1$t == 0,]` (or `subset(tab1, t==0)`) is sufficient.
Marek
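A vectorised variant of the same lookup, without apply(), is to match() each row's patient against the baseline rows. This is a sketch in base R, not nico's code; it assumes each patient has exactly one t == 0 row, and the name idx is just illustrative.

```r
# Example data from the question
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)

# Baseline (t == 0) row for each patient
startConc <- tab1[tab1$t == 0, ]

# For each row, the position of its patient among the baseline rows
idx <- match(tab1$pat, startConc$pat)

# Divide every concentration by its own patient's baseline in one step
tab1$conct0 <- tab1$conc / startConc$conc[idx]
print(tab1$conct0)
```

Because match() looks patients up by value, this does not depend on the rows being sorted.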
A: 

A slightly makeshift way to do it, but it works in this case:

xt <- xtabs(conc~t+pat,tab1)
tab1$conct0 <- as.numeric(t(t(xt)/xt[1,])) # transpose twice because R recycles a vector down the columns of a matrix, not across its rows

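The xt[1,] represents the row for t=0; you could also use xt["0",]. Note that as.numeric() reads the table column by column (patient by patient, times in increasing order), so the result lines up with tab1 only if tab1 is sorted by pat and then t.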

Edit

A more robust way:

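tabt <- subset(tab1, t == 0)           # baseline rows only
names(tabt)[3] <- "conct0"             # this column now holds the t = 0 value
tab1 <- merge(tab1, tabt[, c(1, 3)])   # joins on pat; merge() may reorder rows
tab1$conct0 <- tab1$conc / tab1$conct0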
James
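The double transpose in the xtabs version can also be expressed with sweep(), which divides each column of the table by the matching element of a vector. This is a sketch using a different base R function than James's answer, with the same caveat: the final as.numeric() only lines up with tab1 when tab1 is sorted by pat and then t.

```r
# Example data from the question
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)

# Rows are times ("0", "5", "10"), columns are patients ("P1", "P2")
xt <- xtabs(conc ~ t + pat, tab1)

# MARGIN = 2: divide each patient column by that patient's t = 0 value
ratio <- sweep(xt, 2, xt["0", ], "/")

# Read the table column by column (patient by patient)
tab1$conct0 <- as.numeric(ratio)
print(tab1$conct0)
```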
A: 

If you can safely assume that your concentration doesn't rise over time, then the shortest and fastest answer is:

tab1$concp <- ave(tab1$conc, tab1$pat, FUN = function(x) x/max(x))
John
This will only work if each patient's maximum `conc` occurs at t=0.
Joshua Ulrich
Which could be repaired with `tab1$conc/ave(ifelse(tab1$t==0,tab1$conc,-Inf), tab1$pat, FUN = function(x) max(x))`
Marek
A: 

With plyr:

library(plyr)
ddply(tab1, "pat", transform, conct0 = conc / conc[t == 0])
hadley
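Without plyr, roughly the same split-apply-combine step can be written in base R with split(), lapply() and unsplit(). This is a sketch of a swapped-in technique rather than part of the plyr answer; it assumes each patient has exactly one t == 0 row, and the name pieces is illustrative. unsplit() returns the rows in their original order.

```r
# Example data from the question
tab1 <- data.frame(
  pat  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  t    = c(0, 5, 10, 0, 5, 10),
  conc = c(788, 720, 655, 644, 589, 544)
)

# Split the rows into one data frame per patient
pieces <- split(tab1, tab1$pat)

# Within each patient, divide by the concentration at t == 0
pieces <- lapply(pieces, function(d) {
  d$conct0 <- d$conc / d$conc[d$t == 0]
  d
})

# Reassemble the pieces in the original row order
tab1 <- unsplit(pieces, tab1$pat)
print(tab1)
```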
A: 

I would use tapply. Given your data:

tab1 <- data.frame(
    pat = c(rep("P1", 3), rep("P2", 3)),
    t = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

this one-liner will do it for you in the way you are hinting at in your post:

> tab1$conc / tab1$conc[tab1$t == 0][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205

Called without a function, tapply returns, for each row, the index of that row's patient (its factor level number). I find this method rather fast and useful. But it assumes your patient ids are ordered. If that is an issue, we can make sure they follow the patient id order:

> tab1$conc / tab1$conc[tab1$t == 0][order(unique(tab1$pat))][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205

If you use this often, I would write a function for it, e.g.:

myFract <- function(obj, x = "conc", id = "pat", time = "t", start = NULL) {
    if (is.null(start)) start <- min(obj[, time])
    ii <- which(obj[, time] == start)
    ii <- ii[order(unique(obj[, id]))][tapply(obj[, id], obj[, id])]
    obj[, x] / obj[ii, x]
}

Such that:

> myFract(tab1)
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
eyjo
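To make the bare tapply(tab1$pat, tab1$pat) call in the last answer less mysterious: when FUN is omitted, tapply() returns the group number of each element, which for a single grouping factor is simply its integer level code. A small check on the question's patient column (grp is an illustrative name):

```r
# Patient column from the question, as a factor
pat <- factor(c("P1", "P1", "P1", "P2", "P2", "P2"))

# With no FUN, tapply() returns each element's group index
grp <- tapply(pat, pat)
print(grp)

# For a single factor this matches the factor's integer codes
print(all(grp == as.integer(pat)))
```

Indexing the baseline concentrations with this vector, as in tab1$conc[tab1$t == 0][grp], repeats each patient's baseline once per row of that patient.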