ansaurus

Question

Create a new column in data.frame using conditions of each row

Answer 1

A:

I would find the starting concentration for each patient with:

startConc <- tab1[tab1$t == 0,]

which gives (from your example data)

  pat t conc
1  P1 0  788
4  P2 0  644

After that you can use apply

newconc <- apply(tab1, 1, function(x){as.numeric(x[3])/startConc[startConc$pat==x[1],3]})

which gives you

[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
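
For reference, a minimal self-contained sketch of the same two steps, using the example data as reconstructed in Answer 5. It swaps the row-wise apply for a vectorised match() lookup, which is an alternative and not the code posted above, and it assumes each patient has exactly one t == 0 row.

```r
# Example data, as reconstructed in Answer 5 of this thread
tab1 <- data.frame(
    pat  = c(rep("P1", 3), rep("P2", 3)),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# One baseline row per patient (assumption: exactly one t == 0 row each)
startConc <- tab1[tab1$t == 0, ]

# For every row of tab1, find the position of its patient in startConc
idx <- match(tab1$pat, startConc$pat)

# Divide each concentration by the matching baseline
newconc <- tab1$conc / startConc$conc[idx]
print(newconc)
```

Because match() looks patients up by value, the result does not depend on how the rows of tab1 are ordered.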

nico 2010-08-23 12:02:41

Brilliant, that is exactly what I needed! Thanks

Nick 2010-08-23 13:09:53

@Nico I don't think this is correct. Take a subset of `tab1` (e.g. `tab1 <- subset(tab1, t < 10)`) and check the results. The problem is `tab1$pat == unique(tab1$pat)`: the `==` operator recycles the shorter vector, so for the example data you compare `1,1,1,2,2,2` with `1,2,1,2,1,2`, and it works only by accident. With other data it will fail.

Marek 2010-08-23 17:01:57
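
The recycling described in the comment above can be seen in isolation. The earlier revision of the answer is not shown in this thread, so this sketch only illustrates the `==` behaviour on the patient column; the vectors `pat` and `pat_sub` are made up to mirror the example data with and without the t = 10 rows.

```r
# Patient column of the full example data (t = 0, 5, 10 per patient)
pat <- c("P1", "P1", "P1", "P2", "P2", "P2")

# unique(pat) has length 2 and is recycled to P1,P2,P1,P2,P1,P2
full <- pat == unique(pat)
print(full)      # TRUE FALSE TRUE TRUE FALSE TRUE

# The t = 0 rows are rows 1 and 4, and both happen to be TRUE
print(full[c(1, 4)])

# Patient column after dropping t = 10 (t = 0, 5 per patient)
pat_sub <- c("P1", "P1", "P2", "P2")

# Now the t = 0 rows are rows 1 and 3, and row 3 compares P2 with P1
sub <- pat_sub == unique(pat_sub)
print(sub)       # TRUE FALSE FALSE TRUE
```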

@Marek: Thanks for spotting that. I corrected the code; it should work now.

nico 2010-08-23 18:09:06

@nico This `%in%` part is always `TRUE`. I think `startConc<-tab1[tab1$t == 0,]` (or `subset(tab1, t==0)`) is sufficient.

Marek 2010-08-25 04:22:57

Answer 2

A:

A slightly makeshift way to do it, but it works in this case:

xt <- xtabs(conc ~ t + pat, tab1)
# Transpose so that column-wise recycling divides each patient's values by that patient's t = 0 value
tab1$conct0 <- as.numeric(t(t(xt) / xt[1, ]))

The xt[1,] represents the row for t=0; you could also use xt["0",].
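
As an illustration of why the double transpose is needed, here is a sketch that gets the same ratios with sweep(), which divides along a chosen margin explicitly. The use of sweep() is a substitution on my part, not part of the answer, and like the original it assumes tab1 is sorted by patient and then by time so that the flattened table lines up with the rows.

```r
# Example data, as reconstructed in Answer 5 of this thread
tab1 <- data.frame(
    pat  = c(rep("P1", 3), rep("P2", 3)),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# Rows are time points ("0", "5", "10"), columns are patients
xt <- xtabs(conc ~ t + pat, tab1)

# Divide every column (MARGIN = 2) by that patient's t = 0 entry
ratio <- sweep(xt, 2, xt["0", ], "/")

# Flattening is column-major: all of P1, then all of P2
tab1$conct0 <- as.numeric(ratio)
print(tab1)
```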

Edit

A more robust way:

tabt <- subset(tab1,t==0)
names(tabt)[3] <- "conct0"
tab1 <- merge(tab1,tabt[,c(1,3)])
tab1$conct0 <- tab1$conc/tab1$conct0

James 2010-08-23 12:08:17

Answer 3

A:

If you can safely assume that the concentration never rises above its t=0 value, then the shortest and fastest answer is:

tab1$concp <- ave(tab1$conc, tab1$pat, FUN = function(x) x/max(x))

John 2010-08-23 12:47:07

This will only work if `max(tab1$conc)` occurs at t=0.

Joshua Ulrich 2010-08-23 13:15:02

Which could be repaired with `tab1$conc/ave(ifelse(tab1$t==0,tab1$conc,-Inf), tab1$pat, FUN = function(x) max(x))`

Marek 2010-08-23 16:58:14
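
A small sketch of the failure case and the repair from the two comments above. The data frame `d` is hypothetical: P1's concentration peaks at t = 5, so the per-patient maximum is no longer the t = 0 value.

```r
# Hypothetical data: P1 rises from 100 to 120 before falling
d <- data.frame(
    pat  = rep(c("P1", "P2"), each = 3),
    t    = rep(c(0, 5, 10), times = 2),
    conc = c(100, 120, 90, 200, 180, 160))

# Dividing by the per-patient maximum uses 120 as P1's baseline
by_max <- ave(d$conc, d$pat, FUN = function(x) x / max(x))

# Masking the non-baseline rows with -Inf makes max() return the t = 0 value
base  <- ave(ifelse(d$t == 0, d$conc, -Inf), d$pat, FUN = max)
by_t0 <- d$conc / base

print(by_max)
print(by_t0)
```

For P2, whose concentration only falls, both versions agree; they differ only for P1.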

Answer 4

A:

With plyr:

library(plyr)
ddply(tab1, "pat", transform, conct0 = conc / conc[t == 0])
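
If plyr is not available, the same split-apply-combine idea can be sketched in base R. This is an equivalent written for illustration, not hadley's code, and it assumes each patient has exactly one t == 0 row.

```r
# Example data, as reconstructed in Answer 5 of this thread
tab1 <- data.frame(
    pat  = c(rep("P1", 3), rep("P2", 3)),
    t    = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

# Split into one data frame per patient and add the ratio column to each piece
pieces <- lapply(split(tab1, tab1$pat), function(d) {
    d$conct0 <- d$conc / d$conc[d$t == 0]
    d
})

# Stack the pieces back together and drop the composite row names
res <- do.call(rbind, pieces)
rownames(res) <- NULL
print(res)
```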

hadley 2010-08-24 23:17:16

Answer 5

A:

I would use tapply. Given your data:

tab1 <- data.frame(
    pat = c(rep("P1", 3), rep("P2", 3)),
    t = c(0, 5, 10, 0, 5, 10),
    conc = c(788, 720, 655, 644, 589, 544))

this one-liner will do it for you in the way you are hinting at in your post:

> tab1$conc / tab1$conc[tab1$t == 0][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205

Calling tapply without a function returns, for each row, the index of that row's patient group. I find this method rather fast and useful, but it assumes your patient IDs are ordered. If that is an issue, we can make sure the baseline values follow the patient ID order:

> tab1$conc / tab1$conc[tab1$t == 0][order(unique(tab1$pat))][tapply(tab1$pat, tab1$pat)]
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205
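
To see what each piece of that expression returns, here is a sketch on a reordered copy of the data in which the P2 rows come first. The vectors are hypothetical stand-ins for the columns of tab1, and the reordering step still assumes the t == 0 rows appear in the same order as each patient's first row.

```r
# Hypothetical reordered data: P2 rows come before P1 rows
pat  <- c("P2", "P2", "P2", "P1", "P1", "P1")
time <- c(0, 5, 10, 0, 5, 10)
conc <- c(644, 589, 544, 788, 720, 655)

# With no function, tapply returns each row's group number by sorted ID
grp <- tapply(pat, pat)
print(grp)                      # 2 2 2 1 1 1

# Baselines in order of appearance: P2 first, then P1
base <- conc[time == 0]

# Reorder the baselines to sorted ID order so they line up with grp
base_sorted <- base[order(unique(pat))]
print(base_sorted)              # 788 644

ratio <- conc / base_sorted[grp]
print(ratio)
```

Without the order(unique(...)) step, base[grp] would pair P2's rows with P1's baseline and vice versa.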

If you use this often, I would write a function for it, for example:

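myFract <- function(obj, x = "conc", id = "pat", time = "t", start = NULL) {
    # Default baseline is the earliest time point in the data
    if (is.null(start)) start <- min(obj[, time])
    # Row numbers of the baseline observations, one per patient
    ii <- which(obj[, time] == start)
    # Reorder to patient ID order, then expand to one baseline row per data row
    ii <- ii[order(unique(obj[, id]))][tapply(obj[, id], obj[, id])]
    obj[, x] / obj[ii, x]
}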

Such that:

> myFract(tab1)
[1] 1.0000000 0.9137056 0.8312183 1.0000000 0.9145963 0.8447205

eyjo 2010-08-30 20:14:15
