ansaurus

Question

Answer 1

+3 A:

If you code cond as a factor, you can get R to do the expansion you want via model.matrix. The only complication is that to get the coding you chose (dummy-variable coding, or sum contrasts in R) we need to change the default contrasts used by R's model formula code.

## data
dat <- data.frame(ID = LETTERS[1:4], cond = factor(c("x","x","y","z")),
                  task1 = c(12,13,11,10), taskN = c(14,17,10,13))
dat

## We get R to produce the dummy variables for us,
## but your coding needs the contr.sum contrasts
op <- options(contrasts = c("contr.sum","contr.poly"))
dat2 <- data.frame(ID = dat$ID, model.matrix(ID ~ . - 1, data = dat))
## Levels of cond
lev <- with(dat, levels(cond))
## fix-up the names
names(dat2)[2:(1+length(lev))] <- lev
dat2

## reset contrasts
options(op)

This gives us:

> dat2
  ID x y z task1 taskN
1  A 1 0 0    12    14
2  B 1 0 0    13    17
3  C 0 1 0    11    10
4  D 0 0 1    10    13

This should scale automatically as the number of levels in cond changes/increases.
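To illustrate the scaling, here is a minimal sketch with a four-level factor. The data frame `dat4` and its values are made up for illustration, and for brevity the sketch calls model.matrix on cond alone with R's default contrasts rather than going through the full formula used above.

```r
## Hypothetical data: same layout as above, but cond now has four levels
dat4 <- data.frame(ID = LETTERS[1:6],
                   cond = factor(c("w", "x", "x", "y", "z", "w")),
                   task1 = c(12, 13, 11, 10, 9, 8))

## One indicator column per level; "- 1" drops the intercept
ind <- model.matrix(~ cond - 1, data = dat4)
## Replace the "condw", "condx", ... column names with the bare level names
colnames(ind) <- levels(dat4$cond)

## Put the ID, the indicators and the task column back together
dat4w <- data.frame(ID = dat4$ID, ind, task1 = dat4$task1)
dat4w
```

Nothing in the code refers to the number of levels, so adding a fifth level to cond would simply produce a fifth indicator column.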

HTH

Gavin Simpson 2010-09-24 12:47:01

Answer 2

+2 A:

Another alternative is to use cast in the reshape package:

library(reshape)
l <- length(levels(dat$cond))
dat2 <- merge(cast(dat,ID~cond),dat)[,c(1:(l+1),(l+3):(ncol(dat)+l))]
dat2[,2:(1+l)] <- !is.na(dat2[,2:(1+l)])

This gives you logical values rather than 0 and 1 though:

> dat2
  ID     x     y     z task1 taskN
1  A  TRUE FALSE FALSE    12    14
2  B  TRUE FALSE FALSE    13    17
3  C FALSE  TRUE FALSE    11    10
4  D FALSE FALSE  TRUE    10    13

James 2010-09-24 13:40:17

If you make your last line `dat2[,2:(1+l)] <- as.numeric(!is.na(dat2[,2:(1+l)]))` then you'll get the result the OP wanted.

Gavin Simpson 2010-09-24 13:48:53
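For what it's worth, another way to turn the logical columns into 0/1 is to convert them one column at a time. This is only a sketch: `dat2` below is rebuilt by hand to match the output shown above, so it runs without the reshape package, and `cols` is a name introduced here for illustration.

```r
## Hand-built stand-in for the logical result shown above
dat2 <- data.frame(ID = LETTERS[1:4],
                   x = c(TRUE, TRUE, FALSE, FALSE),
                   y = c(FALSE, FALSE, TRUE, FALSE),
                   z = c(FALSE, FALSE, FALSE, TRUE),
                   task1 = c(12, 13, 11, 10),
                   taskN = c(14, 17, 10, 13))

## Names of the indicator columns to convert
cols <- c("x", "y", "z")
## as.integer maps TRUE to 1 and FALSE to 0, applied column by column
dat2[cols] <- lapply(dat2[cols], as.integer)
dat2
```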

Answer 3

+1 A:

That's cool using model.matrix for this. (reshape too.) Always learning something here. A couple more ideas:

indicator1 <- function(groupStrings) {
  groupFactors <- factor(groupStrings)
  colNames <- levels(groupFactors)
  bits <- matrix(0, nrow=length(groupStrings), ncol=length(colNames))
  bits[matrix(c(1:length(groupStrings),
                unclass(groupFactors)), ncol=2)] <- 1
  setNames(as.data.frame(bits), colNames)
}

indicator2 <- function(groupStrings) {
  colNames <- unique(groupStrings)
  bits <- outer(groupStrings, colNames, "==")
  setNames(as.data.frame(bits * 1), colNames)
}

Used as follows:

d <- data.frame(cond=c("a", "a", "b"))
d <- cbind(d, indicator2(as.character(d$cond)))

David F 2010-09-24 14:57:52
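One difference between the two functions that may matter: indicator1 takes its column names from levels(factor(...)), which are sorted, while indicator2 takes them from unique(...), which keeps the order of first appearance. A small sketch of just that difference, with a made-up vector `g`:

```r
## Hypothetical group labels, deliberately not in alphabetical order
g <- c("b", "a", "b", "c")

## Column order indicator1 would use: sorted factor levels
by_level <- levels(factor(g))
## Column order indicator2 would use: order of first appearance
by_appearance <- unique(g)

by_level       # "a" "b" "c"
by_appearance  # "b" "a" "c"
```

So the two return the same indicator columns, but possibly in a different order.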

Answer 4

A:

Again, a great example of the greatness of open-source! Thanks so much for your help. The initial solution seemed to work best for me. In case someone else might be interested, here is how I implemented this with my (very large) dataset:

# Load the needed library (library() is harmless if it is already attached)
library(moments)

# Initialize data frames. DEFINE THE workspace SUBSET TO ANALYZE HERE
df <- stroke

# Make any necessary modifications to the df
df$TrDif <- df$TrBt - df$TrAt

# 0) Set up indicator variables (iv) from the factor you choose.
op <- options(contrasts = c("contr.sum", "contr.poly"))
dat <- subset(df, select = c("newcat"))
iv <- data.frame(model.matrix(~ . - 1, data = dat))
names(iv) <- levels(dat$newcat)
lbl <- levels(dat$newcat) # need this for plot functions below
options(op) # reset contrasts

# Select task variables with n > 1150 to be regressed (THIS CAN PROBABLY BE DONE MORE ELEGANTLY).
taskarr <- subset(df, select = c("B20", "B40", "FW", "Anim", "TrAt", "TrBt",
                                 "TrBerr", "TrDif", "Snod15", "tt", "GEMS",
                                 "Clock3", "orient", "Wlenc", "wlfr", "wlcr",
                                 "wlrec", "Snod15Rec", "GEMSfr"))

## 1) evaluate covariance matrix and extract sub-matrices
## Caution: Covariance samples differ due to missing values.
sig <- cov(cbind(iv, taskarr), use = "pairwise.complete.obs")

Jon Erik Ween 2010-09-27 19:21:11
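The code above stops at sig. In case it helps, here is a rough sketch of how the sub-matrices could be pulled out by name. This is an assumption about what "extract sub-matrices" means, not the poster's actual code, and the small `iv` and `taskarr` data frames are made up so that the example runs on its own.

```r
## Hypothetical stand-ins for iv and taskarr (values invented for illustration)
iv <- data.frame(x = c(1, 1, 0, 0, 0, 1),
                 y = c(0, 0, 1, 1, 0, 0),
                 z = c(0, 0, 0, 0, 1, 0))
taskarr <- data.frame(B20 = c(12, 13, NA, 10, 9, 14),
                      FW  = c(14, NA, 10, 13, 11, 15))

## Pairwise-complete covariance of indicators and tasks together
sig <- cov(cbind(iv, taskarr), use = "pairwise.complete.obs")

## Index by column names so the blocks stay correct if columns are added
iv_by_task   <- sig[names(iv), names(taskarr)]       # indicators vs tasks
task_by_task <- sig[names(taskarr), names(taskarr)]  # tasks vs tasks
iv_by_task
```

Indexing by names rather than by position avoids having to keep track of how many indicator columns the factor produced.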
