I have loaded a dataset, d, into R and would like to cross-tabulate every variable in d against d$binary_outcome. How do I do that?

I am looking for fairly generic code that can handle a dataset with any number of variables.

In effect I want to be able to do something like

d = read.csv("c:/d.csv")
d.freq.varA = table(d$varA,d$binary_outcome)
d.freq.varB = table(d$varB,d$binary_outcome)
...
d.freq.varZZZ = table(d$varZZZ,d$binary_outcome)

for all variables A to ZZZ in d.

A: 

I think this should get you somewhere. It might look better in a loop.

lapply(names(d)[grep('var', names(d))],
       function(name){
             assign(name, table(d[,name],d$binary_outcome), 
             envir = .GlobalEnv)
             }
      )
DiggyF
Seems to work really well. I don't need the grep bit! Thanks!

lapply(names(d), function(name){ assign(name, table(d[,name],d$binary_outcome), envir = .GlobalEnv) } )
xiaodai
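A variation on the lapply() approach above, in case writing into the global environment is not wanted: the tables can be collected in a named list instead. This is only a minimal sketch; the toy data frame and its column names (varA, varB) are made up for illustration, and the outcome column is assumed to be called binary_outcome as in the question.

```r
# Toy data standing in for d; the columns varA and varB are hypothetical.
d <- data.frame(
  varA = c("x", "y", "x", "y"),
  varB = c("p", "p", "q", "q"),
  binary_outcome = c(0, 1, 1, 1)
)

# Every column except the outcome itself.
predictors <- setdiff(names(d), "binary_outcome")

# One two-way table per predictor, collected in a list named by column.
freq <- lapply(
  setNames(predictors, predictors),
  function(name) table(d[[name]], d$binary_outcome)
)

# Look up a single table by column name.
freq$varA
```

Keeping the results in a list avoids overwriting existing objects that happen to share a name with a column, and the whole set can still be looped over with lapply(freq, ...).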
A: 

Does every variable have the same levels? If so, you can reshape::melt() the data first and then create one multidimensional table.

library(reshape)  # provides melt()
d.m <- melt(d, id = "binary_outcome")
freq.all.vars <- with(d.m, table(binary_outcome, value, variable))

freq.var.a <- freq.all.vars[,,"varA"]
JoFrhwld
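If the reshape package is not available, roughly the same three-way table can be built with base R by stacking the columns by hand. This is a sketch under the same assumption as the answer above (every variable shares the same levels); the toy data and the names d.long and predictors are hypothetical.

```r
# Toy data standing in for d; both predictors share the levels "x" and "y".
d <- data.frame(
  varA = c("x", "y", "x", "y"),
  varB = c("x", "x", "y", "y"),
  binary_outcome = c(0, 1, 1, 1),
  stringsAsFactors = FALSE
)

predictors <- setdiff(names(d), "binary_outcome")

# Long format: one row per (observation, variable) pair,
# with the columns stacked one after another.
d.long <- data.frame(
  binary_outcome = rep(d$binary_outcome, times = length(predictors)),
  variable = rep(predictors, each = nrow(d)),
  value = unlist(d[predictors], use.names = FALSE)
)

# Three-way table: outcome x value x variable.
freq.all.vars <- with(d.long, table(binary_outcome, value, variable))

# Slice out the two-way table for a single variable.
freq.var.a <- freq.all.vars[, , "varA"]
```

Note that the slice has the outcome in the rows and the variable's values in the columns, the transpose of table(d$varA, d$binary_outcome).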