I'd like to create a function that automatically generates univariate and multivariate regression analyses, but I can't figure out how to specify variables in vectors. This seems very easy, but skimming the documentation I haven't figured it out so far...

Easy example

a<-rnorm(100)
b<-rnorm(100)
k<-c("a","b")
d<-c(a,b)
summary(k[1])

But k[1] is just the character string "a", and d is just b appended to a, not the variable names. In effect, I'd like k[1] to represent the vector a.

Appreciate any answers...

//M

+3  A: 

You could use a list: k = list(a, b). This creates a list with components a and b, but it is not a list of variable names.

Andrew Redd
+2  A: 

get() is what you're looking for:

summary(get(k[1]))

Edit: get() is not what you're looking for; list() is. get() could be useful too, though.
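As a minimal sketch of how the two ideas fit together, assuming the vectors a and b from the question exist in the current environment: mget() is the multi-name counterpart of get() and returns a named list, so the character vector k can be turned into a list in one step.

```r
set.seed(1)                 # only to make the sketch reproducible
a <- rnorm(100)
b <- rnorm(100)
k <- c("a", "b")

# get() looks up a single object by its name
summary(get(k[1]))

# mget() looks up several names at once and returns a named list
vals <- mget(k)
names(vals)                 # "a" "b"
summary(vals[[k[1]]])       # same result as summary(a)
```

From here on, vals can be indexed by name, which avoids repeated get() calls.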

If you're looking for automatic generation of regression analyses, you might actually benefit from using eval(), although every R-programmer will warn you about using eval() unless you know very well what you're doing. Please read the help files about eval() and parse() very carefully before you use them.

An example:

d <- data.frame(
  var1 = rnorm(1000),
  var2 = rpois(1000,4),
  var3 = sample(letters[1:3],1000,replace=TRUE)
)

vars <- names(d)

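auto.lm <- function(d,dep,indep){
      # Build the call as text, e.g. "out <- lm( var1 ~ var2*var3 ,data=d)"
      expr <- paste(
          "out <- lm(",
          dep,
          "~",
          paste(indep,collapse="*"),   # "*" gives main effects plus interactions
          ",data=d)"
      )
      # Parse the text and evaluate it inside the function, which creates 'out'
      eval(parse(text=expr))
      return(out)
}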

auto.lm(d,vars[1],vars[2:3])
Joris Meys
That'll do it.. Thx a million. Misha
Misha
You're welcome. But actually Halpo is right. If you want k[1] to represent the vector a, then you need a list. It's worth looking into as well.
Joris Meys
+3  A: 

You can use the "get" function to get an object based on a character string of its name, but in the long run it is better to store the variables in a list and access them that way. Things become much simpler: you can grab subsets, and you can use lapply or sapply to run the same code on every element. When saving or deleting, you can work on the entire list rather than trying to remember every element. E.g.:

mylist <- list(a=rnorm(100), b=rnorm(100) )
names(mylist)
summary(mylist[[1]])
# or
summary(mylist[['a']])
# or
summary(mylist$a)
# or 
d <- 'a'
summary(mylist[[d]])

# or
lapply( mylist, summary )

If you are programmatically creating models for analysis with lm (or other modeling functions), then one approach is to just subset your data and use the ".", e.g.:

yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
fit <- lm( Sepal.Width ~ ., data=iris[, c(yvar,xvars)] )

Or you can build the formula using "paste" or "sprintf" then use "as.formula" to convert it to a formula, e.g.:

yvar <- 'Sepal.Width'
xvars <- c('Petal.Width','Sepal.Length')
my.formula <- paste( yvar, '~', paste( xvars, collapse=' + ' ) )
my.formula <- as.formula(my.formula)
fit <- lm( my.formula, data=iris )
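To tie this back to the original question, here is a rough sketch of a function that fits one univariate model per predictor plus one multivariate model. It uses reformulate() from the stats package, which builds a formula from character strings in much the same way as the paste/as.formula combination. The helper name fit_all and its arguments are made up for illustration.

```r
# Hypothetical helper: fit one lm per predictor, plus one lm with all predictors.
fit_all <- function(data, yvar, xvars) {
  # Univariate models: yvar ~ x for each x in xvars
  uni <- lapply(xvars, function(x) {
    lm(reformulate(x, response = yvar), data = data)
  })
  names(uni) <- xvars

  # Multivariate model: yvar ~ x1 + x2 + ...
  multi <- lm(reformulate(xvars, response = yvar), data = data)

  list(univariate = uni, multivariate = multi)
}

fits <- fit_all(iris, "Sepal.Width", c("Petal.Width", "Sepal.Length"))
names(fits$univariate)      # "Petal.Width" "Sepal.Length"
coef(fits$multivariate)
```

Because the result is a list, something like lapply(fits$univariate, summary) runs the same summary on every univariate fit.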

Note also the problem of multiple comparisons if you are looking at many different models fit automatically.

Greg Snow
Indeed, using as.formula() is a lot cleaner than the eval() parse() construct I used.
Joris Meys
This is getting even better...Thx
Misha
A nice way of pre-allocating a list is via vector("list", n), where n is the number of elements the list is supposed to hold. Sorry to be a bit off topic. :)
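A small sketch of that idea, with n and the loop body chosen only for illustration:

```r
set.seed(1)                          # only to make the sketch reproducible
n <- 3
results <- vector("list", n)         # a list of n NULL elements

for (i in seq_len(n)) {
  # Fill each slot in place instead of growing the list
  results[[i]] <- summary(rnorm(100))
}

length(results)                      # 3
```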
Roman Luštrik