ansaurus

Question

Answer 1

+2 A:

## make fake data
> ngroups <- 2
> group <- 1:ngroups
> nobs <- 100
> dta <- data.frame(group=rep(group,each=nobs),y=rnorm(nobs*ngroups),x=runif(nobs*ngroups))
> head(dta)
  group          y         x
1     1  0.6482007 0.5429575
2     1 -0.4637118 0.7052843
3     1 -0.5129840 0.7312955
4     1 -0.6612649 0.9028034
5     1 -0.5197448 0.1661308
6     1  0.4240346 0.8944253
> 
> ## function to extract the results of one model
> foo <- function(z) {
+   ## coef and se in a data frame
+   mr <- data.frame(coef(summary(lm(y~x,data=z))))
+   ## put row names (predictors/indep variables)
+   mr$predictor <- rownames(mr)
+   mr
+ }
> ## see that it works
> foo(subset(dta,group==1))
              Estimate Std..Error   t.value  Pr...t..   predictor
(Intercept)  0.2176477  0.1919140  1.134090 0.2595235 (Intercept)
x           -0.3669890  0.3321875 -1.104765 0.2719666           x
> ## one option: use command by
> res <- by(dta,dta$group,foo)
> res
dta$group: 1
              Estimate Std..Error   t.value  Pr...t..   predictor
(Intercept)  0.2176477  0.1919140  1.134090 0.2595235 (Intercept)
x           -0.3669890  0.3321875 -1.104765 0.2719666           x
------------------------------------------------------------ 
dta$group: 2
               Estimate Std..Error    t.value  Pr...t..   predictor
(Intercept) -0.04039422  0.1682335 -0.2401081 0.8107480 (Intercept)
x            0.06286456  0.3020321  0.2081387 0.8355526           x
> ## using package plyr is better
> library(plyr)
> res <- ddply(dta,"group",foo)
> res
  group    Estimate Std..Error    t.value  Pr...t..   predictor
1     1  0.21764767  0.1919140  1.1340897 0.2595235 (Intercept)
2     1 -0.36698898  0.3321875 -1.1047647 0.2719666           x
3     2 -0.04039422  0.1682335 -0.2401081 0.8107480 (Intercept)
4     2  0.06286456  0.3020321  0.2081387 0.8355526           x
>

Eduardo Leoni 2009-07-23 04:22:58

Answer 2

+4 A:

Here's one way using the lme4 package.

> library(lme4)
> d <- data.frame(state=rep(c('NY', 'CA'), c(10, 10)),
+                 year=rep(1:10, 2),
+                 response=c(rnorm(10), rnorm(10)))

> xyplot(response ~ year, groups=state, data=d, type='l')

> fits <- lmList(response ~ year | state, data=d)
> fits
Call: lmList(formula = response ~ year | state, data = d)
Coefficients:
   (Intercept)        year
CA -1.34420990  0.17139963
NY  0.00196176 -0.01852429

Degrees of freedom: 20 total; 16 residual
Residual standard error: 0.8201316

ars 2009-07-23 04:55:41

Answer 3

+3 A:

In my opinion is a mixed linear model a better approach for this kind of data. The code below given in the fixed effect the overall trend. The random effects indicate how the trend for each individual state differ from the global trend. The correlation structure takes the temporal autocorrelation into account. Have a look at Pinheiro & Bates (Mixed Effects Models in S and S-Plus).

library(nlme)
lme(response ~ year, random = ~year|state, correlation = corAR1(~year))

Thierry 2009-07-24 11:32:35

This is a really good general stats theory answer which makes me think about some things I had not considered. The application which caused me to ask the question would not be applicable to this solution, but I'm glad you brought it up. Thank you.

JD Long 2009-07-24 14:11:32

It's not a good idea to start with a mixed model - how do you know that any of the assumptions are warranted?

hadley 2009-07-31 19:26:41

One should check the assumption by model validation (and knowledge of the data). BTW you cannot warrant the assumption on the individual lm's either. You would have to validate all models seperately.

Thierry 2009-07-31 22:59:39

Answer 4

+9 A:

Here's an approach using the plyr package:

d <- data.frame(
  state = rep(c('NY', 'CA'), 10),
  year = rep(1:10, 2),
  response= rnorm(20)
)

library(plyr)
# Break up d by state, then fit the specified model to each piece and
# return a list
models <- dlply(d, "state", function(df) 
  lm(response ~ year, data = df))

# Apply coef to each model and return a data frame
ldply(models, coef)

# Print the summary of each model
l_ply(models, summary, .print = TRUE)

hadley 2009-07-31 19:30:45

ansaurus

tags:

views:

answers:

Linear Regression and group by in R

related questions