tags:

views:

652

answers:

4

Hello,

I need to calculate the within and between run variances from some data as part of developing a new analytical chemistry method. I also need confidence intervals from this data using the R language

I assume I need to use anova or something ?

My data is like

> variance
   Run Rep Value
1    1   1  9.85
2    1   2  9.95
3    1   3 10.00
4    2   1  9.90
5    2   2  8.80
6    2   3  9.50
7    3   1 11.20
8    3   2 11.10
9    3   3  9.80
10   4   1  9.70
11   4   2 10.10
12   4   3 10.00
+1  A: 

If you want to apply a function (such as var) across a factor such as Run or Rep, you can use tapply:

> with(variance, tapply(Value, Run, var))
          1           2           3           4 
0.005833333 0.310000000 0.610000000 0.043333333 
> with(variance, tapply(Value, Rep, var))
          1          2          3 
0.48562500 0.88729167 0.05583333
Jonathan Chang
Nice! That is Elegant Code in my opinion.
kpierce8
+4  A: 

You have four groups of three observations:

> run1 = c(9.85, 9.95, 10.00)
> run2 = c(9.90, 8.80, 9.50)
> run3 = c(11.20, 11.10, 9.80)
> run4 = c(9.70, 10.10, 10.00)
> runs = c(run1, run2, run3, run4)
> runs
 [1]  9.85  9.95 10.00  9.90  8.80  9.50 11.20 11.10  9.80  9.70 10.10 10.00

Make some labels:

> n = rep(3, 4)
> group = rep(1:4, n)
> group
 [1] 1 1 1 2 2 2 3 3 3 4 4 4

Calculate within-run stats:

> withinRunStats = function(x) c(sum = sum(x), mean = mean(x), var = var(x), n = length(x))
> tapply(runs, group, withinRunStats)
$`1`
         sum         mean          var            n 
29.800000000  9.933333333  0.005833333  3.000000000 

$`2`
  sum  mean   var     n 
28.20  9.40  0.31  3.00 

$`3`
  sum  mean   var     n 
32.10 10.70  0.61  3.00 

$`4`
        sum        mean         var           n 
29.80000000  9.93333333  0.04333333  3.00000000

You can do some ANOVA here:

> data = data.frame(y = runs, group = factor(group))
> data
       y group
1   9.85     1
2   9.95     1
3  10.00     1
4   9.90     2
5   8.80     2
6   9.50     2
7  11.20     3
8  11.10     3
9   9.80     3
10  9.70     4
11 10.10     4
12 10.00     4

> fit = lm(runs ~ group, data)
> fit

Call:
lm(formula = runs ~ group, data = data)

Coefficients:
(Intercept)       group2       group3       group4  
  9.933e+00   -5.333e-01    7.667e-01   -2.448e-15 

> anova(fit)
Analysis of Variance Table

Response: runs
          Df  Sum Sq Mean Sq F value  Pr(>F)  
group      3 2.57583 0.85861  3.5437 0.06769 .
Residuals  8 1.93833 0.24229                  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> degreesOfFreedom = anova(fit)[, "Df"]
> names(degreesOfFreedom) = c("treatment", "error")
> degreesOfFreedom
treatment     error 
        3         8

Error or within-group variance:

> anova(fit)["Residuals", "Mean Sq"]
[1] 0.2422917

Treatment or between-group variance:

> anova(fit)["group", "Mean Sq"]
[1] 0.8586111

This should give you enough to do confidence intervals.

Alex Reynolds
+3  A: 

I'm going to take a crack at this when I have more time, but meanwhile, here's the dput() for Kiar's data structure:

structure(list(Run = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4), Rep = c(1, 
2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3), Value = c(9.85, 9.95, 10, 9.9, 
8.8, 9.5, 11.2, 11.1, 9.8, 9.7, 10.1, 10)), .Names = c("Run", 
"Rep", "Value"), row.names = c(NA, -12L), class = "data.frame")

... in case you'd like to take a quick shot at it.

Matt Parker
+2  A: 

I've been looking at a similar problem. I've found reference to caluclating confidence intervals by Burdick and Graybill (Burdick, R. and Graybill, F. 1992, Confidence Intervals on variance components, CRC Press)

Using some code I've been trying I get these values



> kiaraov = aov(Value~Run+Error(Run),data=kiar)

> summary(kiaraov)

Error: Run
    Df  Sum Sq Mean Sq
Run  3 2.57583 0.85861

Error: Within
          Df  Sum Sq Mean Sq F value Pr(>F)
Residuals  8 1.93833 0.24229               
> confint = 95

> a = (1-(confint/100))/2

> grandmean = as.vector(kiaraov$"(Intercept)"[[1]][1]) # Grand Mean (I think)

> within = summary(kiaraov)$"Error: Within"[[1]]$"Mean Sq"  # S2^2Mean Square Value for Within Run

> dfRun = summary(kiaraov)$"Error: Run"[[1]]$"Df"

> dfWithin = summary(kiaraov)$"Error: Within"[[1]]$"Df"

> Run = summary(kiaraov)$"Error: Run"[[1]]$"Mean Sq" # S1^2Mean Square for between Run

> between = (Run-within)/((dfWithin/(dfRun+1))+1) # (S1^2-S2^2)/J

> total = between+within

> between # Between Run Variance
[1] 0.2054398

> within # Within Run Variance
[1] 0.2422917

> total # Total Variance
[1] 0.4477315

> betweenCV = sqrt(between)/grandmean * 100 # Between Run CV%

> withinCV = sqrt(within)/grandmean * 100 # Within Run CV%

> totalCV = sqrt(total)/grandmean * 100 # Total CV%

> #within confidence intervals

> withinLCB = within/qf(1-a,8,Inf) # Within LCB

> withinUCB = within/qf(a,8,Inf) # Within UCB

> #Between Confidence Intervals

> n1 = dfRun

> n2 = dfWithin

> G1 = 1-(1/qf(1-a,n1,Inf)) # According to Burdick and Graybill this should be a

> G2 = 1-(1/qf(1-a,n2,Inf))

> H1 = (1/qf(a,n1,Inf))-1  # and this should be 1-a, but my results don't agree

> H2 = (1/qf(a,n2,Inf))-1

> G12 = ((qf(1-a,n1,n2)-1)^2-(G1^2*qf(1-a,n1,n2)^2)-(H2^2))/qf(1-a,n1,n2) # again, should be a, not 1-a

> H12 = ((1-qf(a,n1,n2))^2-H1^2*qf(a,n1,n2)^2-G2^2)/qf(a,n1,n2) # again, should be 1-a, not a

> Vu = H1^2*Run^2+G2^2*within^2+H12*Run*within

> Vl = G1^2*Run^2+H2^2*within^2+G12*within*Run

> betweenLCB = (Run-within-sqrt(Vl))/J # Betwen LCB

> betweenUCB = (Run-within+sqrt(Vu))/J # Between UCB

> #Total Confidence Intervals

> y = (Run+(J-1)*within)/J

> totalLCB = y-(sqrt(G1^2*Run^2+G2^2*(J-1)^2*within^2)/J) # Total LCB

> totalUCB = y+(sqrt(H1^2*Run^2+H2^2*(J-1)^2*within^2)/J) # Total UCB

> result = data.frame(Name=c("within", "between", "total"),CV=c(withinCV,betweenCV,totalCV),LCB=c(sqrt(withinLCB)/grandmean*100,sqrt(betweenLCB)/grandmean*100,sqrt(totalLCB)/grandmean*100),UCB=c(sqrt(withinUCB)/grandmean*100,sqrt(betweenUCB)/grandmean*100,sqrt(totalUCB)/grandmean*100))

> result
     Name       CV      LCB      UCB
1  within 4.926418 3.327584  9.43789
2 between 4.536327      NaN 19.73568
3   total 6.696855 4.846030 20.42647

Here the lower confidence interval for between run CV is less than zero, so reported as NaN.

I'd love to have a better way to do this. If I get time I might try to create a function to do this.

Paul.

--

PaulHurleyuk
That seems just what I wanted. I'll try to find that literarue reference.
Kiar