I am trying to recreate the following plot with R. Minitab describes this as a normal probability plot.
The probplot gets you most of the way there. Unfortunately, I cannot figure out how to add the confidence interval bands around this plot.
Similarly, ggplot's stat_qq() seems to present similar information with a transformed x axis. It seems that geom_smooth()
would be the likely candidate to add the bands, but I haven't figure that out.
Finally, the Getting Genetics Done guys describe something similar here.
Sample data to recreate the plot above:
x <- c(40.2, 43.1, 45.5, 44.5, 39.5, 38.5, 40.2, 41.0, 41.6, 43.1, 44.9, 42.8)
If anyone has a solution with base graphics or ggplot, I'd appreciate it!
EDIT
After looking at the details of probplot
, I've determined this is how it generates the fit line on the graph:
> xl <- quantile(x, c(0.25, 0.75))
> yl <- qnorm(c(0.25, 0.75))
> slope <- diff(yl)/diff(xl)
> int <- yl[1] - slope * xl[1]
> slope
75%
0.4151
> int
75%
-17.36
Indeed, comparing these results to what you get out of the probplot object seem to compare very well:
> check <- probplot(x)
> str(check)
List of 3
$ qdist:function (p)
$ int : Named num -17.4
..- attr(*, "names")= chr "75%"
$ slope: Named num 0.415
..- attr(*, "names")= chr "75%"
- attr(*, "class")= chr "probplot"
>
However, incorporating this information into ggplot2 or base graphics does not yield the same results.
probplot(x)
Versus:
ggplot(data = df, aes(x = x, y = y)) + geom_point() + geom_abline(intercept = int, slope = slope)
I get similar results using R's base graphics
plot(df$x, df$y)
abline(int, slope, col = "red")
Lastly, I've learned that the last two rows of the legend refer to the Anderson-Darling test for normality and can be reproduced with the nortest
package.
> ad.test(x)
Anderson-Darling normality test
data: x
A = 0.2303, p-value = 0.7502