tags:

views:

85

answers:

2

Yesterday I worked up an example of the difference between Ordinary Least Squares (OLS) vs. Principal Components Analysis (PCA). For that illustration I wanted to show the errors minimized by OLS and PCA so I plotted the actuals, the predicted line and then I manually (with GIMP) drew in a drop line to illustrate a couple of the error terms. How can I code the creation of the error lines in R? Here's the code I used for my example:

set.seed(2)
x <- 1:100

y <- 20 + 3 * x
e <- rnorm(100, 0, 60)
y <- 20 + 3 * x + e

plot(x,y)
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col="red")

Then I manually added the yellow lines to produce the following:

alt text

+3  A: 

?segments

I'd provide an example, but I'm pretty busy today and it's not that complicated to pick the points. ;-)

Okay, so I'm not that busy...

n=58; segments(x[n],y[n],x[n],predict(yx.lm)[n])
n=65; segments(x[n],y[n],x[n],predict(yx.lm)[n])
Joshua Ulrich
I had not ever used the `segments` command! Thanks for pointing me to it.
JD Long
You're welcome; and thank you for the informative blog post.
Joshua Ulrich
+2  A: 

As Joshua mentioned, segments() is the way to go here. And as it is totally vectorised, we can add in all the errors at once, following on from your example

set.seed(2)
x <- 1:100

y <- 20 + 3 * x
e <- rnorm(100, 0, 60)
y <- 20 + 3 * x + e

plot(x,y)
yx.lm <- lm(y ~ x)
lines(x, predict(yx.lm), col="red")
## Add segments
segments(x, y, x, fitted(yx.lm), col = "blue")

If you only want to highlight a couple of the errors, then to modify the example Joshua gave:

n <- c(58,65)
segments(x[n], y[n], x[n], fitted(yx.lm)[n], col = "orange", lwd = 3)

HTH

G

Gavin Simpson
-1 (for me) for not vectorizing. Thanks for the clarification!
Joshua Ulrich