views:

77

answers:

5

I have a data set that defines a set of points on a 2-dimensional Cartesian plane. Theoretically, those points should form a line, but that line may be perfectly horizontal, perfectly vertical, and anything in between.

I would like to design an algorithm that rates the 'straightness' of that line.

For example, the following data sets would be perfectly straight:

 Y = 2/3x + 4
 X  |  Y
---------
-3  |  2
 0  |  4
 3  |  6

 Y = 4
 X  |  Y
---------
 1  |  4
 2  |  4
 3  |  4

 X = -1
 X  |  Y
---------
-1  |  7
-1  |  8
-1  |  9

While this one would not:

 X  |  Y
---------
-3  |  2
 0  |  5
 3  |  6

I think it would work to minimize the sum of the squares of the distances of each point from to a line (usually called a regression line), then determine the average distance of each point to the line. Thus, a perfectly straight line would have an average distance of 0.

Because the data can represent a line that is vertical, as I understand it, the usual least-squares regression line won't work for this data set. A perpendicular least-squares regression line might work, but I've had little luck finding an implementation of one.

I am working in Excel 2010 VBA, but I should be able to translate any reasonable algorithm.

Thanks, PaulH


The reason things like RSQ and LinEst won't work for this is because I need a universal measurement that includes vertical lines. As a line's slope approaches infinity (vertical), their RSQ approaches 0 even if the line is perfectly straight or nearly so.

-PaulH

+1  A: 

Could you try to catch the case of the vertical line before moving the least squares regression? If all x-values are the same, then the line is perfectly straight, no need to calculate an r^2 value.

Josiah
A: 

Yes, use ordinary least squares method. Just use the Slope and Intercept functions in a worksheet. I expect there is a simple way to call these from the VBA codebehind.

Here's the VBA info. for R-Squared: http://www.pcreview.co.uk/forums/thread-1009945.php

Carnotaurus
+2  A: 

Use a Linear Regression. The "straightness" of the line is the R^2 value.

A value of 0 for the R^2 value implies it is perfectly straight. Increasing values imply increasing error in the regression, and thus the line is less and less "straight"

Stargazer712
I tried RSQ earlier and thought (mistakenly) that it wasn't working. Other than needing to check to see if it divides by 0, it works fine.
PaulH
Okay. I now know why I thought RSQ wasn't working. An RSQ of 1.0 indicates the line is perfectly straight. But, as a line approaches a slope of infinity (vertical) its RSQ drops to 0.0. So, RSQ won't work for vertical lines. (Probably because it uses vertical least-squares regression).
PaulH
A: 

Rough idea: 1. translate all coordinates to absolute values 2. calculate tan of current x/y 3. calculate tan of difference in x/y between current x/y and next x/y 4. difference in tan can give running deviation

Nathan Kidd
+3  A: 

Sounds like you are looking for R2, the coefficient of determinism.

Basically, you take the residual sum of squares, divide by the sum of squares and subtract from 1.

BlueRaja - Danny Pflughoeft