I have a data set that defines a set of points on a 2-dimensional Cartesian plane. Theoretically, those points should form a line, but that line may be perfectly horizontal, perfectly vertical, and anything in between.
I would like to design an algorithm that rates the 'straightness' of that line.
For example, the following data sets would be perfectly straight:
Y = 2/3x + 4
X | Y
---------
-3 | 2
0 | 4
3 | 6
Y = 4
X | Y
---------
1 | 4
2 | 4
3 | 4
X = -1
X | Y
---------
-1 | 7
-1 | 8
-1 | 9
While this one would not:
X | Y
---------
-3 | 2
0 | 5
3 | 6
I think it would work to minimize the sum of the squares of the distances of each point from to a line (usually called a regression line), then determine the average distance of each point to the line. Thus, a perfectly straight line would have an average distance of 0.
Because the data can represent a line that is vertical, as I understand it, the usual least-squares regression line won't work for this data set. A perpendicular least-squares regression line might work, but I've had little luck finding an implementation of one.
I am working in Excel 2010 VBA, but I should be able to translate any reasonable algorithm.
Thanks, PaulH
The reason things like RSQ and LinEst won't work for this is because I need a universal measurement that includes vertical lines. As a line's slope approaches infinity (vertical), their RSQ approaches 0 even if the line is perfectly straight or nearly so.
-PaulH