I am trying to find the linear trend line for a set of data. The set contains pairs of dates (x values) and scores (y values). I am using a version of this code as the basis of my algorithm.

The results I am getting are off by a few orders of magnitude. I assume there is some problem with round-off error or overflow, because I am using Date's getTime method, which returns a huge number of milliseconds. Does anyone have a suggestion on how to minimize the error and compute the correct results?
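
Roughly, the fit I am doing looks like this (a simplified sketch with my own names, not the exact linked code):

```java
import java.util.Date;

public class TrendLine {
    // Ordinary least-squares fit of y = slope * x + intercept.
    // Returns { slope, intercept }.
    public static double[] fitLine(Date[] dates, double[] scores) {
        int n = dates.length;
        double sumX = 0, sumY = 0, sumXX = 0, sumXY = 0;
        for (int i = 0; i < n; i++) {
            double x = dates[i].getTime(); // milliseconds since the epoch, around 1.2e12
            double y = scores[i];
            sumX  += x;
            sumY  += y;
            sumXX += x * x;   // terms around 1e24 -- this is where double precision suffers
            sumXY += x * y;
        }
        double slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
        double intercept = (sumY - slope * sumX) / n;
        return new double[] { slope, intercept };
    }
}
```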

A: 

A Unix timestamp is an integer, and you are reading the data as doubles. Given the relative sizes involved, you're almost bound to get into trouble.

Keep the timestamps as integers or convert the time into something more appropriate to your problem.
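
For example (just a sketch, assuming the fit itself runs on doubles), you could express each date relative to the first data point:

```java
import java.util.Date;

class DateRescale {
    // Express each date as fractional days since the first sample, so the x
    // values handed to the regression are small and their differences matter.
    static double[] toDaysSinceFirst(Date[] dates) {
        long base = dates[0].getTime();              // anchor at the first observation
        double msPerDay = 24.0 * 60 * 60 * 1000;
        double[] xs = new double[dates.length];
        for (int i = 0; i < dates.length; i++) {
            xs[i] = (dates[i].getTime() - base) / msPerDay;
        }
        return xs;
    }
}
```

The slope you get back is then in score units per day instead of per millisecond.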

msw
Actually, getTime returns a long, the number of milliseconds since the Unix epoch. And that algorithm requires non-integral values (e.g., the average), so I don't think keeping them as `int`s or `long`s is an option.
Matthew Flaschen
+2  A: 

Maybe it helps to transform the long value that Date returns into something smaller.

If you do not need millisecond precision, you can just divide by 1000. Maybe you do not even need seconds; in that case, divide by another 60.

Also, the value is anchored at January 1st, 1970. If you only have recent dates, you could subtract an offset to re-base it at, say, 2000.

The whole idea is to make differences in the data more significant numerically (percentage-wise).
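
A sketch of what that could look like (the unit and the base date are arbitrary choices; adjust them to your data):

```java
import java.util.Date;

class CoarseTime {
    // Drop precision you don't need and re-base near the data:
    // milliseconds -> seconds -> minutes, then subtract 2000-01-01.
    static long toMinutesSince2000(Date d) {
        long seconds = d.getTime() / 1000L;      // drop millisecond precision
        long minutes = seconds / 60L;            // drop second precision too
        long minutes2000 = 946684800L / 60L;     // 2000-01-01T00:00:00Z expressed in minutes
        return minutes - minutes2000;
    }
}
```

Just remember that the slope of the fitted line is then per minute (relative to 2000), so convert back if you need it per millisecond.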

Thilo