views:

399

answers:

3
A: 

I voted this up for the quality of your results and the clarity of your write-up. I wish I could offer an answer that could improve on your already excellent work.

I fear that it might be a matter of trial and error with the trend weight until you see an improved fit.

It could be that you could make this an input from users as well: allow them to fiddle with the value, given realistic constraints, until they get satisfactory values.

I also wondered if the weight would be different for each graph, since the number of points in each is different. Are you trying to get a single weighting that works for all graphs?

Excellent work; a nice question. Well done. I wish I was more helpful. Perhaps someone else will have more wisdom to impart than I do.

duffymo
Thanks, duffymo. It has been a tough problem to coax JasperSoft's iReport and the JFreeChart API into producing trend reports. The results shown are likely good enough for their purpose, but it seems like there should be a formula to calculate the appropriate weighting. I can use different weights for different graphs because the number of data points is known before the graphing begins.
Dave Jarvis
A: 

It might look like the trend lines are accurate in those 4 graphs, but they are really quite off. (This is best seen at the beginning of the lower-left one and the beginning of the upper-right one.) I would think that you would want to use no less than half of your points when finding the trend line (though really you should use much more than half). I would suggest a trend weight of 2 at a maximum, though really you ought to stick closer to the 1-1.5 range. Since it is arbitrary, I would suggest you give your user an "accuracy of trend line" slider, where the most accurate setting uses a trend weight of 1 and the least accurate uses a weight of (number of data points + 1). The latter would use 0 points (assuming you always round down) and, I would assume, though your statistics software might be different, would generate a straight horizontal line.
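
A minimal sketch of that slider idea, in Python for brevity. It assumes the number of points used per trend value is floor(N / trend weight), which is what the "half of your points at a weight of 2" and "0 points at a weight of N + 1" remarks imply. The function names and the linear slider mapping are hypothetical and are not part of JFreeChart or iReport.

```python
import math


def points_used(num_points: int, trend_weight: float) -> int:
    """Points used per trend value, ASSUMING floor(N / weight)."""
    if trend_weight < 1:
        raise ValueError("trend weight must be at least 1")
    return math.floor(num_points / trend_weight)


def slider_to_weight(slider: float, num_points: int) -> float:
    """Map a slider position in [0, 1] linearly onto a weight in [1, N + 1].

    0.0 is the 'most accurate' end (weight 1); 1.0 is the 'least accurate'
    end (weight N + 1). The linear mapping is an illustrative assumption.
    """
    if not 0.0 <= slider <= 1.0:
        raise ValueError("slider must be between 0 and 1")
    return 1.0 + slider * num_points


if __name__ == "__main__":
    n = 52
    for position in (0.0, 0.25, 0.5, 1.0):
        weight = slider_to_weight(position, n)
        print(position, weight, points_used(n, weight))
```

A real slider would probably clamp the upper end well below N + 1, since a weight that leaves 0 points is only useful as a limiting case.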

David
Hi, David. Thanks for the help. Due to the API, each data point must become some point on the trend line. Using a trend weight of 2 won't work. The upper-right is off at the beginning because there are few data points between January and March, which is not the case with production data. I thought about letting them pick a value for the trend line's weight (with a suggested value), but was hoping there was some formula I could apply.
Dave Jarvis
In the upper-right one, it doesn't look like it's off for lack of data. It's going way too high without data to get it there. In that first month it peaks above the max for the next month, as well as well above the mean for the next month. I would think that the curve should be below the blue line in the first month, since there's no data in that month to pull it up above the blue line, but there is data in the second month to keep it down.
David
+1  A: 

Based on the looks of the graphs I would say you have too many points for your 12 point graph (it is just a spline of the points given... which is visually pleasing, but actually does more harm than good when trying to understand the trend) and too few points for your 365 point graph. Perhaps try doing something a little exponential like:

(Data points)^1.2/14.1

I do realize this is even more arbitrary than what you already have, but arbitrary isn't the worst thing in the world.

(I got 14.1 by trying to keep the 52 point graph fixed, since that one looks nice, by taking (52^(1.2)/52)*6.4=14.1. Using this technique, you could try other powers besides 1.2 to see what you get visually.)
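
To make it easier to try other powers, here is a rough Python sketch of the expression above. It assumes the value 6.4 in the arithmetic is the divisor that the current calculation applies to the 52 point graph (that is, a window of 52/6.4); this is inferred from the numbers rather than stated. The function and parameter names are hypothetical.

```python
def divisor_for(power: float, anchor_points: int = 52,
                anchor_divisor: float = 6.4) -> float:
    """Divisor that keeps the anchor graph's window unchanged.

    Solves anchor_points**power / d == anchor_points / anchor_divisor for d.
    The meaning of anchor_divisor (6.4) is an ASSUMPTION, see the lead-in.
    """
    return anchor_points ** (power - 1) * anchor_divisor


def power_window(num_points: int, power: float = 1.2) -> float:
    """Window size computed as (data points)^power / divisor."""
    return num_points ** power / divisor_for(power)


if __name__ == "__main__":
    print("divisor for power 1.2:", round(divisor_for(1.2), 1))
    for n in (12, 52, 365):
        # Compare the power-law window with the linear n / 6.4 window.
        print(n, round(power_window(n), 2), round(n / 6.4, 2))
```

With a power above 1, graphs with fewer than 52 points get a smaller window than the linear rule gives and graphs with more than 52 points get a larger one, while the 52 point graph stays where it is.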

Dan
Thanks, Dan. The spline must consist of the same number of data points as the underlying data set (an unfortunate API limitation). The users will have the option to disable the trend graph because (as you pointed out) with too few data points the trend line is meaningless.
Dave Jarvis
I may have phrased my suggestion wrong. I'm proposing alternative window size calculations.
Dan
For too few data points the spline is automatically removed. I prefer this answer because it offers another way of looking at the math behind the calculation.
Dave Jarvis