I have a set of data, for which I'd like to find an average peak. I've done some testing in Numbers.app to see what I'm after and if I make a chart of the dataset it has a feature it calls "polynomial trendline" which draws a curve of the data and the peak of that curve looks exactly like the point/value I'm after.

So how could I programmatically calculate that curve and find that tangent on the curve?

I've been looking around on Wikipedia and found topics like "Normal distribution" and "Polynomial regression" which seem very much related, but I've always found it hard to follow the equations on Wikipedia, so I'm hoping someone here could give me a programmatic example.

Here's a couple of charts to illustrate what I'm after. The green dots are the data points and the blue line is the "polynomial trendline" (of order 6). The "peak" of that trendline is what I'm after.

[Figure: Example with even dataset] [Figure: Example with uneven dataset]

Updated question:

After some answers I realize my question needs to be rephrased: the problem is not really how to find the peak of the curve, but how to generate that blue curve from the green points so I can find where in the dataset the "weight" lies. The goal is to get a sort of 'average maximum'.

I guess another question would be "what is this particular problem actually called?" ;)
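
One way to produce a curve like the blue trendline is polynomial least-squares regression. The sketch below is only an illustration under assumptions: the function names (`fit_polynomial`, `evaluate`, `peak_of_fit`) and the sample data are made up here, and nothing in it is taken from how Numbers.app computes its trendline. It solves the normal equations directly, which is fine for low degrees but can become numerically unstable for something like order 6 unless the x values are rescaled first.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares fit; returns coefficients c[0] + c[1]*x + c[2]*x**2 + ..."""
    n = degree + 1
    # Normal equations (A^T A) c = A^T y, where A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(n)]
    # Gaussian elimination with partial pivoting on the augmented matrix.
    m = [row + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (m[r][n] - tail) / m[r][r]
    return coeffs


def evaluate(coeffs, x):
    """Evaluate the polynomial at x using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def peak_of_fit(xs, ys, degree, samples=1000):
    """Return the x where the fitted curve is highest within the data range."""
    coeffs = fit_polynomial(xs, ys, degree)
    lo, hi = min(xs), max(xs)
    grid = [lo + (hi - lo) * i / samples for i in range(samples + 1)]
    return max(grid, key=lambda x: evaluate(coeffs, x))


# Made-up data with two equal maxima, at x = 2 and x = 4.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 6, 5, 6, 3, 1]
print(peak_of_fit(xs, ys, degree=2))  # about 3.0, between the two maxima
```

Sampling the fitted curve on a grid avoids having to solve for the roots of the derivative, at the cost of a resolution limited by `samples`.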

A: 

Derivative is equal to zero at peaks.

Andrey
Ah right, that's another term I've forgotten since school. It's also zero at 'valleys' if I recall. But running a max(d1,d2,d3) would find me the perfect point. But now I just need to figure out how to make that curve to find the derivative on. ;)
Robert Sköld
@Robert Sköld you should refresh your math (calculus, actually). Numerically, the derivative at point x can be approximated as f(x + 1) - f(x), so if you have the points 1 2 3 3 2 1, the derivatives will be 1 1 0 -1 -1. Then yes, find the maximum.
Andrey
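
The finite-difference idea from the comment above can be sketched as follows. The function names are hypothetical, and the rule used to flag a peak (a positive difference followed by a non-positive one) is an assumption made for this example. On noisy data every small bump would be flagged this way, so it probably only makes sense on a curve that has already been smoothed.

```python
def forward_differences(values):
    """Approximate the derivative as f(x + 1) - f(x) for unit-spaced points."""
    return [b - a for a, b in zip(values, values[1:])]


def peak_indices(values):
    """Indices where the slope turns from positive to zero or negative."""
    diffs = forward_differences(values)
    return [i + 1 for i in range(len(diffs) - 1)
            if diffs[i] > 0 and diffs[i + 1] <= 0]


points = [1, 2, 3, 3, 2, 1]
print(forward_differences(points))  # [1, 1, 0, -1, -1]
print(peak_indices(points))         # [2]
```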
A: 

Let's say you are plotting Y vs X. You already have the value of Y corresponding to each X. Let Y(X1) mean the value of Y when X = X1.

Set a variable max to the first Y value (starting from max = 0 would fail if every Y is negative). Then look at the value of Y at each X. If Y(X1) > max, set max = Y(X1). Once you have gone through all the Ys, max will hold the peak value of Y.

E.g., in your example just go through all the green dots and find the maximum of them. That would be the peak, right? Let me know if that's what you wanted. Which programming language are you using? You don't need to go into distributions and such just to get the peak.
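
A minimal sketch of that scan, with a hypothetical function name and made-up data. It starts from the first point rather than from 0 so that data sets with only negative values still work.

```python
def peak_by_scan(xs, ys):
    """Return (x, y) of the largest y; the first one wins on ties."""
    best_x, best_y = xs[0], ys[0]
    for x, y in zip(xs[1:], ys[1:]):
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 6, 5, 6, 3, 1]
print(peak_by_scan(xs, ys))  # (2, 6)
```

With two equal maxima, as in this sample, the scan simply reports the first one and says nothing about the region between them.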

Raze2dust
Updated my question a bit. As you can see in the second image, the target would be right in between two "maximums", so I'd like the peak to be more 'weighted' (or whatever the correct term is), which is why that trendline seems proper. And the programming language will be JavaScript in the end...
Robert Sköld
A: 

As you speak of normal distributions, and seem to be able to fit data to a function, you should fit to a normal distribution, which has parameters µ and σ, which are respectively the mean and standard deviation of the distribution (see wiki first formula).

Fit this function to your data, and the peak will be at the mean value, given by µ.
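
As a rough illustration, µ and σ can be estimated by treating each Y value as a weight on its X position. This is a method-of-moments shortcut and an assumption on my part, not a full least-squares or maximum-likelihood fit; it also assumes the Y values are non-negative and not all zero. The function name and the data are made up.

```python
import math


def weighted_mean_and_std(xs, ys):
    """Estimate mu and sigma using the y values as weights on the x values."""
    total = sum(ys)
    mu = sum(x * y for x, y in zip(xs, ys)) / total
    variance = sum(y * (x - mu) ** 2 for x, y in zip(xs, ys)) / total
    return mu, math.sqrt(variance)


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 6, 5, 6, 3, 1]
mu, sigma = weighted_mean_and_std(xs, ys)
print(mu)     # 3.0
print(sigma)  # square root of 2.16, roughly 1.47
```

Here µ lands between the two equal maxima at x = 2 and x = 4, which is the kind of "weighted" position the question describes.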

rubenvb
+2  A: 

Although the data looks like that, you're not necessarily after a normal distribution.

The topic of distribution fitting is quite complex and, unless you have some clear a priori assumptions about what your data distribution is, I would not venture there. If you do have assumptions about the type of distribution, have a look at least squares or maximum likelihood estimation methods.

However, I would suggest you rather use a Bézier spline or LOESS to "smooth" your data and then just find the maximum of the computed curve.
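
To give an idea of the smoothing route, here is a simplified LOESS-style sketch: a local linear regression with tricube weights around each data point, without the robustness iterations that full LOESS implementations usually add. The function names, the `frac` parameter and the sample data are assumptions for illustration, and the curve is only evaluated at the original X positions.

```python
def loess(xs, ys, frac=0.5):
    """Smooth ys with local linear fits using tricube weights."""
    n = len(xs)
    k = max(2, int(round(frac * n)))  # number of neighbours per local fit
    smoothed = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest point (itself included).
        dists = sorted(abs(x - x0) for x in xs)
        h = dists[k - 1] or 1.0
        ws = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in xs]
        # Weighted linear regression y = my + b * (x - mx).
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        b = sxy / sxx if sxx else 0.0
        smoothed.append(my + b * (x0 - mx))
    return smoothed


def smoothed_peak(xs, ys, frac=0.5):
    """Return the x whose smoothed value is largest."""
    smoothed = loess(xs, ys, frac)
    best = max(range(len(xs)), key=lambda i: smoothed[i])
    return xs[best]


xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 6, 5, 6, 3, 1]
print(smoothed_peak(xs, ys, frac=0.7))  # 3
```

A larger `frac` smooths more aggressively; how much smoothing is appropriate depends on the data, so the value 0.7 here is arbitrary.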

I doubt that an approach using the derivative would work here.

nico
Also, have a look at this: http://stats.stackexchange.com/questions/1315/how-do-i-figure-out-what-kind-of-distribution-this-is
nico
Thanks, looks like interesting links!
Robert Sköld
A: 

You could start with calculating the mean and standard deviation/variance. This would tell you some information about the distribution.

I don't think you'll be able to solve the problem for an arbitrary data set. So you would need to have some common characteristic behavior.

After all, fitting a curve can be somewhat arbitrary depending upon the method - it needs to be chosen appropriately for your problem domain - perhaps there needs to be some weighting or data cleansing to throw out outlying values first.
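
As one possible form of that data cleansing, the sketch below computes the mean and standard deviation and drops values that lie more than `k` standard deviations from the mean. The function name, the cutoff `k = 2.0` and the sample values are arbitrary assumptions; the right rule depends on the problem domain.

```python
import statistics


def drop_outliers(values, k=2.0):
    """Keep only the values within k population standard deviations of the mean."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return list(values)
    return [v for v in values if abs(v - mean) <= k * std]


values = [10, 11, 9, 10, 12, 10, 50]
print(drop_outliers(values))  # [10, 11, 9, 10, 12, 10]
```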

Cade Roux
True, I will update my question with some example datasets (and maybe more of an explanation of my problem).
Robert Sköld