Given data values from some real-time physical process (e.g. network traffic), I need to find the name of the function that best matches the data.

I have a set of functions of type y=f(t) where y and t are real:

from math import cos, tan, exp, log

funcs = set([cos, tan, exp, log])

and a list of data values:

vals = [59874.141, 192754.791, 342413.392, 1102604.284, 3299017.372]

What is the simplest possible method to find the function from the given set that generates the most similar values?

PS: t is increasing, starting from some positive value, at almost equal intervals.

A: 

See Curve Fitting

antti.huima
No, this is not regression analysis. I do not need interpolation, smoothing, or Fourier integrals; I need the simplest possible automatic classification.
psihodelia
+1  A: 

SciPy has functions for fitting data, but they use polynomials or splines. You can use one of Gauß' many discoveries, the method of least squares, to fit other functions.

THC4k
No, that is not what I need.
psihodelia
@psihodelia: You definitely need to measure how close each of your functions gets to the output values. Why not use least squares for that?
jellybean
It is what you need. Interpolation, extrapolation, and parameter estimation may not be your end goals, but they are necessary steps (or by-products) along the way to classification. Ultimately, you will need to compute some distance between your input data and your best fit. The category of function that minimizes that distance is your predicted class label. But first, you need to fit.
Steve
+1  A: 

Just compute the error (for instance, the sum of squared errors over all points) for each function in the set and choose the function that gives the minimum.

But you should still fit each function before choosing.
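A minimal sketch of that selection step in Python, without any fitting yet. The sample times `ts`, the synthetic data `ys`, and the helper names `sum_squared_error` and `best_match` are assumptions made for illustration; none of them come from the question.

```python
import math

# Candidate functions from the question, keyed by name so the winner can be reported.
CANDIDATES = {"cos": math.cos, "tan": math.tan, "exp": math.exp, "log": math.log}

def sum_squared_error(f, ts, ys):
    """Sum of squared differences between f(t) and the observed y."""
    return sum((f(t) - y) ** 2 for t, y in zip(ts, ys))

def best_match(ts, ys, candidates=CANDIDATES):
    """Return the name of the candidate with the smallest squared error."""
    return min(candidates, key=lambda name: sum_squared_error(candidates[name], ts, ys))

# Hypothetical sample times (positive, evenly spaced) and synthetic data.
ts = [1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.exp(t) for t in ts]
print(best_match(ts, ys))
```

On this synthetic input the exp candidate has zero error, so it is selected. Real measurements on the scale of the values in the question will not match any unscaled candidate this closely, which is why a fitting step should come first.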

fa.
+1  A: 

I would try an approach based on fitting too. For each of the four test functions (f1 to f4, see below), find the values of a and b that minimize the squared error.

f1(t) = a*cos(b*t)
f2(t) = a*tan(b*t)
f3(t) = a*exp(b*t)
f4(t) = a*log(b*t)

After fitting, the squared error of each of the four functions can be used to evaluate the goodness of fit (a low value means a good fit).
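One way to sketch this fit in plain Python: for a fixed b, the model a*f(b*t) is linear in a, so the least-squares a has a closed form, a = sum(y*f(b*t)) / sum(f(b*t)^2). The sketch below simply tries b on a grid, which is a stand-in for a proper nonlinear optimizer and not a recommendation. The grid `b_grid`, the sample times `ts`, the synthetic data, and the function names are all assumptions for illustration.

```python
import math

CANDIDATES = {"cos": math.cos, "tan": math.tan, "exp": math.exp, "log": math.log}

def fit_scaled(f, ts, ys, b_grid):
    """Fit y ~ a*f(b*t): try each b on the grid, solve a in closed form."""
    best = (math.inf, None, None)  # (squared error, a, b)
    for b in b_grid:
        basis = [f(b * t) for t in ts]
        denom = sum(v * v for v in basis)
        if denom == 0.0:
            continue  # f(b*t) is zero everywhere, a is undetermined
        a = sum(y * v for y, v in zip(ys, basis)) / denom
        sse = sum((y - a * v) ** 2 for y, v in zip(ys, basis))
        if sse < best[0]:
            best = (sse, a, b)
    return best

def classify_by_fit(ts, ys, b_grid):
    """Fit every candidate and return the best name plus all fit results."""
    results = {name: fit_scaled(f, ts, ys, b_grid) for name, f in CANDIDATES.items()}
    best_name = min(results, key=lambda name: results[name][0])
    return best_name, results

# Hypothetical inputs: positive, evenly spaced t and data from 2*exp(0.8*t).
ts = [1.0, 1.5, 2.0, 2.5, 3.0]
ys = [2.0 * math.exp(0.8 * t) for t in ts]
b_grid = [0.05 * i for i in range(1, 101)]  # positive b keeps log(b*t) defined

name, results = classify_by_fit(ts, ys, b_grid)
print(name, results[name])
```

The grid only contains positive b because log(b*t) needs a positive argument. A coarse grid can miss the best b, so the reported errors are upper bounds on what a real optimizer would reach.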

If fitting is not allowed at all, the four functions can be divided into two distinct subgroups: periodic functions (cos and tan) and strictly increasing functions (exp and log). Strictly increasing functions can be identified by checking whether the given values increase throughout the measuring interval.

In pseudocode, an algorithm could be structured like this:

if(vals are strictly increasing):
    % Exp or log
    if(increasing more rapidly at the end of the interval):
        % exp detected
    else:
        % log detected
else:
    % tan or cos
    if(large changes in vals over a short period are detected):
        % tan detected
    else:
        % cos detected

Be aware that this method is not very stable and is easy to trick into faulty conclusions.

midtiby