I'm looking for a non-linear curve-fitting routine (most likely to be found in R or Python, but I'm open to other languages) which would take x,y data and fit a curve to it.

I should be able to specify as a string the type of expression I want to fit.

Examples:

"A+B*x+C*x*x"
"(A+B*x+C*x*x)/(D*x+E*x*x)"
"sin(A+B*x)*exp(C+D*x)+E+F*x"

What I would get out of this is, at least, the values for the constants (A, B, C, etc.) and, hopefully, stats about the goodness of fit.

There are commercial programs to do this, but I expected something as common as fitting to a desired expression to be available in a language library nowadays. I suspect SciPy's optimization routines might be able to do this, but I can't see that they let me define an equation. Likewise, I can't seem to find exactly what I want in R.

Is what I'm looking for out there, or do I need to roll my own? I hate to do it if it's there and I'm just having trouble finding it.


Edit: I want to do this to get a bit more control over the process than I get from LAB Fit. The LAB Fit UI is dreadful. I'd also like to be able to break the range into multiple pieces and have different curves represent the different pieces of the range. In the end, the result has to be able to beat (speed-wise) a LUT with linear interpolation, or I'm not interested.

In my current set of problems, I have trig functions or exp() and I need to execute them 352,800 times per second in real time (and use only a fraction of the CPU). So I plot the curve and use the data to drive the curve fitter to get less expensive approximations. In the old days, LUTs were almost always the solution, but nowadays skipping the memory lookups and doing an approximation is sometimes faster.

+1  A: 

Check out GNU Octave -- between its polyfit() and the nonlinear constraint solvers, it ought to be possible to construct something suitable for your problem.

moonshadow
I actually use Octave sometimes. I'll see what I can figure out.
Nosredna
+5  A: 

Your first model is actually linear in the three parameters and can be fit in R using

 fit <- lm(y ~ x + I(x^2), data=X)

which will get you your three parameters.

The second model can also be fit using nls() in R, with the usual caveats of having to provide starting values etc. The statistical issues in optimization are not necessarily the same as the numerical issues -- you cannot just optimize any functional form, no matter which language you choose.
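For instance, a minimal sketch for the rational model (the starting values and the x,y names are placeholders; nls() may fail to converge if the starting values are far off, and the denominator vanishes at x = 0):

 # Hedged sketch: fit the rational model with nls().
 # Starting values for A..E are guesses and will usually need tuning.
 fit2 <- nls(y ~ (A + B*x + C*x^2) / (D*x + E*x^2),
             data = data.frame(x, y),
             start = list(A = 1, B = 1, C = 1, D = 1, E = 1))
 summary(fit2)  # estimates, standard errors, residual standard error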

Dirk Eddelbuettel
Although you'd be better off with `y ~ poly(x, 2)` or `y ~ ns(x, 2)` (ns() is in the splines package).
hadley
+1  A: 

You're probably not going to find a single routine with the flexibility implied in your examples (polynomials and rational functions using the same routine), let alone one that will parse a string to figure out what sort of equation to fit.

A least-squares polynomial fitter would be appropriate for your first example. (It's up to you what degree polynomial to use -- quadratic, cubic, quartic, etc.) For a rational function like your second example, you might have to "roll your own" if you can't find a suitable library. Also, keep in mind that a sufficiently high-degree polynomial can approximate your "real" function, as long as you don't need to extrapolate beyond the limits of the data set you're fitting to.
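In R, for instance, one hypothetical way to sweep polynomial degrees and compare fits (assuming vectors x and y hold the data):

 # Hypothetical sketch: fit polynomials of increasing degree and
 # compare residual standard errors; smaller means a closer fit.
 fits <- lapply(1:6, function(d) lm(y ~ poly(x, d)))
 sapply(fits, function(f) summary(f)$sigma)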

As others have noted, there are other, more generalized parameter estimation algorithms which might also prove useful. But those algorithms aren't quite "plug and play": they usually require you to write some helper routines, and supply a list of initial values for the model parameters. It's possible for these sorts of algorithms to diverge, or get stuck in a local minimum or maximum for an unlucky choice of initial parameter estimates.

Jim Lewis
When I use the commercial products, I typically have _no idea_ what will work best. LAB Fit will try several hundred equations to see what fits the data best in the range I specify.
Nosredna
I hadn't considered that use case -- if you're in the early stages of trying to characterize a data set, it does make sense to try several families of functions (linear, polynomial, power law, periodic...) to see what a good fit might look like. I'll edit my answer accordingly.
Jim Lewis
"It's possible for these sorts of algorithms to diverge..." Yeah, I assume the commercial programs just bail out when this happens during checking all the choices. They do let you play with initial values when you choose one expression at a time.
Nosredna
+1  A: 

In R, this is quite easy.

The built-in function is optim(). It takes as arguments a starting vector of parameter values and the function to minimize. You have to build your own error function, but that's really simple.

Then you call it like

 out = optim(1, err_fn)

where err_fn is

# Sum-of-squares error for the one-parameter model y = A*x.
# Assumes your x values are in eckses and your y values in data.
err_fn = function(A) {
    diff = 0
    for (i in seq_along(data)) {
        x = eckses[i]
        y = data[i]
        model_y = A * x    # the model; edit this line to change it
        diff = diff + (y - model_y)^2
    }
    return(diff)
}

This just assumes you have your x values in a vector called eckses and your y values in data. Change the model_y line as you see fit; you can even add more parameters.

It works fine on nonlinear models; I use it for four-dimensional e^x curves and it is very fast. The output includes the error value at the end of the fitting, which is a measure of how well the curve fits, given as a sum of squared differences (in my err_fn).

EDIT: If you NEED to take the model in as a string, you can have your user interface construct this whole model-fitting process as an R script and load it in to run. R can take text from STDIN or from a file, so it shouldn't be too hard to craft the string equivalent of this function and have it run optim() automatically.
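A minimal sketch of that idea (the model string, the parameter unpacking, and the eckses/data vectors are assumptions; also note that eval(parse(...)) on untrusted input is risky):

 # Hypothetical sketch: build a sum-of-squares objective from a model
 # string and minimize it with optim(). The string may refer to x and
 # to the parameters A and B; eckses and data hold the samples.
 model_str = "A*x + B"
 make_err_fn = function(model_str) {
     expr = parse(text = model_str)
     function(p) {
         model_y = eval(expr, list(x = eckses, A = p[1], B = p[2]))
         sum((data - model_y)^2)
     }
 }
 out = optim(c(1, 1), make_err_fn(model_str))
 out$par    # fitted A and B
 out$value  # residual sum of squares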

Karl
But why not use nls() in R?
Dirk Eddelbuettel
I don't use nls() for two reasons: first, I like being able to hand-craft the error function to be optimized; second, I'm not actually all that experienced with R. So nls() does just what I wrote up there? Neat.
Karl
My ultimate goal is to hand it a list of strings and have the code try them all out to find the best fit.
Nosredna
+1  A: 

If you have constraints on your coefficients, and you know that there is a specific type of function you'd want to fit to your data, and that function is a messy one where standard regression methods or other curve-fitting methods won't work, have you considered genetic algorithms?

They're not my first choice, but if you are trying to find the coefficients of the second function you mentioned, then perhaps GAs would work -- especially if you are using non-standard metrics to evaluate the best fit. For example, if you wanted to find the coefficients of "(A+Bx+Cx^2)/(Dx+Ex^2)" such that the sum of squared differences between your function and your data is minimal, and such that there is some constraint on the arclength of the resulting function, then a stochastic algorithm might be a good way to approach this.

Some caveats: 1) stochastic algorithms won't guarantee the best solution, but they will often come very close; 2) you have to be careful about the stability of the algorithm.

On a broader note, if you are at the stage where you want to find a function from some space of functions that best fits your data (i.e., you are not going to impose, say, the second model on your data), then genetic programming techniques may also help.
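A full GA needs more machinery than fits in an answer, but as a taste of the stochastic approach, here is a hypothetical R sketch using simulated annealing (optim()'s "SANN" method -- a different stochastic method than a GA) to fit the rational model by sum of squared differences, assuming x and y hold the data:

 # Hypothetical sketch: stochastic fit of (A+B*x+C*x^2)/(D*x+E*x^2)
 # via simulated annealing; not a GA, but also a stochastic search.
 sse <- function(p) {
     yhat <- (p[1] + p[2]*x + p[3]*x^2) / (p[4]*x + p[5]*x^2)
     sum((y - yhat)^2)
 }
 out <- optim(c(1, 1, 1, 1, 1), sse, method = "SANN")
 out$par  # A, B, C, D, E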

B Rivera
That's an interesting idea. I'll think about that. Obviously, it would be slow. The commercial programs run through hundreds of equation forms in seconds.
Nosredna
Yes, another downside is that stochastic algorithms can be slow. On the upside, though, it's possible to obtain an equation form outside the set that commercial programs run through: by allowing a genetic program to search through *classes* of functions (with operations on those functions), such as power functions, exponentials, logarithms, trig functions, and pdfs/cdfs, it's possible to find a solution not given by a fixed set of equation forms. But, again on the downside, this requires a reasonable up-front coding effort which may not be worth the while.
B Rivera
I'm always up for a Quixotic adventure.
Nosredna
+1  A: 

To answer your question in a general sense (regarding parameter estimation in R) without considering the specifics of the equations you pointed out, I think you are looking for nls() or optim(). nls() is my first choice, as it provides error estimates for each estimated parameter; when it fails, I use optim(). If you have your x,y variables:

out <- tryCatch(
    nls(y ~ A + B*x + C*x*x,
        data = data.frame(x, y),
        start = c(A = 0, B = 1, C = 1)),
    error = function(e)
        optim(c(A = 0, B = 1, C = 1),
              function(p, x, y) sum((y - with(as.list(p), A + B*x + C*x^2))^2),
              x = x, y = y))

To get the coefficients, use something like

getcoef <- function(x) if (inherits(x, "nls")) coef(x) else x$par
getcoef(out)

If you want the standard errors in the case of 'nls',

summary(out)$parameters

The help files and r-help mailing list posts contain many discussions of the specific minimization algorithms implemented by each function (the examples above use the defaults) and their appropriateness for the particular form of the equation at hand. Certain algorithms can handle box constraints, and another function, constrOptim(), handles sets of linear constraints. This website may also help:

http://cran.r-project.org/web/views/Optimization.html
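As an illustration of the box-constraint point above, a hypothetical sketch using optim()'s "L-BFGS-B" method to keep B and C non-negative (the bounds here are made up for the example):

 # Hypothetical sketch: box-constrained least squares with L-BFGS-B.
 obj <- function(p, x, y) sum((y - with(as.list(p), A + B*x + C*x^2))^2)
 optim(c(A = 0, B = 1, C = 1), obj, x = x, y = y,
       method = "L-BFGS-B", lower = c(-Inf, 0, 0))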

Stephen
Can I feed the formula in as a string?
Nosredna
Yes -- something like as.formula(paste("y","A+B*x+C*x^2",sep="~")) should do it.
Stephen
That was for the nls case; for optim, something like eval(parse(text=sprintf("sum((y-%s)^2)","A+B*x+C*x^2"))) should work (the sprintf construction is shown so you can insert the formula that you desire).
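Putting those two comments together, a hypothetical end-to-end sketch (the names in start must match the parameters appearing in the model string, and nls() can fail on noise-free data):

 # Hypothetical sketch: fit a model supplied as a string via nls().
 model <- "A+B*x+C*x^2"
 f <- as.formula(paste("y", model, sep = "~"))
 fit <- nls(f, data = data.frame(x, y), start = c(A = 0, B = 1, C = 1))
 coef(fit)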
Stephen