views: 3651 | answers: 14

Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data?

I know polynomial interpolation, which is for finding a polynomial of degree n given n+1 data points, but here there are a large number of values and we want to find a low-degree polynomial (find best linear fit, best quadratic, best cubic, etc.). It might be related to least squares...

More generally, I would like to know the answer when we have a multivariate function -- points like (x,y,f(x,y)), say -- and want to find the best polynomial (p(x,y)) of a given degree in the variables. (Specifically a polynomial, not splines or Fourier series.)

Both theory and code/libraries (preferably in Python, but any language is okay) would be useful.

+4  A: 

Yes, the way this is typically done is by using least squares. There are other ways of specifying how well a polynomial fits, but the theory is simplest for least squares. The general theory is called linear regression.

Your best bet is probably to start with Numerical Recipes.

R is free and will do everything you want and more, but it has a big learning curve.

If you have access to Mathematica, you can use the Fit function to do a least squares fit. I imagine Matlab and its open source counterpart Octave have a similar function.

John D. Cook
This is helpful, but do you know which of these do multivariate fitting?
ShreevatsaR
least squares can be multivariate. gil strang's "intro to applied mathematics" has a very nice, readable discussion.
duffymo
Yes, thanks... when I asked that comment about multivariate fitting I didn't know enough about least squares :-)
ShreevatsaR
+1  A: 

For the (x, f(x)) case:

import numpy

x = numpy.arange(10)
y = x**2

# least-squares fit of a degree-2 polynomial to the data
coeffs = numpy.polyfit(x, y, deg=2)
poly = numpy.poly1d(coeffs)
print(poly)

# evaluate the fitted polynomial and check the residuals
yp = numpy.polyval(poly, x)
print(yp - y)
J.F. Sebastian
+1  A: 

If you want to fit the (xi, f(xi)) to a polynomial of degree n, then you set up a linear least-squares problem with the data rows (1, xi, xi^2, ..., xi^n, f(xi)). This returns a set of coefficients (c0, c1, ..., cn) so that the best-fitting polynomial is y = c0 + c1*x + c2*x^2 + ... + cn*x^n.

You can generalize this to more than one independent variable by including powers of y and mixed terms in x and y in the problem.
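For example, here is a minimal numpy sketch of this setup for two variables and a full degree-2 polynomial (the data and variable names are made up for illustration):

import numpy as np

# hypothetical sample data: points (x_i, y_i, f(x_i, y_i))
x = np.random.rand(100)
y = np.random.rand(100)
f = 1.0 + 2.0*x - 3.0*y + 0.5*x*y + x**2

# design matrix for a full degree-2 polynomial in x and y:
# columns are 1, x, y, x^2, x*y, y^2
A = np.column_stack([np.ones_like(x), x, y, x**2, x*y, y**2])

# solve the linear least-squares problem A c ~= f
coeffs, residuals, rank, sv = np.linalg.lstsq(A, f, rcond=None)
print(coeffs)   # best-fit coefficients (c0, c1, ..., c5)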

David Norman
A: 

The Lagrange polynomial is in some sense the "simplest" interpolating polynomial that fits a given set of data points.

It is sometimes problematic because it can vary wildly between data points.
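If you want to try this in Python, scipy has a ready-made routine; a minimal sketch (the sample points are made up), keeping in mind this is interpolation rather than a best fit of a chosen lower degree:

import numpy as np
from scipy.interpolate import lagrange

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 5.0])

poly = lagrange(x, y)            # degree-3 interpolating polynomial (a poly1d)
print(np.polyval(poly, x) - y)   # residuals ~0: it passes through every point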

Also, the Lagrange polynomial has degree n-1 with n points -- that's what I wrote in the question about polynomial interpolation -- it doesn't give the best-fit polynomial for a given degree.
ShreevatsaR
A: 

Not an answer--

I've always thought that for game development it would be really nice to have a program that let you draw some arbitrary (x, f(x)) style curve with the mouse, add numbers to the scales, and have it print out a formula to use for f.

Using a pure polynomial solution seems to me to have the problem that--well like I don't think it could come up with sin(x) if you input a sine wave. I could be wrong, I haven't done serious maths in decades.

Bill K
For visual applications, this is usually done with splines -- piecewise polynomials that fit together smoothly -- rather than with polynomials. Polynomials wiggle too much when they fit data.
John D. Cook
Well, technically you could. Transcendental functions can be represented to any degree of accuracy with a Taylor series. However, it takes a polynomial of *infinite* degree to *exactly* match the function, so I'd probably look at sinusoidal methods for this type of problem.
stalepretzel
+2  A: 

Lagrange polynomials (as @j w posted) give you an exact fit at the points you specify, but with polynomials of degree more than say 5 or 6 you can run into numerical instability.

Least squares gives you the "best fit" polynomial, with error defined as the sum of squares of the individual errors (take the distance along the y-axis between the points you have and the function that results, square them, and sum them up). The MATLAB polyfit function does this, and with multiple return arguments you can have it automatically take care of scaling/offset issues (e.g. if you have 100 points all between x=312.1 and 312.3 and you want a 6th-degree polynomial, you're going to want to calculate u = (x-312.2)/0.1 so the u-values are distributed between -1 and +1).
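A rough numpy sketch of the same rescaling idea (the data here is made up; the key point is fitting in u rather than x):

import numpy as np

# made-up data clustered in a narrow x-range
x = np.linspace(312.1, 312.3, 100)
y = np.sin(x)

# rescale x to u in [-1, 1] before fitting, to keep the problem well conditioned
mid = (x.max() + x.min()) / 2.0
half = (x.max() - x.min()) / 2.0
u = (x - mid) / half

coeffs = np.polyfit(u, y, deg=6)

# evaluate at a new point by applying the same transformation
x0 = 312.15
y0 = np.polyval(coeffs, (x0 - mid) / half)
print(y0, np.sin(x0))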

NOTE that the results of least-squares fits are strongly influenced by the distribution of x-axis values. If the x-values are equally spaced, then you'll get larger errors at the ends. If you have a case where you can choose the x-values and you care about the maximum deviation between your known function and an interpolating polynomial, then the use of Chebyshev polynomials will give you something that is close to the perfect minimax polynomial (which is very hard to calculate). This is discussed at some length in Numerical Recipes.

Edit: From what I gather, this all works well for functions of one variable. For multivariate functions it is likely to be much more difficult if the degree is more than, say, 2. I did find a reference on Google Books.

Jason S
Thanks for the reference, BTW. The relevant stuff is a few pages later; 4.10.4 on page 231. The same thing works for multivariate polynomials of higher degree too without great difficulty, although there is the concern of "overfitting".
ShreevatsaR
+12  A: 
ShreevatsaR
Chebyshev nodes work for least squares.
Jason S
@Jason: Are you sure that Chebyshev nodes are known to be the best points to choose even for least squares? [There also seems to be a different question of picking Chebyshev polynomials themselves as the approximating polynomials, for a different kind of fit from least squares -- the "minimax" polynomial.]
ShreevatsaR
df/dx = 0 does not necessarily mean that f is minimized; it might also be maximized.
quant_dev
True, but that's not what we're saying here. We're saying that (assuming the function has partial derivatives etc.), any minimum occurs (either on the boundary or) at a point where the gradient is 0.
ShreevatsaR
About your concerns on numerical stability: defining a polynomial as a "linear combination of monomials" is a dangerous thing to do, because (to say it in non-mathematical words) monomials above, say, degree 4 are terribly similar to each other in the area around 0 and then they just "grow crazy". A much better thing is to decide in which interval you are attempting to fit the data, redefine your independent variable so that you are actually fitting in (-1, 1), and look for a linear combination of good polynomials instead of monomials. I would use the Chebyshev set.
mariotomo
@mariotomo: Thanks, that makes sense after you say it :) Good point.
ShreevatsaR
+1  A: 

Bear in mind that a polynomial of higher degree always fits the data at least as well. However, higher-degree polynomials typically lead to highly improbable functions (overfitting; see Occam's Razor). You want to find a balance between simplicity (degree of the polynomial) and fit (e.g. least-squares error). Quantitatively, there are tests for this: the Akaike Information Criterion and the Bayesian Information Criterion. These tests give a score indicating which model is to be preferred.
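For illustration, a small sketch of using AIC to pick a degree, under the usual Gaussian-error assumption (so AIC = n*ln(RSS/n) + 2k, with k the number of coefficients); the data is made up:

import numpy as np

def aic_for_degree(x, y, deg):
    # least-squares fit, then AIC from the residual sum of squares
    coeffs = np.polyfit(x, y, deg)
    rss = np.sum((np.polyval(coeffs, x) - y) ** 2)
    n, k = len(x), deg + 1
    return n * np.log(rss / n) + 2 * k

x = np.linspace(0, 1, 50)
y = 1 + 2*x - x**2 + 0.05*np.random.randn(50)
best = min(range(1, 8), key=lambda d: aic_for_degree(x, y, d))
print(best)   # degree with the lowest AIC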

Fredriku73
Right, I realised after a while that there was some trade-off between fit and simplicity, as you said. Thanks for the information about the criteria.
ShreevatsaR
A: 

It's called approximation theory, a vast and well-researched field of numerical analysis.

Sure it's probably a vast ocean, but it would be nice to know *something* from it. :-)
ShreevatsaR
A: 

Remember, there's a big difference between approximating the polynomial and finding an exact one.

For example, if I give you 4 points, you could

  1. Approximate a line with a method like least squares
  2. Approximate a parabola with a method like least squares
  3. Find an exact cubic function through these four points.

Be sure to select the method that's right for you!
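As a concrete numpy sketch of those three options on four made-up points:

import numpy as np

# four hypothetical data points
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 0.0, 5.0])

line  = np.polyfit(x, y, 1)   # 1. best-fit line (least squares)
parab = np.polyfit(x, y, 2)   # 2. best-fit parabola (least squares)
cubic = np.polyfit(x, y, 3)   # 3. exact cubic through all four points

print(np.polyval(cubic, x) - y)   # residuals ~0: the cubic interpolates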

stalepretzel
Yes, I know :-) That is why I mentioned "polynomial interpolation" in the question, which is for finding an exact cubic through four points, or an exact n-th degree polynomial through n+1 points.
ShreevatsaR
A: 

Some theory:

Vandermonde matrix: http://en.wikipedia.org/wiki/Vandermonde_matrix

Moore-Penrose pseudoinverse: http://en.wikipedia.org/wiki/Moore-Penrose_pseudoinverse
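A tiny sketch of how the two connect in practice (least squares via the Vandermonde matrix and the pseudoinverse; the data is made up):

import numpy as np

x = np.linspace(0, 1, 20)
y = 3*x**2 - x + 2 + 0.01*np.random.randn(20)   # noisy sample data

# Vandermonde matrix for a degree-2 fit (columns x^2, x, 1)
V = np.vander(x, 3)

# least-squares coefficients via the Moore-Penrose pseudoinverse
coeffs = np.linalg.pinv(V) @ y
print(coeffs)   # roughly [3, -1, 2]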

Tony Arkles
A: 

Here's a VB6 code library that might be of use.

VB6 Polynomial Regression

A: 

It's rather easy to scare up a quick fit using Excel's matrix functions if you know how to represent the least squares problem as a linear algebra problem. (That depends on how reliable you think Excel is as a linear algebra solver.)

duffymo
+1  A: 

At college we had this book, which I still find extremely useful: Conte, de Boor; Elementary Numerical Analysis; McGraw-Hill. The relevant section is 6.2: Data Fitting.
The example code comes in FORTRAN, and the listings are not very readable either, but the explanations are deep and clear at the same time. You end up understanding what you are doing, not just doing it (which is my experience of Numerical Recipes).
I usually start with Numerical Recipes, but for things like this I quickly have to grab Conte-de Boor.

Maybe it's better to post some code... it's a bit stripped down, but the most relevant parts are there. It relies on numpy, obviously!

import numpy

def Tn(n, x):
  """Chebyshev polynomial of the first kind, T_n(x), by the recurrence relation."""
  if n == 0:
    return 1.0
  elif n == 1:
    return float(x)
  else:
    return (2.0 * x * Tn(n - 1, x)) - Tn(n - 2, x)

class Memoize:
  # (the Memoize helper was omitted from the original post; a minimal version)
  def __init__(self, f):
    self.f = f
    self.cache = {}
  def __call__(self, *args):
    if args not in self.cache:
      self.cache[args] = self.f(*args)
    return self.cache[args]

class ChebyshevFit:

  def __init__(self):
    self.Tn = Memoize(Tn)

  def fit(self, data, degree=None):
    """fit the data by a least-squares linear combination of Chebyshev polynomials.

    cf. Conte, de Boor; Elementary Numerical Analysis; McGraw-Hill (6.2: Data Fitting)
    """

    if degree is None:
      degree = 5

    data = sorted(data)
    self.range = start, end = (min(data)[0], max(data)[0])
    self.halfwidth = (end - start) / 2.0
    # map the x-values into (-1, 1), where the Chebyshev polynomials are well behaved
    vec_x = [(x - start - self.halfwidth)/self.halfwidth for (x, y) in data]
    vec_f = [y for (x, y) in data]

    # normal equations A c = b, with A_ij = <T_i, T_j> and b_i = <f, T_i>
    mat_phi = [numpy.array([self.Tn(i, x) for x in vec_x]) for i in range(degree+1)]
    mat_A = numpy.inner(mat_phi, mat_phi)
    vec_b = numpy.inner(vec_f, mat_phi)

    self.coefficients = numpy.linalg.solve(mat_A, vec_b)
    self.degree = degree

  def evaluate(self, x):
    """use the Clenshaw algorithm

    http://en.wikipedia.org/wiki/Clenshaw_algorithm
    """

    # map x into (-1, 1), then run the Clenshaw recurrence on the coefficients
    x = (x - self.range[0] - self.halfwidth) / self.halfwidth

    b_2 = float(self.coefficients[self.degree])
    b_1 = 2 * x * b_2 + float(self.coefficients[self.degree - 1])

    for i in range(2, self.degree):
      b_1, b_2 = 2.0 * x * b_1 + self.coefficients[self.degree - i] - b_2, b_1

    b_0 = x * b_1 + self.coefficients[0] - b_2

    return b_0
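A quick usage sketch of the class above (with made-up data), just to show the intended calling convention:

data = [(x, numpy.cos(x)) for x in numpy.linspace(0.0, 3.0, 50)]

cf = ChebyshevFit()
cf.fit(data, degree=5)
print(cf.evaluate(1.5), numpy.cos(1.5))   # the two values should be close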
mariotomo
Thanks again; that's quite clear. Why is it good to normalise the range to (-1,1), BTW?
ShreevatsaR
Because that's the range where the Chebyshev polynomials are well behaved. In fact, inside that range you can characterize them as T_n(x) = cos(n*acos(x)); this formula has no meaning for x outside (-1, 1).
mariotomo
I have tested my module against numpy.polyfit (just that single example from the page you point to) and I was a bit surprised to see that my fit matches numpy.polyfit (even for extrapolation) up to the 15th digit. I should try more badly conditioned cases... if they still match, then maybe numpy uses Chebyshev polynomials behind the scenes and returns the corresponding monomial coefficients...
mariotomo