views:

95

answers:

2

I need to fit some points from different datasets with straight lines. From every dataset I want to fit a line. So I got the parameters ai and bi that describe the i-line: ai + bi*x. The problem is that I want to impose that every ai are equal because I want the same intercepta. I found a tutorial here: http://www.scipy.org/Cookbook/FittingData#head-a44b49d57cf0165300f765e8f1b011876776502f. The difference is that I don't know a priopri how many dataset I have. My code is this:

from numpy import *
from scipy import optimize

# here I have 3 dataset, but in general I don't know how many dataset are they
ypoints = [array([0, 2.1, 2.4]),    # first dataset, 3 points
           array([0.1, 2.1, 2.9]),  # second dataset
           array([-0.1, 1.4])]      # only 2 points

xpoints = [array([0, 2, 2.5]),      # first dataset
           array([0, 2, 3]),        # second, also x coordinates are different
           array([0, 1.5])]         # the first coordinate is always 0

fitfunc = lambda a, b, x: a + b * x
errfunc = lambda p, xs, ys: array([ yi - fitfunc(p[0], p[i+1], xi) 
                                    for i, (xi,yi) in enumerate(zip(xs, ys)) ])


p_arrays = [r_[0.]] * len(xpoints)
pinit = r_[[ypoints[0][0]] + p_arrays]
fit_parameters, success = optimize.leastsq(errfunc, pinit, args = (xpoints, ypoints))

I got

Traceback (most recent call last):
  File "prova.py", line 19, in <module>
    fit_parameters, success = optimize.leastsq(errfunc, pinit, args = (xpoints,    ypoints))
  File "/usr/lib64/python2.6/site-packages/scipy/optimize/minpack.py", line 266, in  leastsq
    m = check_func(func,x0,args,n)[0]
  File "/usr/lib64/python2.6/site-packages/scipy/optimize/minpack.py", line 12, in  check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],)+args)))
  File "prova.py", line 14, in <lambda>
    for i, (xi,yi) in enumerate(zip(xs, ys)) ])
ValueError: setting an array element with a sequence.
+1  A: 

(Side note: use def, not lambda assigned to a name -- that's utterly silly and has nothing but downsides, lambda's only use is making anonymous functions!).

Your errfunc should return a sequence (array or otherwise) of floating point numbers, but it's not, because you're trying to put as the items of your arrays the arrays which are the differences each y point (remember, ypoints aka ys is a list of arrays!) and the fit functions' results. So you need to "collapse" the expression yi - fitfunc(p[0], p[i+1], xi) to a single floating point number, e.g. norm(yi - fitfunc(p[0], p[i+1], xi)).

Alex Martelli
I can't understand the differences between the example in the link
wiso
@Wiso, that code is pretty obscure (with an undocumented `r_`, etc) but I'm sure it ends up returning from errfunc a sequence of floating point numbers, as that's what the docs require -- check with a debugger or instrument it with `print`s to verify, if you can successfully run that code yourself (I can't -- what's `r_`?!).
Alex Martelli
ok, I solved. Usually scipy is well documented, the `r_` function here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.htmlThe solution is to change `array` to `concatenate` to obtain an unique array as you said. In particular I got the coefficients: `[ 0.01947973 0.98656987 0.98481549 0.92034684]` and they seem to be corrects
wiso
+1  A: 

if you just need a linear fit, then it is better to estimate it with linear regression instead of a non-linear optimizer. More fit statistics could be obtained be using scikits.statsmodels instead.

import numpy as np
from numpy import array

ypoints = np.r_[array([0, 2.1, 2.4]),    # first dataset, 3 points
           array([0.1, 2.1, 2.9]),  # second dataset
           array([-0.1, 1.4])]      # only 2 points

xpoints = [array([0, 2, 2.5]),      # first dataset
           array([0, 2, 3]),        # second, also x coordinates are different
           array([0, 1.5])]         # the first coordinate is always 0

xp = np.hstack(xpoints)
indicator = []
for i,a in enumerate(xpoints):
    indicator.extend([i]*len(a))

indicator = np.array(indicator)


x = xp[:,None]*(indicator[:,None]==np.arange(3)).astype(int)
x = np.hstack((np.ones((xp.shape[0],1)),x))

print np.dot(np.linalg.pinv(x), ypoints)
# [ 0.01947973  0.98656987  0.98481549  0.92034684]

The matrix of regressors has a common intercept, but different columns for each dataset:

>>> x
array([[ 1. ,  0. ,  0. ,  0. ],
       [ 1. ,  2. ,  0. ,  0. ],
       [ 1. ,  2.5,  0. ,  0. ],
       [ 1. ,  0. ,  0. ,  0. ],
       [ 1. ,  0. ,  2. ,  0. ],
       [ 1. ,  0. ,  3. ,  0. ],
       [ 1. ,  0. ,  0. ,  0. ],
       [ 1. ,  0. ,  0. ,  1.5]])
thanks, I like it. The problem is that now I need to use errors, I mean: y point have errors, and I need to weight them with 1/error^2. How can I do it with your code?
wiso
The best would really be to use scikits.statsmodels because for this case you all the estimation, prediction and result statistics premade.http://pypi.python.org/pypi/scikits.statsmodels/0.2.0 and linksto get predicted yparams = np.dot(np.linalg.pinv(x), ypoints)ypred = np.dot(x, params)errors = ypoints - ypred...If you mean by weighing the errors, to uses weighted least squares, then both x and ypoints need to be divided by error standard deviation, or use WLS class in scikits.statsmodels.