Hello, I'm sorry if I'm not using the correct mathematical terms, but I hope you'll understand what I'm trying to accomplish.

My problem: I'm using linear regression (currently the least squares method) on the values from two vectors x and y against the result z. This is to be done in MATLAB, and I'm using the \-operator to perform the regression. My dataset will contain a few thousand observations (up to about 50000 at most).

The x-values will be in the range 10-300 (most between 60 and 100) and the y-values in the range 1-3.

My code looks like this:

X = [ones(size(x,1),1) x y];
parameters = X\z;

The output "parameters" are then the three factors a0, a1 and a2 which is used in this formula:

a0 + a1*x_i + a2*y_i = z_i

(where the subscript i indexes the observations)

This works as expected, but I want the two parameters a1 and a2 to ALWAYS be positive values, even when the vector z is negative (which of course means a0 will be negative), since this is what the real model looks like (z is always positively correlated with x and y). Is this possible using the least squares method? I'm also open to other algorithms for linear regression.

+1  A: 

Let me try to rephrase to clarify. According to your model, z is always positively correlated with x and y. However, sometimes when you solve the linear regression for the coefficients, one of them comes out negative.

If you are right about the data, this should only happen when the correct coefficient is small and noise happens to push it negative. You could just set it to zero, but then the means wouldn't match properly.

In that case the correct solution is as jpalacek says, but explained in more detail here:

  1. Regress z against x and y. If both a1 and a2 come out positive, take the result.
  2. If a1 is negative, assume it should be zero. Regress z against y alone. If a2 is positive, take a1 as 0, and take a0 and a2 from this regression.
  3. If that a2 is negative too, assume it should also be zero. Regress z against 1 alone and take the result as a0; let a1 and a2 be 0.

This should give you what you want.
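A minimal MATLAB sketch of these three steps, assuming x, y and z are column vectors (the variable names are only illustrative):

X = [ones(size(x)) x y];
a = X \ z;                         % step 1: full regression
if a(2) < 0                        % a1 came out negative: drop x
    b = [ones(size(y)) y] \ z;     % step 2: regress z on y only
    if b(2) >= 0
        a = [b(1); 0; b(2)];       % a1 = 0, take a0 and a2 from this fit
    else                           % step 3: intercept only
        a = [mean(z); 0; 0];
    end
end
% a(1) = a0, a(2) = a1, a(3) = a2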

Nick Fortescue
+1  A: 

The simple solution is to use a tool designed to solve it. That is, use lsqlin from the Optimization Toolbox, and set a lower-bound constraint on two of the three parameters.

Thus, assuming x, y, and z are all COLUMN vectors,

A = [ones(length(x),1), x, y];
lb = [-inf, 0, 0];
a = lsqlin(A, z, [], [], [], [], lb);

This will constrain only the second and third unknown parameters.
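As a quick sanity check, here is a small synthetic example (the data and coefficient values are purely illustrative; it assumes the Optimization Toolbox is available for lsqlin):

n = 1000;
x = 60 + 40*rand(n,1);                 % x roughly in the 60-100 range
y = 1 + 2*rand(n,1);                   % y in the 1-3 range
z = -5 + 0.2*x + 1.5*y + randn(n,1);   % true a0 negative, a1 and a2 positive

A = [ones(n,1), x, y];
lb = [-inf, 0, 0];
a = lsqlin(A, z, [], [], [], [], lb)   % a(2) and a(3) come back nonnegative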

Without the Optimization Toolbox, use lsqnonneg, which is part of MATLAB itself. Here too the solution is easy enough.

A = [ones(length(x),1), x, y];
a = lsqnonneg(A, z);

Your model will be

z = a(1) + a(2)*x + a(3)*y

If a(1) is essentially zero, i.e., within a tolerance of zero, then assume the first parameter was pinned at the zero bound. In that case, solve a second problem by changing the sign on the column of ones in A.

A(:,1) = -1;
a = lsqnonneg(A, z);

If this solution has a(1) significantly non-zero, then the second solution must be better than the first. Your model will now be

z = -a(1) + a(2)*x + a(3)*y

It costs you at most two calls to lsqnonneg, and the second call is only needed some fraction of the time (lacking any information about your problem, the odds are about 50%).
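For reference, a minimal sketch wrapping the two lsqnonneg calls described above into one function; the function name, the tolerance, and the choice to return a signed intercept are my own, not part of the answer itself. It assumes x, y and z are column vectors.

function a = fit_positive_slopes(x, y, z)
% Fit z ~ a(1) + a(2)*x + a(3)*y with a(2), a(3) >= 0, allowing a(1) < 0.
A = [ones(length(x),1), x, y];
a = lsqnonneg(A, z);
tol = 1e-10;                   % tolerance for "essentially zero"
if a(1) < tol
    % The intercept was pinned at the zero bound: flip the sign of the
    % constant column and refit, so the intercept can effectively go negative.
    A(:,1) = -1;
    a2 = lsqnonneg(A, z);
    if a2(1) > tol
        a = a2;
        a(1) = -a(1);          % report the signed intercept a0
    end
end
end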

woodchips