I have the matrix system:

A x B = C

A is a×n and B is n×b. Both A and B are unknown, but I have partial information about C (I have some of its values but not all), and n is picked to be small enough that the system is expected to be overconstrained. It is not required that every row of A or column of B be overconstrained.

I'm looking for something like least squares linear regression to find a best fit for this system. (Note: I know there will not be a single unique solution, but all I want is one of the best solutions.)


To make this concrete: all the a's and b's are unknown, all the c's are known, and the ?'s are ignored. I want to find a least squares solution taking into account only the known c's.

[ a11, a12 ]                                     [ c11, c12, c13, c14, ?   ]
[ a21, a22 ]   [ b11, b12, b13, b14, b15]        [ c21, c22, c23, c24, c25 ]
[ a31, a32 ] x [ b21, b22, b23, b24, b25] = C ~= [ c31, c32, c33, ?,   c35 ]
[ a41, a42 ]                                     [ ?,   ?,   c43, c44, c45 ]
[ a51, a52 ]                                     [ c51, c52, c53, c54, c55 ]


Note that if B is trimmed to its first column (b11 and b21 only) and row 4 (whose entry is unknown) is dropped, this is almost a standard least squares linear regression problem.

+2  A: 

A wild guess: a singular value decomposition might do the trick?

Emil H
Might be related, but SVD works with total knowledge of the input.
BCS
+5  A: 

This problem is ill-posed as described.

Let A and B be scalars and C = 5. You are then asking to solve a*b = 5, which has an infinite number of solutions.

One approach, on the information provided above, is to minimize the function g defined as

g(A,B) = ||AB - C||_F^2 = trace((AB - C)*(AB - C))

using Newton's method or a quasi-Newton approach such as BFGS.
(You can easily compute the gradient here.) M* is the transpose of M, multiplication is implicit, and ||.||_F is the Frobenius norm.
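
As an illustration, here is a minimal sketch in Python/numpy, assuming a fully known C for the moment (a masked variant appears further down); the sizes and the use of scipy's BFGS are my own choices, not something from the question. The gradient follows from the trace form: dg/dA = 2(AB - C)B* and dg/dB = 2A*(AB - C).

    import numpy as np
    from scipy.optimize import minimize

    a, n, b = 5, 2, 5                      # illustrative sizes
    rng = np.random.default_rng(0)
    C = rng.standard_normal((a, b))        # stand-in for a fully known C

    def g_and_grad(x):
        A = x[:a * n].reshape(a, n)
        B = x[a * n:].reshape(n, b)
        R = A @ B - C                      # residual AB - C
        g = np.sum(R ** 2)                 # ||AB - C||_F^2 = trace(R* R)
        dA = 2 * R @ B.T                   # gradient with respect to A
        dB = 2 * A.T @ R                   # gradient with respect to B
        return g, np.concatenate([dA.ravel(), dB.ravel()])

    x0 = rng.standard_normal(a * n + n * b)
    res = minimize(g_and_grad, x0, jac=True, method='BFGS')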

As this is an inherently nonlinear problem, standard linear algebra approaches do not apply.

If you provide more information, I may be able to help more.


Some more questions: I think the issue here is that, without more information, there is no "best solution". We need a more concrete idea of what we are looking for. One idea could be the "sparsest" solution. This is a hot area of research, with some of the best minds in the world working on it (see the work of Terry Tao et al. on the nuclear norm). This problem, although tractable, is still hard.


Unfortunately, I am not yet able to comment, so I will add my comments here. As said below, LM is a great approach to solving this, and is just one of the Newton-type approaches to either the optimization problem or the nonlinear solving problem.

Here is an idea, using the example you gave above. Let's define two new vectors, V and U, each with 21 elements (exactly the number of known elements in C).

V is precisely the known elements of C, column ordered, so (in matlab notation)

V = [C11; C21; C31; C51; C12; ...; C55]

U is a column ordering of the product AB, leaving out the elements corresponding to the '?' entries in C. Collecting all the variables into x, we have
x = [a11, a21, ..., a52, b11, b21, ..., b25].

f(x) = U (as defined above).

We can now try to solve f(x)=V with your favorite nonlinear least squares method.
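
For concreteness, a sketch of this setup (the mask layout encodes the '?' pattern from the question; scipy's least_squares stands in for the "favorite method"):

    import numpy as np
    from scipy.optimize import least_squares

    a, n, b = 5, 2, 5
    rng = np.random.default_rng(1)
    C = rng.standard_normal((a, b))          # stand-in values for the known c's
    known = np.ones((a, b), dtype=bool)
    known[0, 4] = known[2, 3] = known[3, 0] = known[3, 1] = False  # the '?' entries

    V = C.T[known.T]                         # known elements of C, column ordered

    def residuals(x):                        # f(x) - V, element by element
        A = x[:a * n].reshape(a, n)
        B = x[a * n:].reshape(n, b)
        U = (A @ B).T[known.T]               # product entries at the known positions
        return U - V

    x0 = rng.standard_normal(a * n + n * b)  # 20 unknowns against 21 residuals
    res = least_squares(residuals, x0)

Note that with 21 residuals and 20 unknowns the system is just barely overconstrained, matching the setup in the question.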

As an aside, although a poster below recommended simulated annealing, I recommend against it. There are some problems where it works, but it is a heuristic. When you have powerful analytic methods such as Gauss-Newton or LM, I say use them (in my own experience, that is).

SplittingField
I'm not following you on the "inherently nonlinear" bit. I think it's no more nonlinear than a linear regression problem.
BCS
Yeah, that was a little confusing. I added a new section to my answer above, which I think reduces the problem to standard nonlinear least squares.
SplittingField
Clearly there is no single best solution (row/column reordering and scaling take care of that right off), but I'd be willing to take any one of the equally best options. BTW, I'm still not seeing how this is nonlinear.
BCS
Did you see the reduction to standard nonlinear least squares? For example, equation 1 would be a11*b11 + a12*b21 = c11, which is nonlinear in the variables. See the addendum to my answer for the full extension.
SplittingField
+1  A: 

You have a couple of options. The Levenberg-Marquardt algorithm is generally recognized as the best least squares method; a free implementation is available here. However, if the fitness calculation is fast and you have a decent number of parameters, I would strongly suggest a Monte Carlo method such as simulated annealing.

You start with some set of parameters in the answer, then increase one of them by a random percentage, up to a maximum. You then calculate the fitness function for your system. Now, here's the trick: you don't throw away the bad answers. You accept them with a Boltzmann probability distribution,

P = exp(-(x-x0)/T)

where T is a temperature parameter and x - x0 is the current fitness value minus the previous one. After a fixed number of iterations, you decrease T by a set amount (this is called the cooling schedule), then repeat the process with another random parameter. As T decreases, fewer poor solutions are accepted, and eventually the procedure becomes a greedy search, only accepting solutions that improve the fit.

If your system has many free parameters (> 10 or so), this is really the only way to go if you want any chance of reaching a global minimum. This fitting method takes about 20 minutes to write in code, and a couple of hours to tweak. Hope this helps.
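
A toy version of that loop, as a sketch (the additive step size, cooling factor, and masked fitness function are my own illustrative choices, not tuned values):

    import numpy as np

    rng = np.random.default_rng(2)
    a, n, b = 5, 2, 5
    C = rng.standard_normal((a, b))
    known = np.ones((a, b), dtype=bool)
    known[0, 4] = known[2, 3] = known[3, 0] = known[3, 1] = False

    def fitness(x):
        A = x[:a * n].reshape(a, n)
        B = x[a * n:].reshape(n, b)
        return np.sum((A @ B - C)[known] ** 2)

    x = rng.standard_normal(a * n + n * b)
    fx = fitness(x)
    T = 1.0                                  # temperature parameter
    for step in range(20000):
        trial = x.copy()
        trial[rng.integers(x.size)] += rng.normal(scale=0.1)  # perturb one parameter
        ft = fitness(trial)
        # accept worse answers with Boltzmann probability exp(-(ft - fx)/T)
        if ft < fx or rng.random() < np.exp(-(ft - fx) / T):
            x, fx = trial, ft
        if step % 1000 == 999:
            T *= 0.9                         # cooling schedule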

FYI, Wolfram has a nice discussion of this in the context of the traveling salesman problem, and I've been using it very successfully to solve some very difficult global minimization problems. It is slower than LM methods, but much better in most difficult/relatively large cases.

Steve
As it happens, I'm taking an AI final next week :b. +1
BCS
A: 

Based on the realization that cutting B down to a single column and removing the rows with unknowns converts this to very nearly a known problem, one approach would be to:

  1. seed A with random values.
  2. solve for each column of B independently.
  3. rework the problem to allow solving for each row of A given the B values from step 2.
  4. repeat at step 2 until things settle out.

I have no clue if that is even stable.
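
A sketch of steps 1-4, assuming a boolean mask of the known entries in C (each per-column/per-row solve is an ordinary least squares fit restricted to the known entries):

    import numpy as np

    rng = np.random.default_rng(3)
    a, n, b = 5, 2, 5
    C = rng.standard_normal((a, b))
    known = np.ones((a, b), dtype=bool)
    known[0, 4] = known[2, 3] = known[3, 0] = known[3, 1] = False

    A = rng.standard_normal((a, n))          # step 1: seed A with random values
    B = np.zeros((n, b))
    for sweep in range(100):                 # step 4: repeat until things settle out
        for j in range(b):                   # step 2: solve each column of B
            r = known[:, j]
            B[:, j] = np.linalg.lstsq(A[r], C[r, j], rcond=None)[0]
        for i in range(a):                   # step 3: solve each row of A
            c = known[i, :]
            A[i] = np.linalg.lstsq(B[:, c].T, C[i, c], rcond=None)[0]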

BCS
+1  A: 

I have no idea how to deal with your missing values, so I'm going to ignore that problem.

There are no unique solutions. To find a best solution you need some sort of metric to judge them by. I'm going to suppose you want a least squares metric, i.e. the best guess values of A and B are those that minimize the sum of the terms [C_ij - (AB)_ij]^2.

One thing you didn't mention is how you determine the value of n. In short, we can come up with "good" solutions if 1 <= n <= b. This is because 1 <= rank(C) <= b, where rank(C) is the dimension of the column space of C. Note that this assumes a >= b; to be more correct we would write 1 <= rank(C) <= min(a, b).

Now, supposing that you have chosen n such that 1 <= n <= b, you will minimize the residual sum of squares if you choose the columns of A such that span(A) = span(first n left singular vectors of C). If you don't have any other good reason, just take the columns of A to be the first n left singular vectors of C. Once you have chosen A, you can get the values of B in the usual linear regression way, i.e. B = (A'A)^(-1) A' C.
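
A sketch of that recipe (ignoring the missing values, as noted above; numpy's SVD supplies the singular vectors):

    import numpy as np

    rng = np.random.default_rng(4)
    a, b, n = 5, 5, 2
    C = rng.standard_normal((a, b))          # pretend C is fully known

    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    A = U[:, :n]                             # span(A) = span of first n singular vectors
    B = np.linalg.solve(A.T @ A, A.T @ C)    # B = (A'A)^(-1) A' C
    # A @ B is then a best rank-n least squares approximation of C

Since the columns of A are orthonormal here, (A'A)^(-1) is just the identity, but the explicit formula is kept to mirror the regression step above.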

leif
n is chosen outside the problem. As for the missing value bit, one solution would be to find the largest row/column selection that is fully populated and apply your solution to it. Then look for another selection with the most unset values and repeat. Cycle this, recomputing some of the values (selected by a heuristic), and it might converge.
BCS
I claim that choosing n > rank(C) will lead to columns of A that are linearly dependent. I'm afraid I don't understand your approach to the missing values. Are you seeding some values from the known data portion, then using that to iterate on the rest?
leif
the real application (think Netflix) would require a large n before that causes problems. The idea for the missing values is to repeatedly solve parts of the problem that have no missing values in C and somehow ignore the values that are already solved for.
BCS
Oh, so you are going to try and predict the missing values? Good luck!
leif