I have the matrix system:

A x B = C

A is a×n and B is n×b. Both A and B are unknown, but I have partial information about C (I have some of its values but not all), and n is picked to be small enough that the system is expected to be overconstrained. It is not required that every row of A or column of B be overconstrained.

I'm looking for something like least squares linear regression to find a best fit for this system. (Note: I know there will not be a single unique solution, but all I want is one of the best solutions.)


To make this concrete: all the a's and b's are unknown, all the c's are known, and the ?'s are ignored. I want to find a least squares solution taking into account only the known c's.

[ a11, a12 ]                                     [ c11, c12, c13, c14, ?   ]
[ a21, a22 ]   [ b11, b12, b13, b14, b15]        [ c21, c22, c23, c24, c25 ]
[ a31, a32 ] x [ b21, b22, b23, b24, b25] = C ~= [ c31, c32, c33, ?,   c35 ]
[ a41, a42 ]                                     [ ?,   ?,   c43, c44, c45 ]
[ a51, a52 ]                                     [ c51, c52, c53, c54, c55 ]


Note that if B is trimmed to its first column (b11 and b21 only) and row 4 (whose entry is unknown) is dropped, this is almost a standard least squares linear regression problem.

+2  A: 

A wild guess: a singular value decomposition might do the trick?

Emil H
Might be related, but SVD works with total knowledge of the input.
BCS
+5  A: 

This problem is ill-posed as described.

Let A and B be scalars and C = 5. You are then asking to solve a*b = 5, which has an infinite number of solutions.

One approach, on the information provided above, is to minimize the function g defined as

g(A,B) = ||AB - C||_F^2 = trace((AB - C)*(AB - C))

using Newton's method or a quasi-Newton approach such as BFGS.
(You can easily compute the gradient here.) M* is the transpose of M, multiplication is implicit, and ||.||_F is the Frobenius norm.
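
As an illustration, here is a minimal sketch in Python/numpy, assuming a fully known C for the moment (a masked variant appears further down); the sizes and the use of scipy's BFGS are my own choices, not something from the question. The gradient follows from the trace form: dg/dA = 2(AB - C)B* and dg/dB = 2A*(AB - C).

    import numpy as np
    from scipy.optimize import minimize

    a, n, b = 5, 2, 5                      # illustrative sizes
    rng = np.random.default_rng(0)
    C = rng.standard_normal((a, b))        # stand-in for a fully known C

    def g_and_grad(x):
        A = x[:a * n].reshape(a, n)
        B = x[a * n:].reshape(n, b)
        R = A @ B - C                      # residual AB - C
        g = np.sum(R ** 2)                 # ||AB - C||_F^2 = trace(R* R)
        dA = 2 * R @ B.T                   # gradient with respect to A
        dB = 2 * A.T @ R                   # gradient with respect to B
        return g, np.concatenate([dA.ravel(), dB.ravel()])

    x0 = rng.standard_normal(a * n + n * b)
    res = minimize(g_and_grad, x0, jac=True, method='BFGS')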

As this is an inherently nonlinear problem, standard linear algebra approaches do not apply.

If you provide more information, I may be able to help more.


Some more questions: I think the issue here is that, without more information, there is no "best solution". We need a more concrete idea of what we are looking for. One idea could be the "sparsest" solution. This is a hot area of research, with some of the best minds in the world working on it (see the work of Terry Tao et al. on the nuclear norm). This problem, although tractable, is still hard.


Unfortunately, I am not yet able to comment, so I will add my comments here. As said below, LM is a great approach to solving this, and is just one of the Newton-type approaches to either the optimization problem or the nonlinear solving problem.

Here is an idea, using the example you gave above. Let's define two new vectors, V and U, each with 21 elements (exactly the number of known elements in C).

V is precisely the known elements of C, column ordered, so (in matlab notation)

V = [C11; C21; C31; C51; C12; ...; C55]

U is a column ordering of the product AB, leaving out the elements corresponding to the '?' entries in C. Collecting all the variables into x, we have
x = [a11, a21, ..., a52, b11, b21, ..., b25].

f(x) = U (as defined above).

We can now try to solve f(x)=V with your favorite nonlinear least squares method.
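
For concreteness, a sketch of this setup (the mask layout encodes the '?' pattern from the question; scipy's least_squares stands in for the "favorite method"):

    import numpy as np
    from scipy.optimize import least_squares

    a, n, b = 5, 2, 5
    rng = np.random.default_rng(1)
    C = rng.standard_normal((a, b))          # stand-in values for the known c's
    known = np.ones((a, b), dtype=bool)
    known[0, 4] = known[2, 3] = known[3, 0] = known[3, 1] = False  # the '?' entries

    V = C.T[known.T]                         # known elements of C, column ordered

    def residuals(x):                        # f(x) - V, element by element
        A = x[:a * n].reshape(a, n)
        B = x[a * n:].reshape(n, b)
        U = (A @ B).T[known.T]               # product entries at the known positions
        return U - V

    x0 = rng.standard_normal(a * n + n * b)  # 20 unknowns against 21 residuals
    res = least_squares(residuals, x0)

Note that with 21 residuals and 20 unknowns the system is just barely overconstrained, matching the setup in the question.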

As an aside, although a poster below recommended simulated annealing, I recommend against it. There are some problems where it works, but it is a heuristic. When you have powerful analytic methods such as Gauss-Newton or LM, I say use them (in my own experience, that is).

SplittingField
I'm not following you on the "inherently nonlinear" bit. I think it's no more nonlinear than a linear regression problem.
BCS
Yeah, that was a little confusing. I added a new section to my answer above, which I think reduces the problem to standard nonlinear least squares.
SplittingField
Clearly there is no single best solution (row/column reordering and scaling take care of that right off), but I'd be willing to take any one of the equally best options. BTW, I'm still not seeing how this is nonlinear.
BCS
Did you see the reduction to standard nonlinear least squares? For example, equation 1 would be a11*b11 + a12*b21 = c11, which is nonlinear in the variables. See the addendum to my answer for the full extension.
SplittingField
+1  A: 

You have a couple of options. The Levenberg-Marquardt algorithm is generally recognized as the best least squares method; a free implementation is available here. However, if the fitness calculation is fast and you have a decent number of parameters, I would strongly suggest a Monte Carlo method such as simulated annealing.

You start with some set of parameters in the answer, then increase one of them by a random percentage, up to a maximum. You then calculate the fitness function for your system. Now, here's the trick: you don't throw away the bad answers. You accept them with a Boltzmann probability distribution,

P = exp(-(x-x0)/T)

where T is a temperature parameter and x - x0 is the current fitness value minus the previous one. After a fixed number of iterations, you decrease T by a set amount (this is called the cooling schedule), then repeat the process with another random parameter. As T decreases, fewer poor solutions are accepted, and eventually the procedure becomes a greedy search, only accepting solutions that improve the fit.

If your system has many free parameters (> 10 or so), this is really the only way to go if you want any chance of reaching a global minimum. This fitting method takes about 20 minutes to write in code, and a couple of hours to tweak. Hope this helps.
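
A toy version of that loop, as a sketch (the additive step size, cooling factor, and masked fitness function are my own illustrative choices, not tuned values):

    import numpy as np

    rng = np.random.default_rng(2)
    a, n, b = 5, 2, 5
    C = rng.standard_normal((a, b))
    known = np.ones((a, b), dtype=bool)
    known[0, 4] = known[2, 3] = known[3, 0] = known[3, 1] = False

    def fitness(x):
        A = x[:a * n].reshape(a, n)
        B = x[a * n:].reshape(n, b)
        return np.sum((A @ B - C)[known] ** 2)

    x = rng.standard_normal(a * n + n * b)
    fx = fitness(x)
    T = 1.0                                  # temperature parameter
    for step in range(20000):
        trial = x.copy()
        trial[rng.integers(x.size)] += rng.normal(scale=0.1)  # perturb one parameter
        ft = fitness(trial)
        # accept worse answers with Boltzmann probability exp(-(ft - fx)/T)
        if ft < fx or rng.random() < np.exp(-(ft - fx) / T):
            x, fx = trial, ft
        if step % 1000 == 999:
            T *= 0.9                         # cooling schedule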

FYI, Wolfram has a nice discussion of this in the context of the traveling salesman problem, and I've been using it very successfully to solve some very difficult global minimization problems. It is slower than LM methods, but much better in most difficult/relatively large cases.

Steve
As it happens, I'm taking an AI final next week :b. +1
BCS
A: 

Based on the realization that cutting B down to a single column and removing the rows with unknowns converts this to very nearly a known problem, one approach would be to:

  1. seed A with random values.
  2. solve for each column of B independently.
  3. rework the problem to allow solving for each row of A given the B values from step 2.
  4. repeat at step 2 until things settle out.

I have no clue if that is even stable.
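
A sketch of steps 1-4, assuming a boolean mask of the known entries in C (each per-column/per-row solve is an ordinary least squares fit restricted to the known entries):

    import numpy as np

    rng = np.random.default_rng(3)
    a, n, b = 5, 2, 5
    C = rng.standard_normal((a, b))
    known = np.ones((a, b), dtype=bool)
    known[0, 4] = known[2, 3] = known[3, 0] = known[3, 1] = False

    A = rng.standard_normal((a, n))          # step 1: seed A with random values
    B = np.zeros((n, b))
    for sweep in range(100):                 # step 4: repeat until things settle out
        for j in range(b):                   # step 2: solve each column of B
            r = known[:, j]
            B[:, j] = np.linalg.lstsq(A[r], C[r, j], rcond=None)[0]
        for i in range(a):                   # step 3: solve each row of A
            c = known[i, :]
            A[i] = np.linalg.lstsq(B[:, c].T, C[i, c], rcond=None)[0]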

BCS
+1  A: 

I have no idea how to deal with your missing values, so I'm going to ignore that problem.

There are no unique solutions. To find a best solution you need some sort of metric to judge them by. I'm going to suppose you want a least squares metric, i.e. the best guess values of A and B are those that minimize the sum of the terms [C_ij - (AB)_ij]^2.

One thing you didn't mention is how you determine the value of n. In short, we can come up with "good" solutions if 1 <= n <= b. This is because 1 <= rank(C) <= b, where rank(C) is the dimension of the column space of C. Note that this assumes a >= b; to be more correct we would write 1 <= rank(C) <= min(a, b).

Now, supposing that you have chosen n such that 1 <= n <= b, you will minimize the residual sum of squares if you choose the columns of A such that span(A) = span(first n left singular vectors of C). If you don't have any other good reason, just take the columns of A to be the first n left singular vectors of C. Once you have chosen A, you can get the values of B in the usual linear regression way, i.e. B = (A'A)^(-1) A' C.
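
A sketch of that recipe (ignoring the missing values, as noted above; numpy's SVD supplies the singular vectors):

    import numpy as np

    rng = np.random.default_rng(4)
    a, b, n = 5, 5, 2
    C = rng.standard_normal((a, b))          # pretend C is fully known

    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    A = U[:, :n]                             # span(A) = span of first n singular vectors
    B = np.linalg.solve(A.T @ A, A.T @ C)    # B = (A'A)^(-1) A' C
    # A @ B is then a best rank-n least squares approximation of C

Since the columns of A are orthonormal here, (A'A)^(-1) is just the identity, but the explicit formula is kept to mirror the regression step above.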

leif
n is chosen outside the problem. As for the missing value bit, one solution would be to find the largest row/column selection that is fully populated and apply your solution to it. Then look for another selection with the most unset values and repeat. Cycle this, recomputing some of the values (selected by a heuristic), and it might converge.
BCS
I claim that choosing n > rank(C) will lead to columns of A that are linearly dependent. I'm afraid I don't understand your approach to the missing values. Are you seeding some values from the known data portion, then using that to iterate on the rest?
leif
the real application (think Netflix) would require a large n before that causes problems. The idea for the missing values is to repeatedly solve parts of the problem that have no missing values in C and somehow ignore the values that are already solved for.
BCS
Oh, so you are going to try and predict the missing values? Good luck!
leif