Large-scale pseudoinverse | ansaurus

+6 Q:

Large-scale pseudoinverse

I would like to compute the Moore–Penrose pseudoinverse of an enormous matrix. Ideally, I would like to do it on a matrix that has 23 million rows and 1000 columns, but if necessary I can reduce the number of rows to 4 million by only running on one part of my experiment.

Obviously, loading the matrix into memory and running SVD on it is not going to work. Wikipedia points to Krylov subspace methods and mentions the Arnoldi, Lanczos, conjugate gradient, GMRES (generalized minimum residual), BiCGSTAB (biconjugate gradient stabilized), QMR (quasi minimal residual), TFQMR (transpose-free QMR), and MINRES (minimal residual) methods as being among the best Krylov subspace methods. But I don't know where to go from here. Is computing the pseudoinverse of such a huge matrix even feasible? If so, using which algorithms or software libraries? I have a large computing cluster available, so parallel approaches are welcome.

This answer points to the R package biglm. Would that work? Has anyone used it? I normally work in Python, but don't mind using other languages and tools for this particular task.

+1 A:

You might be better off using a block iterative algorithm that converges directly to the least squares solution, rather than computing the least squares solution through the pseudoinverse. See "Applied Iterative Methods" by Charlie Byrne. These algorithms are closely related to the Krylov subspace methods, but are tuned for easy computation. You can get an introduction by looking at chapter 3 of this preprint of another of his books.
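To make the idea concrete, here is a minimal sketch of an iterative least squares solver that only ever touches one block of rows at a time. It is not taken from Byrne's book: it is plain conjugate gradient applied to the normal equations A^T A x = A^T b, which is one of the Krylov methods named in the question. The names `gram_matvec`, `solve_least_squares_blockwise`, and the `blocks` callable are hypothetical, chosen for illustration; in practice `blocks` would read row chunks from disk, and the per-block sums could be distributed across a cluster.

```python
import numpy as np

def gram_matvec(blocks, v):
    """Return A.T @ (A @ v), accumulated one row block at a time."""
    out = np.zeros_like(v)
    for A_k, _ in blocks():
        out += A_k.T @ (A_k @ v)
    return out

def solve_least_squares_blockwise(blocks, n_cols, n_iter=100, tol=1e-10):
    """Minimise ||A x - b||_2 with CG on the normal equations.

    `blocks` is a zero-argument callable (hypothetical interface) that
    returns an iterable of (A_k, b_k) row blocks; it is called once per pass.
    """
    # Right-hand side of the normal equations: A.T @ b.
    rhs = np.zeros(n_cols)
    for A_k, b_k in blocks():
        rhs += A_k.T @ b_k

    x = np.zeros(n_cols)
    r = rhs.copy()              # normal-equation residual at x = 0
    p = r.copy()                # first search direction
    rr = r @ r
    rhs_norm = np.sqrt(rr)

    for _ in range(n_iter):
        if np.sqrt(rr) <= tol * rhs_norm:
            break               # converged (also covers rhs == 0)
        Ap = gram_matvec(blocks, p)
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

# Small in-memory demo; a real run would stream the blocks from disk.
rng = np.random.default_rng(0)
A = rng.standard_normal((2000, 20))
b = rng.standard_normal(2000)
blocks = lambda: zip(np.array_split(A, 10), np.array_split(b, 10))

x_hat = solve_least_squares_blockwise(blocks, n_cols=20)
x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
```

Working with A^T A squares the condition number, so this sketch is only reasonable when the matrix is fairly well conditioned. SciPy's `scipy.sparse.linalg.lsqr` solves the same problem in a numerically more stable way and accepts a `LinearOperator`, so the block-wise products above could be wrapped in one instead.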

deinst 2010-07-22 01:16:39
