Given a large sparse matrix (say 10k+ rows by 1M+ columns), I need to find a subset of the rows and columns, not necessarily contiguous, that forms a dense submatrix (all non-zero elements). I want this submatrix to be as large as possible (not the largest sum, but the largest number of elements) within some aspect-ratio constraints.
Are there any known exact or approximate solutions to this problem?
A quick scan on Google turns up a lot of close-but-not-quite results. What terms should I be searching for?
edit: Just to clarify, the submatrix need not be contiguous. In fact, the row and column order is completely arbitrary, so adjacency is irrelevant.
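For what it's worth, here's a tiny illustration of what I mean (NumPy, with a made-up matrix and made-up index picks, just to show the goal):

```python
import numpy as np

# A tiny 0/1 pattern (1 = non-zero entry); matrix and indices are
# made up purely to illustrate what "dense submatrix" means here.
A = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 1, 0, 0],
])

rows = [0, 2, 3]           # not contiguous, and order doesn't matter
cols = [0, 2]
sub = A[np.ix_(rows, cols)]

assert (sub != 0).all()    # every entry of the submatrix is non-zero
print(sub.size)            # 6 -- the quantity I want to maximize
```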
A thought based on Chad Okere's idea
1. Order the rows from largest non-zero count to smallest (not necessary, but it might help performance)
2. Select two rows that have a "large" overlap
3. Add all other rows that won't reduce the overlap
4. Record that set
5. Add whatever row reduces the overlap by the least
6. Repeat at #3 until the result gets too small
7. Start over at #2 with a different starting pair
8. Continue until you decide the result is good enough (a rough code sketch follows)
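Here's a rough sketch of that loop in Python, just to pin the idea down. Each row is represented by the set of its non-zero column indices; the function name, the random choice of starting row, and the `min_cols` / `restarts` parameters are all mine, not from any known algorithm.

```python
import numpy as np

def greedy_dense_submatrix(row_cols, min_cols=2, restarts=10, rng=None):
    """Greedy heuristic for the steps above.

    row_cols maps row index -> set of column indices that are non-zero.
    Returns (rows, cols) of the largest all-non-zero block found.
    Assumes there are at least two rows.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Step 1: order rows by non-zero count, largest first.
    order = sorted(row_cols, key=lambda r: len(row_cols[r]), reverse=True)
    best_rows, best_cols = [], set()

    for _ in range(restarts):
        # Steps 2 and 7: start from a (here: random) row and its best partner.
        r1 = order[rng.integers(len(order))]
        r2 = max((r for r in order if r != r1),
                 key=lambda r: len(row_cols[r] & row_cols[r1]))
        rows = [r1, r2]
        cols = row_cols[r1] & row_cols[r2]

        # Step 6: keep going until the overlap gets too small.
        while len(cols) >= min_cols:
            # Step 4: record the set if it's the biggest block so far.
            if len(rows) * len(cols) > len(best_rows) * len(best_cols):
                best_rows, best_cols = list(rows), set(cols)
            remaining = [r for r in order if r not in rows]
            if not remaining:
                break
            # Steps 3 and 5: take the row that shrinks the overlap the least
            # (rows that don't shrink it at all get picked first).
            r_next = max(remaining, key=lambda r: len(cols & row_cols[r]))
            rows.append(r_next)
            cols &= row_cols[r_next]

    return best_rows, sorted(best_cols)

# Purely illustrative usage on a random 0/1 pattern:
rng = np.random.default_rng(0)
pattern = rng.random((200, 500)) < 0.3
row_cols = {i: set(np.flatnonzero(pattern[i])) for i in range(len(pattern))}
rows, cols = greedy_dense_submatrix(row_cols, min_cols=5, restarts=20, rng=rng)
print(len(rows), "x", len(cols), "=", len(rows) * len(cols), "elements")
```

This sketch ignores the aspect-ratio constraint; one way to handle it would be to only record a block in step 4 when its row/column ratio is acceptable.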