The problem with your current algorithm is that you are trying to shuffle your way out of dead ends -- specifically, when your @letters
and @numbers
arrays (after the initial shuffle of @numbers
) yield the same cell more than once. That approach works when the matrix is small, because it doesn't take too many tries to find a viable re-shuffle. However, it's a killer when the lists are big. Even if you could hunt for alternatives more efficiently -- for example, trying permutations rather than random shuffling -- the approach is probably doomed.
Rather than shuffling entire lists, you might tackle the problem by making small modifications to an existing matrix.
For example, let's start with your example matrix (call it M1). Randomly pick one cell to change (say, A1). At this point the matrix is in an illegal state. Our goal will be to fix it in the minimum number of edits -- specifically 3 more edits. You implement these 3 additional edits by "walking" around the matrix, with each repair of a row or column yielding another problem to be solved, until you have walked full circle (err ... full rectangle).
For example, after changing A1 from 0 to 1, there are 3 ways to walk for the next repair: A3, B1, and C1. Let's decide that the 1st edit should fix rows. So we pick A3. On the second edit, we will fix the column, so we have choices: B3 or C3 (say, C3). The final repair offers only one choice (C1), because we need to return to the column of our original edit. The end result is a new, valid matrix.
Orig Change A1 Change A3 Change C3 Change C1
M1 M2
1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
----- ----- ----- ----- -----
A | 0 0 1 1 0 1 1 0 0 1 0 0 1 0 0
B | 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0
C | 1 0 0 1 0 0 1 0 0 1 0 1 0 0 1
If an editing path leads to a dead end, you backtrack. If all of the repair paths fail, the initial edit can be rejected.
This approach will generate new, valid matrixes quickly. It will not necessarily produce random outcomes: M1 and M2 will still be highly correlated with each other, a point that will become more directly evident as the size of the matrix grows.
How do you increase the randomness? You mentioned that most cells (99% or more) are zeros. One idea would be to proceed like this: for each 1 in the matrix, set its value to 0 and then repair the matrix using the 4-edit method outlined above. In effect, you would be moving all of the ones to new, random locations.
Here is an illustration. There are probably further speed optimizations in here, but this approach yielded 10 new 600x600 matrixes, at 0.5% density, in 30 seconds or so on my Windows box. Don't know if that's fast enough.
use strict;
use warnings;
# Args: N rows, N columns, density, N iterations.
main(@ARGV);
sub main {
my $n_iter = pop;
my $matrix = init_matrix(@_);
print_matrix($matrix);
for my $n (1 .. $n_iter){
warn $n, "\n"; # Show progress.
edit_matrix($matrix);
print_matrix($matrix);
}
}
sub init_matrix {
# Generate initial matrix, given N of rows, N of cols, and density.
my ($rows, $cols, $density) = @_;
my @matrix;
for my $r (1 .. $rows){
push @matrix, [ map { rand() < $density ? 1 : 0 } 1 .. $cols ];
}
return \@matrix;
}
sub print_matrix {
# Dump out a matrix for checking.
my $matrix = shift;
print "\n";
for my $row (@$matrix){
my @vals = map { $_ ? 1 : ''} @$row;
print join("\t", @vals), "\n";
}
}
sub edit_matrix {
# Takes a matrix and moves all of the non-empty cells somewhere else.
my $matrix = shift;
my $move_these = cells_to_move($matrix);
for my $cell (@$move_these){
my ($i, $j) = @$cell;
# Move the cell, provided that the cell hasn't been moved
# already and the subsequent edits don't lead to a dead end.
$matrix->[$i][$j] = 0
if $matrix->[$i][$j]
and other_edits($matrix, $cell, 0, $j);
}
}
sub cells_to_move {
# Returns a list of non-empty cells.
my $matrix = shift;
my $i = -1;
my @cells = ();
for my $row (@$matrix){
$i ++;
for my $j (0 .. @$row - 1){
push @cells, [$i, $j] if $matrix->[$i][$j];
}
}
return \@cells;
}
sub other_edits {
my ($matrix, $cell, $step, $last_j) = @_;
# We have succeeded if we've already made 3 edits.
$step ++;
return 1 if $step > 3;
# Determine the roster of next edits to fix the row or
# column total upset by our prior edit.
my ($i, $j) = @$cell;
my @fixes;
if ($step == 1){
@fixes =
map { [$i, $_] }
grep { $_ != $j and not $matrix->[$i][$_] }
0 .. @{$matrix->[0]} - 1
;
shuffle(\@fixes);
}
elsif ($step == 2) {
@fixes =
map { [$_, $j] }
grep { $_ != $i and $matrix->[$_][$j] }
0 .. @$matrix - 1
;
shuffle(\@fixes);
}
else {
# On the last edit, the column of the fix must be
# the same as the column of the initial edit.
@fixes = ([$i, $last_j]) unless $matrix->[$i][$last_j];
}
for my $f (@fixes){
# If all subsequent fixes succeed, we are golden: make
# the current fix and return true.
if ( other_edits($matrix, [@$f], $step, $last_j) ){
$matrix->[$f->[0]][$f->[1]] = $step == 2 ? 0 : 1;
return 1;
}
}
# Failure if we get here.
return;
}
sub shuffle {
my $array = shift;
my $i = scalar(@$array);
my $j;
for (@$array ){
$i --;
$j = int rand($i + 1);
@$array[$i, $j] = @$array[$j, $i] unless $i == $j;
}
}