views:

336

answers:

5

I have the following sparse matrix A.

   2   3   0   0   0
   3   0   4   0   6
   0  -1  -3   2   0
   0   0   1   0   0
   0   4   2   0   1

Then I would like to capture the following information from there:

  1. cumulative count of entries, as matrix is scanned columnwise. Yielding:

    Ap = [ 0, 2, 5, 9, 10, 12 ];

  2. row indices of entries, as matrix is scanned columnwise. Yielding:

    Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];

  3. Non-zero matrix entries, as matrix is scanned columnwise. Yielding:

    Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];

Since the actual matrix A is potentially very2 large, is there any efficient way in Perl that can capture those elements? Especially without slurping all matrix A into RAM.

I am stuck with the following code. Which doesn't give what I want.

use strict;
use warnings;

my (@Ax, @Ai, @Ap) = ();
while (<>) {
    chomp;
    my @elements = split /\s+/;
    my $i = 0;
    my $new_line = 1;
    while (defined(my $element = shift @elements)) {
        $i++;
        if ($element) {
            push @Ax, 0 + $element;
            if ($new_line) {
                push @Ai, scalar @Ax;
                $new_line = 0;
            }

            push @Ap, $i;
        }
    }
}
push @Ai, 1 + @Ax;
print('@Ax  = [', join(" ", @Ax), "]\n");
print('@Ai = [', join(" ", @Ai), "]\n");
print('@Ap = [', join(" ", @Ap), "]\n");
A: 

Ap is easy: simply start with zeroes and increment each time you meet a nonzero number. I don't see you trying to write anything into @Ap, so it's no surprise it doesn't end up as you wish.

Ai and Ax are trickier: you want a columnwise ordering while you're scanning rowwise. You won't be able to do anything in-place since you don't know yet how many elements the columns will yield, so you can't know in advance the elements' position.

Obviously, it would be a hell lot easier if you could just alter the requirement to have a rowwise ordering instead. Failing that, you could get complex and collect (i, j, x) triplets. While collecting, they'd naturally be ordered by (i, j). Post-collection, you'd just want to sort them by (j, i).

JB
A: 

The code you provided works on a row-by-row basis. To get results sequential by columns you have to accumulate your values into separate arrays, one for each column:

# will look like ([], [], [] ...), one [] for each column.
my @columns;

while (<MATRIX>) {
    my @row = split qr'\s+';
    for (my $col = 0; $col < @row; $col++) {

        # push each non-zero value into its column
        push @{$columns[$col]}, $row[$col] if $row[$col] > 0;

    }
}

# now you only need to flatten it to get the desired kind of output:
use List::Flatten;
@non_zero = flat @columns;

See also List::Flatten.

Inshallah
+1  A: 

This is what you are looking for, I guess:

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper::Simple;

my @matrix;

# Populate @matrix
while (<>) {
    push @matrix, [ split /\s+/ ];
}

my $columns = @{ $matrix[0] };
my $rows    = @matrix;

my ( @Ap, @Ai, @Ax );
my $ap = 0;

for ( my $j = 0 ; $j <= $rows ; $j++ ) {
    for ( my $i = 0 ; $i <= $columns ; $i++ ) {
        if ( $matrix[$i]->[$j] ) {
            $ap++;
            push @Ai, $i;
            push @Ax, $matrix[$i]->[$j];
        }
    }
    push @Ap, $ap;
}

print Dumper @Ap;
print Dumper @Ai;
print Dumper @Ax;
Alan Haggai Alavi
@AHA: Thanks. But your approach still slurp the whole matrix into @matrix. In practice matrix dimension is 10^6. I think my RAM can't afford it.
neversaint
@foolishbrat: Oh I see.@Inshalla: Sorry, I did not understand.
Alan Haggai Alavi
+1  A: 

A common strategy for storing sparse data is to drop the values you don't care about (the zeroes) and to store the row and column indexes with each value that you do care about, thus preserving their positional information:

[VALUE, ROW, COLUMN]

In your case, you can economize further since all of your needs can be met by processing the data column-by-column, which means we don't have to repeat COLUMN for every value.

use strict;
use warnings;
use Data::Dumper;

my ($r, $c, @dataC, @Ap, @Ai, @Ax, $cumul);

# Read data row by row, storing non-zero values by column.
#    $dataC[COLUMN] = [
#        [VALUE, ROW],
#        [VALUE, ROW],
#        etc.
#    ]
$r = -1;
while (<DATA>) {
    chomp;
    $r ++;
    $c = -1;
    for my $v ( split '\s+', $_ ){
        $c ++;
        push @{$dataC[$c]}, [$v, $r] if $v;
    }
}

# Iterate through the data column by column
# to compute the three result arrays.
$cumul = 0;
@Ap = ($cumul);
$c = -1;
for my $column (@dataC){
    $c ++;
    $cumul += @$column;
    push @Ap, $cumul;
    for my $value (@$column){
        push @Ax, $value->[0];
        push @Ai, $value->[1];
    }
}

__DATA__
2   3   0   0   0
3   0   4   0   6
0  -1  -3   2   0
0   0   1   0   0
0   4   2   0   1
FM
+1  A: 

Updated based on FM's comment. If you do not want to store any of the original data:

#!/usr/bin/perl

use strict;
use warnings;

my %matrix_info;

while ( <DATA> ) {
    chomp;
    last unless /[0-9]/;
    my @v = map {0 + $_ } split;
    for (my $i = 0; $i < @v; ++$i) {
        if ( $v[$i] ) {
            push @{ $matrix_info{$i}->{indices} }, $. - 1;
            push @{ $matrix_info{$i}->{nonzero} }, $v[$i];
        }
    }
}

my @cum_count = (0);
my @row_indices;
my @nonzero;

for my $i ( sort {$a <=> $b } keys %matrix_info ) {
    my $mi = $matrix_info{$i};
    push @nonzero, @{ $mi->{nonzero} };

    my @i = @{ $mi->{indices} };

    push @cum_count, $cum_count[-1] + @i;
    push @row_indices, @i;
}

print(
    "\@Ap = [@cum_count]\n",
    "\@Ai = [@row_indices]\n",
    "\@Ax = [@nonzero]\n",
);

__DATA__
2   3   0   0   0
3   0   4   0   6
0  -1  -3   2   0
0   0   1   0   0
0   4   2   0   1

Output:

C:\Temp> m
@Ap = [0 2 5 9 10 12]
@Ai = [0 1 0 2 4 1 2 3 4 2 1 4]
@Ax = [2 3 3 -1 4 4 -3 1 2 2 6 1]
Sinan Ünür
Perhaps I'm misunderstanding something, but your first answer gives the impression that `@cum_count` is like the OP's cumulative count (`@Ap`), but it is not. Your counts are row-wise tallies; he wants column-wise. That said, the tallies the OP needs are available in `%row_indices` (for example, `scalar @{$row_indices{0}}`), so the answer is still a good one.
FM
@FM Ah! Good catch!
Sinan Ünür