I am trying to read a 32 by 48 matrix file into a multi-dimensional array in Perl. I can access all of the values, but I am having trouble accessing one specific value. I need to run statistics on each of the values: calculate the z-score, subtract the mean from each value, penalize specific values, et cetera. I tried to incorporate a solution from another post, but it has not worked yet. I am pretty inexperienced in Perl. Any help would be greatly appreciated. I have put a link to the data set as well as the code I have below.

Here is a link to the data set: http://paste-it.net/public/x1d5301/

Here is what I have for code right now.

#!/usr/bin/perl

open FILE, "testset.txt" or die $!;
my @lines = <FILE>;

my $size = scalar @lines;

my @matrix = (1 .. 32);
my $i = 0;
my $j = 0;
my @micro;

foreach ($matrix)
{

foreach ($lines)
{
    push @{ $micro[$matrix]}, $lines;
}
}
+2  A: 

You probably need to chomp the values:

chomp( my @lines = <FILE> );
Alan Haggai Alavi
+5  A: 

It seems you have missed that $matrix refers to @matrix only when it is immediately followed by an array subscript: $matrix[ $slot ]. Otherwise, $matrix is a completely different variable from @matrix (and both are different from %matrix as well). See perldata.
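A minimal sketch of the sigil rule described above; the values and the names $second and $count are made up for illustration:

```perl
use strict;
use warnings;

my @matrix = ( 10, 20, 30 );    # the array
my $matrix = 'unrelated';       # a separate scalar that only shares the name

my $second = $matrix[1];        # a subscript follows, so this reads from @matrix
my $count  = 0;
$count++ for @matrix;           # loop over the array itself, not over $matrix

print "$second $count $matrix\n";
```

With use strict in place, writing foreach ($matrix) without ever declaring a scalar $matrix is a compile-time error, which is one reason to turn it on.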

#!/usr/bin/perl
use English;

Don't! use English--that way!

This brings in $MATCH, $PREMATCH, and $POSTMATCH and incurs the dreaded $&, $`, $' penalty. You should wait until you're using an English variable and then just import that.
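As a hedged aside, perldoc English documents a -no_match_vars switch for exactly this purpose, and notes that the penalty no longer applies from Perl 5.20 onward. A minimal sketch, where the file name is a hypothetical placeholder assumed not to exist:

```perl
use strict;
use warnings;
use English qw( -no_match_vars );    # long names without $MATCH and friends

my $missing = 'no-such-file.txt';    # hypothetical name, assumed absent
my $message = '';

# $OS_ERROR is the English alias for $!
open( my $fh, '<', $missing )
    or $message = "Cannot open $missing: $OS_ERROR";

print "$message\n";
```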

open FILE, "testset.txt" or die $!;

Two things: 1) use lexical file handles, and 2) use the three-arg open.

my @lines = <FILE>;

As long as I'm picking: Don't slurp big files. (Not the case here, but it's a good warning.)

my $size = scalar @lines;

my @matrix = (1 .. 32);
my $i = 0;
my $j = 0;
my @micro;

I see we're at the "PROFIT!!" stage here...

foreach ($matrix) {

You don't have a variable $matrix; you have a variable @matrix.

    foreach ($lines) {

Same thing is true with $lines.

        push @{ $micro[$matrix]}, $lines;
    }
}

Rewrite:

use strict;
use warnings;
use English qw<$OS_ERROR>; # $!
open( my $input, '<', 'testset.txt' ) or die $OS_ERROR;

# I'm going to assume space-delimited, since you don't show
my @matrix;
# while ( defined( $_ = <$input> ))...
while ( <$input> ) {
    chomp; # strip off the record separator
    # load each slot of @matrix with a reference to an array filled with
    # the line split by spaces.
    push @matrix, [ split ]; # split = split( ' ', $_ )
}
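Once @matrix holds one array reference per row, a single value is $matrix[$row][$col]. The following is only a sketch of the statistics the question mentions, using a made-up 2x3 sample in place of the real file and a population standard deviation (an assumption; the question does not say which variant is wanted):

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical 2x3 sample standing in for the parsed 32x48 file.
my @matrix = ( [ 1, 2, 3 ], [ 4, 5, 6 ] );

# One specific value: row 1, column 2 (indices are zero-based).
my $cell = $matrix[1][2];

# Mean and population standard deviation over every cell.
my @values = map { @$_ } @matrix;
my $mean   = sum(@values) / @values;
my $sd     = sqrt( sum( map { ( $_ - $mean )**2 } @values ) / @values );

# Z-score matrix with the same shape (assumes $sd is non-zero).
my @zscore = map { [ map { ( $_ - $mean ) / $sd } @$_ ] } @matrix;

printf "cell=%s mean=%s z=%.3f\n", $cell, $mean, $zscore[1][2];
```

Subtracting the mean alone is the same nested map without the division by $sd.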
Axeman
@Axeman FYI: `split` with no arguments is `split ' '` not `split /\s+/`
Sinan Ünür
@Sinan: Not on redhat Perl 5.8.7 or Strawberry Perl 5.12.1:
perl -MSmart::Comments
$_ = '1 2 3 4 5';
my @a = split;
### @a
### @a: [
###       '1',
###       '2',
###       '3',
###       '4',
###       '5'
###     ]
Axeman
@Sinan, well HTML compressed the spaces, but there are all sorts of spaces and tabs in there. :/
Axeman
@Axeman: can you explain what you mean? split with no arguments is definitely `split ' '`, not `split /\s+/`. The two differ in how leading spaces are treated.
ysth
@Axeman See the CW answer I added to illustrate the difference.
Sinan Ünür
@Sinan, @ysth, ah my reading of perlfunc is a little rusty. I had forgotten "As a special case, specifying a PATTERN of space (' ' ) will split on white space just as split with no arguments does". So I thought that the significance of `split( ' ' )` was it would only split on single-space characters--I only really use the shorthand of these functions in the toy code that I write on SO. You're right, @Sinan it *does* omit the initial zero-length string. My bad. And apparently that is simply to comply with *`awk`*.
Axeman
@Sinan: I guess my *misguided* point was that `split()` behaves more like `split( /\s+/ )` than it does `split( $char )` with `$char = ' '` or `$char = ','`. The `' '` actually *hides* the full range of splitting behavior.
Axeman
@Axeman that's why I thought I should give you a heads up. Anyway, +1.
Sinan Ünür
+1  A: 

I am making this CW because its sole purpose is to clarify a tangential point I made in my comment to @Axeman's answer. See perldoc -f split:

A split on /\s+/ is like a split(' ') except that any leading whitespace produces a null first field. A split with no arguments really does a split(' ', $_) internally.

#!/usr/bin/perl

use YAML;

$_ = "\t1 2\n3\f4\r5\n";

print Dump { 'split'       => [ split       ] },
           { "split ' '"   => [ split ' '   ] },
           { 'split /\s+/' => [ split /\s+/ ] }
           ;

Output:

---
split:
  - 1
  - 2
  - 3
  - 4
  - 5
---
split ' ':
  - 1
  - 2
  - 3
  - 4
  - 5
---
split /\s+/:
  - ''
  - 1
  - 2
  - 3
  - 4
  - 5
Sinan Ünür
+1  A: 

If you are going to be doing quite a bit of math, you might consider PDL (the Perl Data Language). You can easily set up your matrix and perform operations on it:

use 5.010;

use PDL;
use PDL::Matrix;

my @rows;
while( <DATA> ) {
    chomp;
    my @row = split /\s+/;
    push @rows, \@row;
    }

my $a = PDL::Matrix->pdl( \@rows );
say "Start ", $a;

$a->index2d( 1, 2 ) .= 999;
say "(1,2) to 999 ", $a;

$a++;
say "Increment all ", $a;

__DATA__
1 2 3
4 5 6
7 8 9
2 3 4

The output shows the matrix evolution:

Start 
[
 [1 2 3]
 [4 5 6]
 [7 8 9]
 [2 3 4]
]

(1,2) to 999 
[
 [  1   2   3]
 [  4   5 999]
 [  7   8   9]
 [  2   3   4]
]

Increment all 
[
 [   2    3    4]
 [   5    6 1000]
 [   8    9   10]
 [   3    4    5]
]

You get a lot of power to run arbitrary and complex operations on every member of the matrix, just as I added 1 to every member. You completely skip the looping acrobatics.

Not only that, PDL does a lot of special stuff to make math really fast and to have a low memory footprint. Some of the stuff you want to do may already be implemented.
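As a rough sketch of how the statistics from the question might look in PDL, here is mean-centering and a z-score using only avg and the overloaded arithmetic operators. The 2x3 sample is made up, the population standard deviation is an assumption, and the PDL module from CPAN must be installed:

```perl
use strict;
use warnings;
use PDL;

# Hypothetical 2x3 sample; real data would come from the parsed file.
my $m = pdl( [ [ 1, 2, 3 ], [ 4, 5, 6 ] ] );

my $mean     = $m->avg;                        # mean of all elements
my $centered = $m - $mean;                     # subtract the mean from every element
my $sd       = sqrt( ( $centered**2 )->avg );  # population standard deviation
my $z        = $centered / $sd;                # z-score for every element

print "mean: $mean\n", "z-scores: $z\n";
```

No explicit loops are needed: each arithmetic operator applies to every element at once.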

brian d foy