tags:

views:

209

answers:

2

I have to parse a file so that I can import it to excel. So, I thought the best way was to create a csv file. In this file, I have to divide contents into different categories and represent them in different columns. So, I have parsed the file to create different arrays corresponding to the categories. Now, I am trying to create a csv file with these arrays (thought of using a for loop). But the problem is, that arrays are of unequal length.

INPUT

NM_144736.3
NM_144963.1
XM_144975.2
BC144986.1
NM_144989.1
BC145001.1
XM_145018.2
NM_145015.2
XM_030711.2
AK145024.1
AK145030.1
NM_145034.1

I have used regex to parse data into different arrays. All the NM to @array1, XM to @array2, BC to @array3, AK to @array4. If creating arrays is not a good idea, please let me know what is? How else can I go about generating csv file from the above data.

Edit:

OUTPUT

NM_144963.1,XM_144975.2,BC144986.1,AK145024.1
NM_144963.1,XM_145018.2,BC145001.1,AK145030.1
NM_144989.1,XM_030711.2
NM_145015.2
NM_145034.1
+4  A: 

Using parallel arrays like that is a bad idea. In fact, whenever you find yourself using names such as @array1, @array2 etc, recognize that it is bad idea. And, no, naming the arrays @NM, @XM etc would not have made it better.

The way I see it, you have a single column of data and you have not specify how to split that single column in to multiple columns. ... Nope, my mind reading abilities fell short. Please post desired output and don't leave to our imagination to figure it out.

use strict; use warnings;
use List::AllUtils qw( each_arrayref);

my @fields = qw( NM XM BC AK );
my %data;

while ( <DATA> ) {
    chomp;
    if ( /^([A-Z]{2})_?[0-9]+\.[0-9]$/ ) {
        push @{ $data{$1} }, $_;
    }
}

print join(',', @fields), "\n";

my $it = each_arrayref @data{ @fields };

while ( my @values = $it->() ) {
    print join(',', map{ defined($_) ? $_ : '' } @values ), "\n";
}

__DATA__
NM_144736.3
NM_144963.1
XM_144975.2
BC144986.1
NM_144989.1
BC145001.1
XM_145018.2
NM_145015.2
XM_030711.2
AK145024.1
AK145030.1
NM_145034.1

Output:

NM,XM,BC,AK
NM_144736.3,XM_144975.2,BC144986.1,AK145024.1
NM_144963.1,XM_145018.2,BC145001.1,AK145030.1
NM_144989.1,XM_030711.2,,
NM_145015.2,,,
NM_145034.1,,,
Sinan Ünür
Good idea to use `each_arrayref`. Minor detail, but I think you ought to handle undefined values differently: `grep defined, @values` is fragile if the user decides to order the fields in a different way. Converting undefined cells to empty string would be more flexible.
FM
@FM That is how I was handling them initially, but then I decided to exactly replicate @shubster's sample output rather than anticipating and solving more problems. Otherwise, I agree with you 100%.
Sinan Ünür
@FM went back to converting undefined fields to empty strings.
Sinan Ünür
@Sinan: Can you add comments in your code to explain what it is doing?
shubster
+4  A: 

Parse and write directly to an excel spreadsheet, without importing:

use Spreadsheet::WriteExcel;

my %da;

# Parse the data into a hash of arrayrefs
push @{ $da{ substr $_, 0, 2 } }, $_ foreach <DATA>;

# Create spreadsheet
my $workbook = Spreadsheet::WriteExcel->new( "perl.xls" );
my $worksheet = $workbook->add_worksheet();

# Loop through hashref keys
my @da = sort keys %da; 
for( 0..@da - 1 ) {

   # Create column based on arrayref
   $worksheet->write_col( 0, $_, $da{ $da[$_] } ); 
}

# Close and save spreadsheet
$workbook->close;
Glen