Dear Perl gurus:

Question abstract:

How do I parse a text file into two hashes in Perl, one storing the key-value pairs taken from the X=Y lines and the other storing those taken from the X:Y lines?

1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2

Both kinds of line are kept in one file; only the symbol between the two digits tells them apart.

===============================================================================

I spent around 30 hours learning Perl last semester and managed to finish my Perl assignment in a "head first, ad-hoc, ugly" way.

I just received my result for this section: 7/10. To be frank, I am not happy with it, particularly because it brings back bad memories of trying to use regular expressions to deal with formatted data. The rules for the data file are as follows:

1= (the last digit in your student ID, or one if this digit is zero)
2= (the second last digit in your student ID, or one if this digit is zero)
3= (the third last digit in your student ID, or one if this digit is zero)
4= (the fourth last digit in your student ID, or one if this digit is zero)

2:1 
3:1  
4:1  
1:2  
1:3  
1:4  
2:3 (if the last digit in your student ID is between 0 and 4) OR
    3:4 (if the last digit in your student ID is between 5 and 9)
3:2 (if the second last digit in your student ID is between 0 and 4) OR
    4:3 (if the second last digit in your student ID is between 5 and 9)

For example, if your student ID is 10926029, the configuration file has to be:

1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2
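The generation rules above can be sketched as a small Perl helper. This is only an illustration: `config_for_id` is a hypothetical name, and it assumes the ID is a string of at least four digits. Run on the example ID, it should reproduce the twelve lines just shown.

```perl
use strict;
use warnings;

# Hypothetical helper: build the configuration lines from a student ID.
# Assumes $id is a string of at least four digits.
sub config_for_id {
    my ($id) = @_;
    my @digits = reverse split //, $id;    # $digits[0] is the last digit

    my @lines;

    # Pages 1..4 take the last, second last, third last and fourth last
    # digit of the ID, with a zero replaced by one.
    for my $page ( 1 .. 4 ) {
        my $d = $digits[ $page - 1 ];
        push @lines, "$page=" . ( $d == 0 ? 1 : $d );
    }

    # Links that are the same for every student.
    push @lines, qw(2:1 3:1 4:1 1:2 1:3 1:4);

    # Links that depend on the last and second last digits.
    push @lines, $digits[0] <= 4 ? '2:3' : '3:4';
    push @lines, $digits[1] <= 4 ? '3:2' : '4:3';

    return @lines;
}

my @config = config_for_id('10926029');
print "$_\n" for @config;
```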

The assignment was about PageRank calculation with a simplified algorithm, so I came up with the answer to that part in 5 minutes. It was the text parsing part that took me heaps of time.

The first part of the text (Page=PageRank) lists the pages and their corresponding PageRanks.

The second part (FromNode:ToNode) denotes the direction of a link between two pages.
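One possible way to hold the example file once it is parsed, written out as Perl literals: a plain hash for the page ranks and a hash of arrays for the links, because a single page (page 1 or page 3 here) can link to several others. The variable names are illustrative, not something the assignment prescribes.

```perl
use strict;
use warnings;

# Page => PageRank, from the "Page=PageRank" lines of the example file.
my %page_rank = ( 1 => 9, 2 => 2, 3 => 1, 4 => 6 );

# FromNode => [ ToNode, ... ], from the "FromNode:ToNode" lines,
# kept in the order they appear in the file.
my %links = (
    1 => [ 2, 3, 4 ],
    2 => [1],
    3 => [ 1, 4, 2 ],
    4 => [1],
);

for my $from ( sort keys %links ) {
    print "$from (rank $page_rank{$from}) -> @{ $links{$from} }\n";
}
```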

For a better understanding, please go to my website and check the requirement file and my Perl script here

The script is heavily commented, so I reckon it is not hard at all to see how stupid my solution was :(

If you are still on this page, let me explain why I am asking this question here on SO:

All I got from uni was "Result 7/10", with no comments.

I am not studying for uni; I am learning for myself.

So I hope the Perl gurus can at least point me in the right direction. My stupid solution was sort of "generic" and would probably work in Java, C#, etc. I am sure it is not even close to the nature of Perl.

And, if possible, please let me know what level this solution is at, e.g. whether I need to go through "Learning Perl ==> Programming Perl ==> Mastering Perl" to get there :)

Thanks in advance for any hints and suggestions.

Edit 1:

I posted another question here, since closed, that describes pretty much how things go at my uni :(

A:

Is this what you mean? The regex has three capture groups (the parenthesized parts). It captures one digit, followed by either = or : (that's the capture group wrapping the character class [=:], which matches any single character listed inside it), followed by another single digit.

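use strict;
use warnings;

my ( %assign, %colon );

while (<DATA>) {
    chomp;
    # $l = left digit, $c = separator ('=' or ':'), $r = right digit
    my ($l, $c, $r) = $_ =~ m/(\d)([=:])(\d)/
        or next;               # skip lines that don't match
    if    ( q{=} eq $c ) { $assign{$l} = $r; }
    elsif ( q{:} eq $c ) { $colon{$l}  = $r; }
}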

__DATA__
1=9  
2=2  
3=1  
4=6  
2:1  
3:1  
4:1  
1:2
1:3  
1:4  
3:4  
3:2
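To see what the three capture groups return, here is a small sketch that applies the same pattern to individual lines. The `'10=25'` line is an assumed input that is not in the assignment: because the pattern is unanchored and matches single digits only, it silently captures `0`, `=`, `2` from it. That is harmless here, since the rules only ever produce single digits.

```perl
use strict;
use warnings;

# The pattern from the answer: digit, then '=' or ':', then digit.
my $pattern = qr/(\d)([=:])(\d)/;

my %captures;
for my $line ( '1=9', '3:4', '10=25' ) {
    my @groups = $line =~ $pattern;    # list context returns ($1, $2, $3)
    $captures{$line} = \@groups;
    print "$line -> (@groups)\n";
}
```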

As for the recommendation, grab a copy of Mastering Regular Expressions if you can. It's very...thorough.

Pedro Silva
@Pedro Silva: That's absolutely better than what I did. I was trying to do parsing and data validation in the same pass and ended up with a messy script. Also, thanks for the book recommendation.
Michael Mao
I recommend the book too. I have used it at work and it has been a great help. I always recommend learning regex; it's very helpful for data parsing. I would add an answer, but this looks covered.
onaclov2000
Your code drops the links from 1 to 2 and 3, retaining only the link from 1 to 4. A simple hash can't associate multiple values with a single key.
daotoad
Just implementing the poster's specifications... And of course a hash *can* associate multiple values with a single key: `push @{$hash{single_key}}, 'one_of_many_values';`. Actually, I just noticed you did exactly this in your answer.
Pedro Silva
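A small sketch of the point made in the last two comments, using the three links from page 1 in the example file: assigning to a plain hash keeps only the last value, while pushing onto an array reference keeps all of them.

```perl
use strict;
use warnings;

my @link_lines = qw(1:2 1:3 1:4);

# Plain hash: each assignment to $colon{1} overwrites the previous value.
my %colon;
for (@link_lines) {
    my ( $from, $to ) = split /:/;
    $colon{$from} = $to;
}

# Hash of arrays: push onto an array reference, which Perl
# autovivifies the first time $links{$from} is used this way.
my %links;
for (@link_lines) {
    my ( $from, $to ) = split /:/;
    push @{ $links{$from} }, $to;
}

print "plain hash:     $colon{1}\n";
print "hash of arrays: @{ $links{1} }\n";
```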
A:

Well, if you don't want to validate any restrictions on the data file, you can parse this data pretty easily. The main issue lies in selecting the appropriate structure to store your data.

use strict;
use warnings;

use IO::File;

my $file_path = shift or die "Usage: $0 FILE\n";  # Take file from command line

my %page_rank;
my %links;

my $fh = IO::File->new( $file_path, '<' )
    or die "Error opening $file_path - $!\n";

while ( defined( my $line = $fh->getline ) ) {   # getline returns undef at EOF
    chomp $line;

    next unless $line =~ /^(\d+)([=:])(\d+)\s*$/; # skip invalid lines (trailing whitespace allowed)

    my $page      = $1;
    my $delimiter = $2; 
    my $value     = $3;


    if( $delimiter eq '=' ) {

        $page_rank{$page} = $value;
    }
    elsif( $delimiter eq ':' ) {

        $links{$page} = [] unless exists $links{$page};

        push @{ $links{$page} }, $value;
    }

}

use Data::Dumper;
print Dumper \%page_rank;
print Dumper \%links;

The main way that this code differs from Pedro Silva's is that mine is more verbose and it also handles multiple links from one page properly. For example, my code preserves all values for links from page 1. Pedro's code discards all but the last.

daotoad
@daotoad: I like the use of modules. It was painful not being allowed to use any of them in my original code...
Michael Mao
IO::File is a core module and I used it to improve readability over the `<>` operator. There's no extra functionality that couldn't have been achieved with `open` and `<>`, just slightly cleaner (IMO) syntax.
daotoad
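For comparison, a sketch of the same parsing loop written with the built-in three-argument `open` and the `<>` operator instead of IO::File. Wrapping it in a sub named `parse_config` (a hypothetical name) that returns two hash references is an assumption made for illustration; it parses the file named on the command line, if one is given.

```perl
use strict;
use warnings;

# Parse the data file; returns (\%page_rank, \%links).
sub parse_config {
    my ($file_path) = @_;
    my ( %page_rank, %links );

    open my $fh, '<', $file_path
        or die "Error opening $file_path - $!\n";

    # Perl implicitly tests "my $line = <$fh>" with defined() here.
    while ( my $line = <$fh> ) {
        chomp $line;
        next unless $line =~ /^(\d+)([=:])(\d+)\s*$/;    # skip invalid lines

        if   ( $2 eq '=' ) { $page_rank{$1} = $3 }
        else               { push @{ $links{$1} }, $3 }
    }

    close $fh;
    return ( \%page_rank, \%links );
}

if (@ARGV) {
    my ( $page_rank, $links ) = parse_config( $ARGV[0] );
    print "$_=$page_rank->{$_}\n" for sort keys %$page_rank;
    print "$_ -> @{ $links->{$_} }\n" for sort keys %$links;
}
```

Passing a reference to a string (for example `parse_config(\$text)`) also works, because `open` treats a scalar reference as an in-memory file.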