I'm a total newbie to Perl, but I've heard that it's great for parsing files, so I've thought of giving it a spin.

I have a text file that has the following sample info:

High school is used in some
parts of the world, particularly in
Scotland, North America and Oceania to
describe an institution that provides
all or part of secondary education.
The term "high school" originated in
Scotland with the world's oldest being
the Royal High School (Edinburgh) in
1505.

The Royal High School was used as a
model for the first public high school
in the United States, the English High
School founded in Boston,
Massachusetts, in 1821. The precise
stage of schooling provided by a high
school differs from country to
country, and may vary within the same
jurisdiction. In all of New Zealand
and Malaysia along with parts of
Australia and Canada, high school is
synonymous with secondary school, and
encompasses the entire secondary stage
of education.

======================================
Grade1 87.43%
Grade2 84.30%
Grade3 83.00%
=====================================

I want to parse the file and only get the numerical information. I looked into regex, and I think I'd use something like

if (m/^%/) {
    do something
}
else {
    skip the line
}

But what I really want to do is keep track of the name on the left and store the numerical value under that name. So, after parsing the file, I'd like the following variables to hold the % values, because I want to create a pie chart or bar graph of the different grades.

Grade1 = 87.43
Grade2 = 84.30
...

Could you all suggest methods I should be looking at?

+5  A: 

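You'll need a regular expression. Something like the following should work as a filter:

my %op;
while (<>) {
  # Store a pair only when the line matches; after a failed match,
  # $1 and $2 do not hold values from the current line.
  if (/(Grade[0-9]+)\s*([0-9]+\.[0-9]+)/) {
    $op{$1} = $2;
  }
}

The %op hash will store the grade names and scores. This is preferable to automatically instantiating variables.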
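To connect the hash to the charting goal, here is a minimal sketch of turning it into the parallel label and value lists that charting code usually wants. The sample lines and the names @labels and @values are assumptions for illustration, and no real charting API is called.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines read by while (<>).
my @lines = (
    "Some prose that should be ignored\n",
    "Grade1 87.43%\n",
    "Grade2 84.30%\n",
    "Grade3 83.00%\n",
);

my %op;
for (@lines) {
    # Store a pair only when the line matches.
    $op{$1} = $2 if /(Grade[0-9]+)\s*([0-9]+\.[0-9]+)/;
}

# Sort the names so the two lists line up in a predictable order.
my @labels = sort keys %op;
my @values = map { $op{$_} } @labels;

printf "%s = %s\n", $labels[$_], $values[$_] for 0 .. $#labels;
```

Keeping the labels and values in the same sorted order means element N of one list always belongs to element N of the other.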

Noufal Ibrahim
I had a typo in my regexp. I've fixed it now.
Noufal Ibrahim
A: 

Creating dynamic variable names is probably not going to help you much in producing a graph; using an array is almost certainly a better idea.

However, if you really think you want to do this:

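while (my $line = <$your_infile_handler>){
   # Matches lines such as "Grade1 87.43%"
   if ($line =~ m/^(\w+)\s+([0-9.]+)%/){
      no strict 'refs';   # symbolic references are rejected under "use strict"
      ${$1} = $2;         # sets the package variable named by $1, e.g. $Grade1
   }
}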

should accomplish this.

Wooble
Hi wooble, you are right. I've been looking at some scripts that produce graphs from data like GD : (http://www.ibm.com/developerworks/library/os-perlgdchart/), and they do mention creating an array such as Data [] []. But I'm not entirely sure how to populate that array by parsing the file. I'm going to give this some tries and post back with difficulties I'm having ...
c0d3rs
+2  A: 

You want to use a hash. Something like this should do the trick:

my %grades = (); # this is a hash
open(my $fh, '<', "grade_file.txt") or die $!;
while( my $line = <$fh> ) {
     if( my( $name, $grade ) = $line =~ /^(Grade\d+)\s+(\d+\.\d+%)/ ) {
         $grades{$name} = $grade;
     }
}
close($fh);

Your %grades hash would then contain the name and grade pairs. (Access it like my $value = $grades{'Grade1'};)
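If the labels are not always of the form GradeN (plain names, for instance), the pattern can be loosened. The following is only a sketch under assumptions: the sample lines are made up, a label is taken to be a single word, and the % sign is left out of the capture so that the stored value is a plain number suitable for charting.

```perl
use strict;
use warnings;

# Hypothetical sample lines; real code would read them from a file handle.
my @lines = ( "Mike 80%\n", "Shawn 60.5%\n", "Not a score line\n", "Jason 44%\n" );

my %grades;
for my $line (@lines) {
    # \w+ accepts any single-word label; the % stays outside the capture
    # so the stored value is a plain number. The decimal part is optional.
    if ( my ( $name, $grade ) = $line =~ /^(\w+)\s+(\d+(?:\.\d+)?)%\s*$/ ) {
        $grades{$name} = $grade;
    }
}

print "$_ => $grades{$_}\n" for sort keys %grades;
```

Anchoring the pattern at both ends of the line keeps ordinary prose lines from being picked up by accident.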

Also just a note. The language is called "Perl", not "PERL". Many people in the Perl community get upset about it :-)

Cfreak
Hi All - Thanks for the replies. However, I must admit a mistake I made: I mentioned that it says Grade1 - 80%, Grade2 - 80%, etc. The problem is that your solution uses 'Grade' as a selection criterion in the regex. However, that is only one file. Most of my other files have individual names in them, as in: Mike 80%, Shawn 60%, Jason 44%. So it makes i
c0d3rs
Also, thanks for letting me know about Perl! I'll not make the mistake a second time ;).
c0d3rs
+3  A: 

If you can guarantee that your points of interest are nested between two lines of = characters (and there isn't an odd number of these delimiter lines in a given file), the flip-flop operator is a handy thing here:

use strict;    # These two pragmas go a long, ...
use warnings;  # ... long way in helping you code better

my %scores;    # Create a hash of scores

while (<>) {   # The diamond operator processes all files ...
               # ... supplied at command-line, line-by-line

    next unless /^=+$/ ... /^=+$/; # The flip-flop operator (three dots, so the
                                   # end test waits for a later line) ...
                                   # ... filters out only the 'grades' block

    next if /^=+$/;                # Skip the delimiter lines themselves

    my ( $name, $grade ) = split;  # This usage of split will break ...
                                   # ... the current line into a list of fields

    $scores{$name} = $grade;       # Associate grade with name
}
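Because the start and end patterns are identical here, the two-dot and three-dot forms of the flip-flop operator behave differently. The small sketch below compares them on made-up in-memory lines; the array names @two_dot and @three_dot are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample: prose, then a block fenced by lines of '='.
my @lines = ( "Some prose", "=====", "Grade1 87.43%", "Grade2 84.30%", "====", "More prose" );

my ( @two_dot, @three_dot );
for (@lines) {
    # Two dots: the end pattern is tested on the same line that matched the
    # start pattern, so each delimiter line opens and closes the range.
    push @two_dot, $_ if /^=+$/ .. /^=+$/;
}
for (@lines) {
    # Three dots: the end test waits until a later line.
    push @three_dot, $_ if /^=+$/ ... /^=+$/;
}

print "two dots:   ", join( " | ", @two_dot ),   "\n";
print "three dots: ", join( " | ", @three_dot ), "\n";
```

With two dots only the delimiter lines are selected. With three dots the whole block is selected, delimiters included, so those still need to be skipped before splitting.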
Zaid
+1 for mentioning the flip flop operator. Interesting.
Noufal Ibrahim
A: 

See Zaid's answer for an example of using the flip-flop operator (which is what I would recommend). However, if you run into difficulties with that (sometimes the DWIMmery might get in the way), you can also explicitly maintain state while reading the file line-by-line:

#!/usr/bin/perl

use strict; use warnings;

my %grades;
my $interesting;

while ( my $line = <DATA> ) {
    if ( not $interesting and $line =~ /^=+\s*\z/ ) {
        $interesting = 1;
        next;
    }
    if ( $interesting ) {
        if ( $line =~ /^=+\s*$/ ) {
            $interesting = 0;
            next;
        }
        elsif ( my ($name, $grade) = $line =~ /^(\w+)\s+(\d+\.\d+%)/ ) {
            # Keep an array in case the same name occurs
            # multiple times
            push @{ $grades{$name} }, $grade;
        }
    }
}

use YAML;
print Dump \%grades;
Sinan Ünür