open(MR, "<path_to_report");
$rid;

The file might be very large. It contains a single unique line in the following format:

Root identifier: <some number>

For example:

Root identifier: 12987

I need to extract 12987 into $rid.

How can I do that in Perl?

A: 

Read one line at a time using the <> operator, and use Perl regular expressions to find and extract what you need. If you are in a Unix-like environment, see `man perlre` for the Perl regular expression reference.
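
A minimal sketch of this approach follows. It assumes the report has already been opened as a filehandle. The sub name `find_rid` is hypothetical and not part of any library. The sub returns as soon as the line is found, so the rest of the file is never read. It returns `undef` when no line matches:

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle line by line and return the
# number from the first "Root identifier: <digits>" line, or undef.
sub find_rid {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        # Capture the digits; returning here stops reading early.
        return $1 if $line =~ /Root identifier: (\d+)/;
    }
    return;    # reached EOF without a match (undef in scalar context)
}
```

With the report opened by `open my $mr, '<', 'path_to_report' or die "path_to_report: $!";`, the call would be `my $rid = find_rid($mr);`.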

Dima
Is there no easier way than reading line by line using `<>`? The file might be huge.
Lazer
How can it get any easier than <>?
reinierpost
The alternative is to read the whole file up front into a giant string, as suggested by Nikhil. Reading one line at a time is better, IMHO, because you only use enough memory for one line, and you stop reading when you have found what you are looking for. In the average case, you would only need to read half the file, rather than the whole file.
Dima
Easier? From a coding standpoint, no. If you mean more efficient, you could do the same things in perl that make the grep program fast - see http://xrl.us/bhwyga (perl regexes will do the Boyer-Moore part for you)
ysth
@ysth: You are correct. Reading one line at a time is a couple more lines of code. But Lazer worried about the file being huge. In that case it makes sense to avoid reading the whole file into memory.
Dima
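
The following is a rough sketch of ysth's grep-style suggestion. It is not his code. The sub name `find_rid_chunked` and the 64 KiB default block size are assumptions. The idea is to read large blocks with `read` and let one regex scan each block, instead of running the loop once per line. The only subtlety is a line split across two blocks. To handle it, only the trailing partial line is carried over, and a match is accepted only once a non-digit follows the number:

```perl
use strict;
use warnings;

# Hypothetical helper: scan $fh in fixed-size blocks rather than lines.
# Assumes the target line never contains an embedded newline.
sub find_rid_chunked {
    my ($fh, $block_size) = @_;
    $block_size = 64 * 1024 unless defined $block_size;    # assumed default
    my $buf = '';
    while (read($fh, my $block, $block_size)) {
        $buf .= $block;
        # Require a non-digit after the number so that a number cut off
        # at the end of a block is not returned truncated.
        return $1 if $buf =~ /Root identifier: (\d+)(?=\D)/;
        # Every complete line has now been checked; keep only the
        # trailing partial line, which may continue in the next block.
        my $nl = rindex($buf, "\n");
        $buf = substr($buf, $nl + 1) if $nl >= 0;
    }
    # At EOF the number may be the last thing in the file.
    return $buf =~ /Root identifier: (\d+)/ ? $1 : undef;
}
```

Whether this actually beats a plain `while (<MR>)` loop depends on the file and the Perl build, so it would be worth benchmarking before adopting it.
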
A:
while(<MR>) {
    chomp;
    ($rid) = $_ =~ m/Root identifier: (\d+)/;
    last if defined $rid;
}
Pedro Silva
`last if defined $rid;`
Greg Bacon
Fair enough, `$rid` could be 0.
Pedro Silva
combinable as `last if ($rid) = /Root identifier: (\d+)/;`. Also, the chomp there is useless (though it could become useful if more is done with the input).
ysth
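
ysth's combined form works because of how Perl evaluates a list assignment in scalar context. The result is the number of elements on the right-hand side: 1 when the pattern matched and 0 when it did not. That holds even when the captured number is `0`, which is exactly the case that made a plain `last if $rid;` wrong. Below is a small self-contained check. The sub name `first_rid` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the combined `last if ($rid) = /.../` loop.
sub first_rid {
    my ($fh) = @_;
    local $_;    # don't clobber the caller's $_
    my $rid;
    while (<$fh>) {
        # The assignment yields the match count (0 or 1), not $rid itself,
        # so a captured "0" still ends the loop.
        last if ($rid) = /Root identifier: (\d+)/;
    }
    return $rid;    # undef if no line matched
}
```
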
Why the two downvotes?
Pedro Silva
A: 
    #!/usr/bin/perl
    use strict;
    use warnings;
    my $file = 'path_to_report';
    open(IN, '<', $file) or die $!;
    read(IN, my $data, -s $file);    # slurp the whole file at once
    my ($rid) = $data =~ m/Root identifier: (\d+)/;
    print "$rid\n" if defined $rid;
    close(IN);
Nikhil Jain
Better to do 'open my $in, "<", $path or die "$path: $!";' and even better to just read from ARGV using <>.
William Pursell
@William Pursell: That's true; I was just giving an idea of how to do it. Thanks.
Nikhil Jain
The question notes the file may be "very big in size" (though sometimes people say that when in fact there is no concern of using too much memory) so this isn't likely to be the right approach. Even if it were, File::Slurp::read_file is the way to go.
ysth
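
If memory is not a concern, a core-Perl alternative to File::Slurp (a CPAN module) is to undefine `$/` for a single read. That read returns the rest of the handle as one string. Below is a sketch. The sub name `find_rid_slurp` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: read the whole handle into one string, then match.
# Memory use grows with file size, which is the concern raised above.
sub find_rid_slurp {
    my ($fh) = @_;
    my $data = do { local $/; <$fh> };    # undef $/ => read to EOF
    return unless defined $data;          # nothing left to read
    my ($rid) = $data =~ /Root identifier: (\d+)/;
    return $rid;
}
```

Unlike the line-by-line loop, this always reads to the end of the file, even when the identifier is near the top.
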
@ysth: Agreed, but if reading the whole file at once is not a good approach, is reading it line by line a good one? Take the worst case: reading line by line when the result is on the last line?
Nikhil Jain
Why the two downvotes?
Nikhil Jain
@William: better not to try to interpolate `<` with double quotes. Better, don't use quotes with single characters; use `q`/`qq`. Better yet, don't nitpick: we all read PBP, ok?
Pedro Silva
@Pedro Silva: that made no sense to me. what do you mean "interpolate" `<` ? (Don't burn Korans, burn PBP.)
ysth
@ysth: "try to interpolate", not "interpolate" -- the double quotes and all that...
Pedro Silva
@pedro Being nitpicked in forums like this is a great way to develop good practices; no insult is intended. Normally, I would use single quotes when giving '<' as a mode, but I changed it to double quotes since my overall example was single-quoted. Is there anything wrong with writing "<" in code? Using q(<) instead of "<" seems really odd. Is this merely a performance issue of avoiding interpolation? Or do you have something else in mind?
William Pursell
A: 

The following will find the number and leave it in $rid

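open(MR, "<path_to_report") or die "path_to_report: $!";
while(<MR>) {
   chomp;
   next unless /Root identifier:\s*[0-9]+\s*$/;
   tr/0-9//cd;   # delete every character that is not a digit
   $rid = $_;
   last;         # the line is unique, so stop reading
}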

You didn't specify the exact amount or type of whitespace between the ':' and the number, or after the number, so I've included \s* before and after the digits to allow for any amount.
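
You can capture the digits in the same pattern that tests the line. That avoids the separate `tr` step and keeps the whitespace tolerance. Below is a minimal sketch. The helper name `rid_from_line` is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: return the number from one line, or undef.
# \s* before and after the digits allows any amount of whitespace,
# including none, and also absorbs a trailing newline.
sub rid_from_line {
    my ($line) = @_;
    return $line =~ /Root identifier:\s*(\d+)\s*$/ ? $1 : undef;
}
```
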

HerbN
A: 

I'd use something like this:

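#!/usr/bin/perl

use 5.010;

open(MR, "<", "path_to_report") or die "cannot open file";
my $rid;
while(<MR>) {
    if (/^Root identifier: (\d+)$/i) {
        $rid = $1;    # copy $1 now: its value is scoped to this block
        last;
    }
}

say($rid) if defined $rid;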

P.S.: You could also write the loop on one line:

while (<MR>) { last if /^Root identifier: (\d+)$/i }
polemon
-1: for failing to include $! in the message to die
William Pursell
@William: gotta chill out man; your fingers are twitching.
Pedro Silva
Don't access $1 unless you know a regex succeeded; otherwise it may hold a leftover value from a previous match.
ysth
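
Here is a small demonstration of that pitfall. A failed match does not reset the capture variables, so `$1` still holds whatever the last successful match in scope captured:

```perl
use strict;
use warnings;

"Root identifier: 42" =~ /(\d+)/;               # succeeds: $1 is "42"
my $matched = "no digits here" =~ /(\d+)/;      # fails: $1 is NOT reset
my $stale   = $1;                               # still "42", a leftover value

# Safe pattern: read $1 only when the match is known to have succeeded.
my $safe = "no digits here" =~ /(\d+)/ ? $1 : undef;
```
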
A:

Here is another way to do it using more modern idioms:

use warnings;
use strict;

open my $file, '<', 'path_to_report'   # 3 arg open is safer
     or die "could not open file: $!"; # checking for errors is good

my $rid;
while (<$file>) {
    last if ($rid) = /Root identifier: (\d+)/;  # true on a match, even if $rid is 0
}

close $file;

if (defined $rid) {
    # do something with $rid
}
Eric Strom