Hi.

This is the sample test file:

  Barcode:*99899801000689811* 
  JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1 
  LCD5010V Using jsl 'CUSOFF' for output page '6'
  Barcode:*99999901000673703* 
  LCD5010V Using jsl 'CUSOFF' for output page '4'
  LCD5005V Using job 'A' for current page '4'

In this file, how can I search for the word Barcode, extract the first five digits that follow it, and push them onto an array?

Thanks in advance.

+6  A: 

Try a regular expression, something like this ought to work:

Barcode:\*(\d{5})
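
As a rough sketch of how that pattern might be applied: the lines below are copied from the question and held in a hypothetical @lines array rather than read from a file, and @barcodes is just an illustrative name.

```perl
use strict;
use warnings;

# Sample lines copied from the question; a real script would read
# them from a file handle instead of a hard-coded list.
my @lines = (
    "Barcode:*99899801000689811*",
    "JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1",
    "LCD5010V Using jsl 'CUSOFF' for output page '6'",
    "Barcode:*99999901000673703*",
);

my @barcodes;
for my $line (@lines) {
    # The capture group holds the first five digits after "Barcode:*".
    if ( $line =~ /Barcode:\*(\d{5})/ ) {
        push @barcodes, $1;
    }
}

print "@barcodes\n";    # prints "99899 99999"
```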

Andrew Hare
Barcode:\*(\d{5}) would work too, without the confusion of two groupings
Anonymous
Nice call, I don't know how that extra capture group got in there but I have fixed it.
Andrew Hare
Note that \d may capture more than you want in recent versions of Perl: it also matches Unicode digits. It may be safer to use [0-9]{5} instead.
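
A small sketch of the difference, assuming Perl 5.14 or later for the /a modifier. The test string is made up for illustration and consists of five ARABIC-INDIC DIGIT FIVE characters (U+0665):

```perl
use strict;
use warnings;

# Five non-ASCII digits (U+0665); hypothetical input for illustration.
my $unicode_digits = "\x{0665}" x 5;

# \d follows Unicode rules on this string, so it matches.
my $matches_d     = $unicode_digits =~ /\A\d{5}\z/     ? 1 : 0;

# An explicit ASCII range does not match.
my $matches_ascii = $unicode_digits =~ /\A[0-9]{5}\z/  ? 1 : 0;

# The /a modifier restricts \d to ASCII digits.
my $matches_a     = $unicode_digits =~ /\A\d{5}\z/a    ? 1 : 0;

print "$matches_d $matches_ascii $matches_a\n";    # prints "1 0 0"
```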
leander
+1  A: 

Regular expressions are one way to go. However, just to throw something completely different at you, here's how to handle that stuff with index and substr:

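my @array;
# $file is assumed to be an already-opened file handle
foreach my $line ( <$file> ) {
    # only lines that start with 'Barcode:' are of interest
    if ( index( $line, 'Barcode:' ) == 0 ) {
        # skip the 9 characters of 'Barcode:*' and take the next 5
        push @array, substr $line, 9, 5;
    }
}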
innaM
+1: if performance is an issue, then string operations are faster than regular expressions. Not that performance should really be a consideration if you're coding in a dynamic language like Perl.
glenn jackman
Performance can always be an issue, whether you're using Perl or not. And sometimes you do have fixed-length text records; those shouldn't be treated with regular expressions.
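
For fixed-length records, one possible alternative to substr is unpack. This is a sketch only, assuming every barcode record starts with the 9-character label 'Barcode:*' followed by the digits; @records and @prefixes are illustrative names:

```perl
use strict;
use warnings;

# Hypothetical fixed-layout records, taken from the sample in the question.
my @records = (
    "Barcode:*99899801000689811*",
    "Barcode:*99999901000673703*",
);

my @prefixes;
for my $record (@records) {
    # 'A9' takes the 9-character label, 'A5' the next 5 characters.
    my ( $label, $digits ) = unpack 'A9 A5', $record;
    push @prefixes, $digits if $label eq 'Barcode:*';
}

print "@prefixes\n";    # prints "99899 99999"
```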
innaM
A: 

My solution is similar to Manni's, but I recommend using while to read a file line-by-line. You can use substr() like he does, but a regex that is anchored and has no open-ended quantifiers is going to be pretty fast:

my @barcodes;
while( <$fh> )
    {
    next unless m/^Barcode:\*([0-9]{5})/;

    push @barcodes, $1;
    }

Depending on what else I was doing, I might just use a map instead. The map expression is in list context, so the m// operator returns the list of things it matched in any parentheses:

my @barcodes = map { m/^Barcode:\*([0-9]{5})/ } <$fh>;
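
To try the map version without an input file, here is a minimal sketch that assumes the sample data sits in a string and is read through an in-memory file handle. Lines that do not match return an empty list, so they simply drop out of the result:

```perl
use strict;
use warnings;

# Sample data from the question, held in a string for illustration.
my $data = <<'HERE';
Barcode:*99899801000689811*
JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1
LCD5010V Using jsl 'CUSOFF' for output page '6'
Barcode:*99999901000673703*
LCD5010V Using jsl 'CUSOFF' for output page '4'
HERE

# Opening a reference to a scalar gives an in-memory file handle.
open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

# In list context m// returns its captures, or an empty list on failure.
my @barcodes = map { m/^Barcode:\*([0-9]{5})/ } <$fh>;
close $fh;

print "@barcodes\n";    # prints "99899 99999"
```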

I suspect any real-life answer would have a bit more code to warn you about lines that start with Barcode: but are missing the number. I have yet to meet a perfect input file :)

The \G anchor picks up the regex matching where the last match on the same string left off, in this case right after the colon. That position is only recorded when the previous match uses the /g flag in scalar context:

my @barcodes;
while( <$fh> )
    {
    next unless m/^Barcode:/g;

    unless( m/\G\*([0-9]{5})/ )
        {
        warn "Barcode is missing number at line $.\n";
        next;
        }

    push @barcodes, $1;
    }
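
A self-contained sketch of the \G approach that runs without an input file. It assumes in-memory sample data with one made-up malformed line, collects bad line numbers in a hypothetical @bad_lines array instead of warning, and relies on the first match using /g in scalar context so that pos() is set for \G:

```perl
use strict;
use warnings;

# Sample data; the bare "Barcode:" line is invented to show the failure path.
my $data = <<'HERE';
Barcode:*99899801000689811*
LCD5010V Using jsl 'CUSOFF' for output page '6'
Barcode:
Barcode:*99999901000673703*
HERE

open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";

my ( @barcodes, @bad_lines );
while ( <$fh> ) {
    # /g in scalar context records where the match ended in pos($_).
    next unless m/^Barcode:/g;

    # \G anchors at pos($_), which is immediately after the colon.
    unless ( m/\G\*([0-9]{5})/ ) {
        push @bad_lines, $.;    # remember the offending line number
        next;
    }

    push @barcodes, $1;
}
close $fh;

print "barcodes: @barcodes\n";      # prints "barcodes: 99899 99999"
print "bad lines: @bad_lines\n";    # prints "bad lines: 3"
```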
brian d foy
A: 

A pattern match in list context returns the captured values (those marked by '(' and ')') as a list. Combine this with the 'g' modifier to keep re-matching, and you can do it all in one line that I like to think is very readable.

my $string =<<'HERE';
Barcode:*99899801000689811* 
JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1 
LCD5010V Using jsl 'CUSOFF' for output page '6'
Barcode:*99999901000673703* 
LCD5010V Using jsl 'CUSOFF' for output page '4'
LCD5005V Using job 'A' for current page '4'
HERE

my @array = $string =~ m!Barcode:\*([0-9]{5})[0-9]+\*!g;

# or

foreach my $barcode ($string =~ m!Barcode:\*([0-9]{5})[0-9]+\*!g)
{
    # do stuff with $barcode
}
Beano