ansaurus

Question

How can I extract a bunch of numbers from a string?

Answer 1

+6 A:

Try a regular expression; something like this ought to work:

Barcode:\*(\d{5})
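A minimal sketch of applying that pattern in Perl, assuming the input looks like the Barcode:*...* lines quoted elsewhere in this thread. The sample string and the variable names are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical sample line; real input would come from a file.
my $line = 'Barcode:*99899801000689811*';

# The parentheses capture the first five digits after "Barcode:*".
my $first_five;
if ( $line =~ /Barcode:\*(\d{5})/ ) {
    $first_five = $1;
    print "First five digits: $first_five\n";
}
```

Checking the result of the match before reading $1 matters, because $1 keeps its old value when a match fails.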

Andrew Hare 2009-05-07 11:58:48

Barcode:\*(\d{5}) would work too, without the confusion of two groupings

Anonymous 2009-05-07 12:04:57

Nice call, I don't know how that extra capture group got in there but I have fixed it.

Andrew Hare 2009-05-07 12:08:37

Note that \d may match more than you want in recent versions of Perl: it also matches Unicode digits. It may be safer to use [0-9]{5} instead.
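A small sketch of the difference, assuming Perl 5.14 or later for the /a modifier. The test string is a made-up example of five ARABIC-INDIC digits, not something from the original input:

```perl
use v5.14;    # the /a modifier needs Perl 5.14 or later
use warnings;

# Five ARABIC-INDIC digits (U+0661 .. U+0665), none of them ASCII 0-9.
my $text = "\x{0661}\x{0662}\x{0663}\x{0664}\x{0665}";

my $matches_d     = ( $text =~ /^\d{5}$/ )    ? 1 : 0;  # \d follows Unicode rules here
my $matches_ascii = ( $text =~ /^[0-9]{5}$/ ) ? 1 : 0;  # explicit ASCII range
my $matches_d_a   = ( $text =~ /^\d{5}$/a )   ? 1 : 0;  # /a restricts \d to ASCII

say "\\d{5}        matched: $matches_d";
say "[0-9]{5}     matched: $matches_ascii";
say "\\d{5} with /a matched: $matches_d_a";
```

Only the plain \d version should report a match here, which is the surprise the comment warns about.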

leander 2009-05-07 14:36:35

Answer 2

+1 A:

Regular expressions are one way to go. However, just to throw something completely different at you, here's how to handle that stuff with index and substr:

my @array;
foreach my $line ( <$file> ) {
    if ( index( $line, 'Barcode:' ) == 0 ) {
        push @array, substr $line, 9, 5;
    }
}
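For trying this out without a file, here is a self-contained sketch of the same index/substr idea. The @lines array is hypothetical sample data, and it checks for the prefix 'Barcode:*' (asterisk included) so that the offset can be taken from the prefix length instead of being hard-coded:

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines of the file.
my @lines = (
    "Barcode:*99899801000689811*\n",
    "LCD5010V Using jsl 'CUSOFF' for output page '6'\n",
    "Barcode:*99999901000673703*\n",
);

my $prefix = 'Barcode:*';
my @array;
for my $line (@lines) {
    # index() returns 0 only when the line starts with the prefix.
    next unless index( $line, $prefix ) == 0;

    # Skip past the prefix and take the next five characters.
    push @array, substr( $line, length($prefix), 5 );
}

print "@array\n";
```

Note that substr does not check that the five characters are digits; that is the price of skipping the regex.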

innaM 2009-05-07 12:17:20

+1: if performance is an issue, then string operations are faster than regular expressions. Not that performance should really be a consideration if you're coding in a dynamic language like Perl.

glenn jackman 2009-05-07 15:25:46

Performance can always be an issue, whether you're using Perl or not. And sometimes you do have fixed-length text records; those shouldn't be treated with regular expressions.

innaM 2009-05-07 16:40:33

Answer 3

A:

My solution is similar to Manni's, but I recommend using while to read a file line-by-line. You can use substr() like he does, but a regex with anchors and without quantifiers is going to be pretty fast:

my @barcodes;
while( <$fh> )
    {
    next unless m/^Barcode:\*([0-9]{5})/;

    push @barcodes, $1;
    }

Depending on what else I was doing, I might just use a map instead. The map expression is in list context, so the m// operator returns the list of things it matched in any parentheses:

my @barcodes = map { m/^Barcode:\*([0-9]{5})/ } <$fh>;
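To see what the list-context behavior means for lines that do not match, here is a sketch that runs the same map over a hypothetical in-memory list instead of a filehandle. A failed match returns an empty list, so those lines simply add nothing to the result:

```perl
use strict;
use warnings;

# Hypothetical sample lines; the last one is deliberately malformed.
my @lines = (
    "Barcode:*99899801000689811*\n",
    "JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1\n",
    "Barcode:*99999901000673703*\n",
    "Barcode:\n",
);

# Each successful match contributes its capture; each failure contributes nothing.
my @barcodes = map { m/^Barcode:\*([0-9]{5})/ } @lines;

print scalar(@barcodes), " barcodes: @barcodes\n";
```

The malformed line disappears silently, which is why the map version is less suitable when you need to report bad input.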

I suspect any real-life answer would have a bit more code to warn you about lines that start with Barcode: but are missing the number. I have yet to meet a perfect input file :)

The \G anchor picks up the regex matching where the last match on the same string left off, in this case right after the colon. That position is only recorded when the previous match uses the /g flag:

my @barcodes;
while( <$fh> )
    {
    next unless m/^Barcode:/g;    # /g records pos() so \G can use it

    unless( m/\G\*([0-9]{5})/ )
        {
        warn "Barcode is missing number at line $.\n";
        next;
        }

    push @barcodes, $1;
    }
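\G anchors at the offset reported by pos(), and only a match with the /g flag sets that offset. A small sketch for checking this on a single string (the sample line and variable names are made up for illustration):

```perl
use strict;
use warnings;

my $line = 'Barcode:*99899801000689811*';

# Without /g the match succeeds but leaves pos() unset.
my $ok_plain      = $line =~ m/^Barcode:/;
my $pos_without_g = pos($line);

# With /g in scalar context, pos() records where the match ended.
my $ok_global  = $line =~ m/^Barcode:/g;
my $pos_with_g = pos($line);

# \G anchors this match at that recorded offset, right after the colon.
my ($digits) = $line =~ m/\G\*([0-9]{5})/;

printf "pos without /g: %s\n", defined($pos_without_g) ? $pos_without_g : 'undef';
printf "pos with /g:    %s\n", defined($pos_with_g)    ? $pos_with_g    : 'undef';
printf "digits:         %s\n", defined($digits)        ? $digits        : 'undef';
```

If pos() is unset, \G behaves like an anchor at the start of the string, so a pattern beginning with \G\* cannot match a line that starts with "Barcode:".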

brian d foy 2009-05-09 20:15:01

Answer 4

A:

A pattern match in list context returns the captured values (the parts marked with '(' and ')') as a list. Combine this with the 'g' modifier to keep re-matching, and you can do it all on one line, which I like to think is very readable.

my $string =<<'HERE';
Barcode:*99899801000689811* 
JSC4000I accountNumber:10006898Sequence Number:998 Envelopes: 1 
LCD5010V Using jsl 'CUSOFF' for output page '6'
Barcode:*99999901000673703* 
LCD5010V Using jsl 'CUSOFF' for output page '4'
LCD5005V Using job 'A' for current page '4'
HERE

my @array = $string =~ m!Barcode:\*([0-9]{5})[0-9]+\*!g;

# or

foreach my $barcode ($string =~ m!Barcode:\*([0-9]{5})[0-9]+\*!g)
{
    # do stuff with $barcode
}

Beano 2009-05-09 20:18:01
