tags:

views:

110

answers:

5

This is a really basic regex question but since I can't seem to figure out why the match is failing in certain circumstances I figured I'd post it to see if anyone else can point out what I'm missing.

I'm trying to pull out the 2 sets of digits from strings of the form:

12309123098_102938120938120938
1321312_103810312032123
123123123_10983094854905490
38293827_1293120938129308

I'm using the following code to process each string:

if($string && $string =~ /^(\d)+_(\d)+$/) {
    if(IsInteger($1) && IsInteger($2)) { print "success ('$1','$2')"; }
    else { print "fail"; }
}

Where the IsInterger() function is as follows:

sub IsInteger {
    my $integer = shift;
    if($integer && $integer =~ /^\d+$/) { return 1; }
    return;
}

This function seems to work most of the time but fails on the following for some reason:

1287123437_1268098784380
1287123437_1267589971660

Any ideas on why these fail while others succeed? Thanks in advance for your help!

A: 

Shouldn't + be included in the grouping:

^(\d+)_(\d+)$ instead of ^(\d)+_(\d)+$

codaddict
+3  A: 

Because you have 0 at the end of the second string, (\d)+ puts only the last match in the $N variable, string "0" is equivalent to false.

ZyX
@ZyX - good explanation of why this was failing. Thanks for helping me understand the issue!
Russell C.
+3  A: 

This is an add-on to the answers from unicornaddict and ZyX: what are you trying to match?

If you're trying to match the sequences left and right of '_', unicorn addict is correct and your regex needs to be ^(\d+)_(\d+)$. Also, you can get rid of the first qualifier and the 'IsIntrger()` function altogether - you already know it's an integer - it matched (\d+)

if ($string =~ /^(\d+)_(\d+)$/) {
    print "success ('$1','$2')";
} else {
    print "fail\n";
}

If you're trying to match the last digit in each and wondering why it's failing, it's the first check in IsInteger() ( if($intger && ). It's redundant anyway (you know it's an integer) and fails on 0 because, as ZyX notes - it evaluates to false.

Same thing applies though:

if ($string =~ /^(\d)+_(\d)+$/) {
    print "success ('$1','$2')";
} else {
    print "fail\n";
}

This will output success ('8','8') given the input 12309123098_102938120938120938

Brian Roach
+3  A: 

When in doubt, check what your regex is actually capturing.

use strict;
use warnings;

my @data = (
    '1321312_103810312032123',
    '123123123_10983094854905490',
);

for my $s (@data){
    print "\$1=$1 \$2=$2\n" if $s =~ /^(\d)+_(\d)+$/;
    # Output:
    # $1=2 $2=3
    # $1=3 $2=0
}

You probably intended the second of these two approaches.

(\d)+  # Repeat a regex group 1+ times,
       # capturing only the last instance.

(\d+)  # Capture 1+ digits.

In addition, both in your main loop and in IsInteger (which seems unnecessary, given the initial regex in the main loop), you are testing for truth rather than something more specific, such as defined or length. Zero, for example, is a valid integer but false.

FM
A: 

Many people have commented on your regex, but the problem you had in your IsInteger (which you really don't need for your example). You checked for "truth" when you really want to check for defined:

sub IsInteger {
    my $integer = shift;
    if( defined $integer && $integer =~ /^\d+$/) { return 1; }
    return;
}

You don't need most of the infrastructure in that subroutine though:

sub IsInteger {
    defined $_[0] && $_[0] =~ /^\d+$/
}
brian d foy
+1 for defined, -1 for the last part: While you're technically correct, actually unpacking the subroutine argument makes for much clearer code and I don't think it's a good idea to discourage useful best practices when Perl is already difficult enough to read for the uninitiated as it is.
Platinum Azure
Unpacking subroutine arguments doesn't always make for cleaner or better code. This is one of the cases where you get no extra value out of it. You are doing more actual work when you don't need to, and creating more to read. If you know Perl, nothing in my version is mysterious. If you don't know Perl, read one of my books :)
brian d foy