In Perl, I am trying to capture any digits that precede a known phone line number, if they exist. The input contains only digits, no dashes.

For example, say I know the line number will always be 8675309. It may or may not have leading digits; if it does, I want to capture them. There is no real limit on the number of leading digits.

$input          $digits       $number
'8675309'       ''            '8675309'
'8008675309'    '800'         '8675309'
'18888675309'   '1888'        '8675309'
'18675309'      '1'           '8675309'
'86753091'      not a match

/8675309$/ will match, but how do I capture the leading digits in the same regex?

+2  A: 
my($digits,$number);
if ($input =~ /^(\d*)(8675309)$/) {
  ($digits,$number) = ($1,$2);
}

The * quantifier is greedy, but that means it matches as much as possible while still allowing the overall match to succeed. So initially, yes, \d* tries to gobble up all the digits in $input, but it reluctantly gives them back one character at a time until the whole pattern matches.
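
As a rough check of the behavior described above, here is a minimal sketch that runs the anchored pattern over the sample inputs from the question. The helper name `split_number` is made up for illustration and is not part of the original answer.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (leading digits, line number) on a match,
# or an empty list when the input does not end in 8675309.
sub split_number {
    my ($input) = @_;
    return $input =~ /^(\d*)(8675309)$/ ? ($1, $2) : ();
}

for my $input (qw(8675309 8008675309 18888675309 18675309 86753091)) {
    # List assignment in boolean context is true when the sub returned values.
    if (my ($digits, $number) = split_number($input)) {
        print "$input: digits='$digits' number='$number'\n";
    }
    else {
        print "$input: not a match\n";
    }
}
```

Because the pattern is anchored at both ends, '86753091' is rejected even though it contains 8675309.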

Another approach is to chop off the tail:

(my $digits = $input) =~ s/8675309$//;

You could do the same without using a regular expression:

my $digits = $input;
substr($digits, -7) = "";

The above, at least with perl-5.10.1, could even be condensed to

substr(my $digits = $input, -7) = "";
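
One caveat worth illustrating: the s/// version leaves $digits equal to the whole input when the tail does not match, and the substr versions remove the last seven characters whatever they are. Below is a sketch, under the assumption that you want a regex-free version that still verifies the tail. The sub name `strip_line_number` is hypothetical, and it does not check that the prefix is all digits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text before the known line number,
# or undef if $input does not end with $line.
sub strip_line_number {
    my ($input, $line) = @_;
    my $len = length $line;
    return undef if length($input) < $len;            # too short to contain it
    return undef if substr($input, -$len) ne $line;   # tail is something else
    return substr($input, 0, length($input) - $len);  # everything before the tail
}

my $digits = strip_line_number('8008675309', '8675309');
print defined $digits ? "digits='$digits'\n" : "not a match\n";
```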
Greg Bacon
My confusion is that I thought (\d*) would greedily capture the whole string, but it does not seem to. I thought you had to make regexes non-greedy with an option?
@unk, the regex engine will backtrack and try to satisfy the \d* condition however possible. It starts by grabbing as much as it can, then backs off as needed to try to satisfy each subsequent requirement. Take a look at the output from `perl -Mre=debug -e '$foo="18008675309"; $foo =~ /(\d*)8675309/;'`
daotoad
A: 

How about /(\d)?(8675309)/?

UPDATE: Whoops, that should have been /(\d*)(8675309)/

FrustratedWithFormsDesigner
Without `^` and `$` anchors, that pattern could match anywhere in the target string.
Greg Bacon
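
To make the anchoring point concrete, here is a small sketch comparing the two patterns. The helper names are invented for illustration, and the third input is a made-up string with non-digit characters around the number.

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting the unanchored and anchored patterns.
sub matches_unanchored { return $_[0] =~ /(\d*)(8675309)/   ? 1 : 0 }
sub matches_anchored   { return $_[0] =~ /^(\d*)(8675309)$/ ? 1 : 0 }

for my $input (qw(18675309 86753091 x18675309y)) {
    printf "%-12s unanchored=%d anchored=%d\n",
        $input, matches_unanchored($input), matches_anchored($input);
}
```

All three inputs satisfy the unanchored pattern, while only '18675309' satisfies the anchored one.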
+8  A: 

Some regexes work better backwards than forwards. So sometimes it is useful to use sexeger, rather than regexes.

my $pn = '18008675309';

my $got;
# Only use $1 if the match succeeded; otherwise it may hold a stale value.
if (reverse($pn) =~ /^9035768(\d*)/) {
  $got = reverse $1;   # '1800'
}

The regex is cleaner and avoids a lot of backtracking, at the cost of some flummery with reversing the input and captured values.

The backtracking gain is smaller in this case than it would be if you had a general phone number extraction regex:

Regex:   /^(\d*)\d{7}$/
Sexeger: /^\d{7}(\d*)/

There is a whole class of problems where this technique is useful. For more info see the sexeger post on Perlmonks.
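
For the general case, the reversed-string approach could be wrapped up roughly like this. It is only a sketch: the sub name `leading_digits_sexeger` is hypothetical, and it assumes the input consists solely of digits with a seven-digit line number at the end.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the digits before a trailing 7-digit number
# by matching against the reversed string, or undef if there is no match.
sub leading_digits_sexeger {
    my ($input) = @_;
    my $reversed = reverse $input;    # scalar context: reverses the string
    return undef unless $reversed =~ /^\d{7}(\d*)$/;
    return scalar reverse $1;         # flip the capture back around
}

my $got = leading_digits_sexeger('18008675309');
print defined $got ? "leading digits: '$got'\n" : "not a match\n";
```

The explicit `scalar` matters on the return line: in list context `reverse` would reverse a list of one element and hand the capture back unchanged.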

daotoad
+1 for "sexeger"
Ragepotato
@Ragepotato, I wish I invented the term. But it is memorable.
daotoad
+1  A: 

The regex special variables $` and $& are another way of grabbing those pieces of information. They hold the text preceding the match and the matched text itself, respectively.

   if ( /8675309$/ )
      {
      printf( "%s,%s,%s\n", $_, $`, $& );
      }
   else
      {
      printf( "%s,Not a match\n", $_ );
      }
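
As a possible alternative on Perl 5.10 or later, the /p modifier makes the same information available through ${^PREMATCH} and ${^MATCH}; perlvar notes that $` and $& can slow down every regex in a program on older perls. The sketch below uses a hypothetical helper name, `pre_and_match`, and like the code above it does not verify that the preceding text is all digits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (text before the match, matched text),
# or an empty list when the input does not end in 8675309.
sub pre_and_match {
    my ($input) = @_;
    return $input =~ /8675309$/p ? (${^PREMATCH}, ${^MATCH}) : ();
}

for my $input (qw(8008675309 86753091)) {
    my @parts = pre_and_match($input);
    print @parts ? "$input,$parts[0],$parts[1]\n" : "$input,Not a match\n";
}
```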
Mark Wilkins
A: 

I might not understand the problem. Why is there a difference between the first and fourth examples:

'8675309'    ''   '8675309'  
...  
'8675309'    '1'  '8675309'

If all you want is to separate the last seven digits from everything else, you could have said it that way rather than provide confusing examples. A regex for that would be:

/(\d*)(\d{7,7})$/

If you weren't just providing a hypothetical number, and really are only looking for lines with '8675309' (seems strange), replace the '\d{7,7}' with '8675309'.

gary
Updated: the 4th example's input should have been '18675309'.