I've got a string that contains multiple substrings, each of which contains one or more 'E' characters. I am trying to get the coordinates of each of these substrings using Perl and a regex. Here is what I tried at first.

#!/usr/bin/perl
use strict;

my $str = "GGGFFEEIIEIIIIEEEIIIETTGGG";
foreach my $match($str =~ m/(E+)/)
{
  print "match: $match, coords: (". $-[0] .", ". $+[0] .")\n";
}

The terminal output looks like this...

> ./test
match: EE, coords: (5, 7)

so it is successfully finding the first substring. But I would like to identify each substring. So I added the 'g' modifier to the regex like so...

#!/usr/bin/perl
use strict;

my $str = "GGGFFEEIIEIIIIEEEIIIETTGGG";
foreach my $match($str =~ m/(E+)/g)
{
  print "match: $match, coords: (". $-[0] .", ". $+[0] .")\n";
}

which gives the following terminal output.

> ./test
match: EE, coords: (20, 21)
match: E, coords: (20, 21)
match: EEE, coords: (20, 21)
match: E, coords: (20, 21)

As you can see, it finds each substring correctly, but I am only pulling out the coordinates of the last match. Maybe I'm using $-[0] and $+[0] incorrectly? Any ideas how I can grab these coordinates correctly? Thanks.

+5  A: 

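foreach evaluates the match in list context, so it builds the full list of matches first and only then iterates over them. At that point, @- and @+ contain only the data from the last match. In a while condition the match runs in scalar context, where /g finds one match per iteration, so @- and @+ still describe the current match inside the loop body. Try: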

#!/usr/bin/perl
use strict;

my $str = "GGGFFEEIIEIIIIEEEIIIETTGGG";
while ($str =~ m/(E+)/g)
{
  printf "match: %s, coords: (%d, %d)\n", $1, $-[0], $+[0];
}
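
If the coordinates are needed later rather than printed right away, the same while loop can store them as it goes. This is only a minimal sketch: the array name @spans and the hash keys text, start and end are made up for illustration and are not part of the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "GGGFFEEIIEIIIIEEEIIIETTGGG";
my @spans;    # hypothetical container: one hash per match

while ($str =~ m/(E+)/g)
{
  # Read @- and @+ here, while they still describe the current match.
  push @spans, { text => $1, start => $-[0], end => $+[0] };
}

# Print the stored matches afterwards, in the same format as above.
printf "match: %s, coords: (%d, %d)\n", @$_{qw(text start end)} for @spans;
```

For the sample string this should print four lines, with coordinates (5, 7), (9, 10), (14, 17) and (20, 21). The end coordinate is the offset just past the match, as with $+[0] in the original code.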
cjm
Thanks, this was it. I actually originally tried the while instead of foreach, but without the g in the regex, it was an infinite loop. Thanks for taking me the rest of the way!
Daniel Standage
@Daniel Standage, yes, without the `/g` it always finds the first match, so the loop repeats as long as `$str` contains an E. If you don't modify `$str` inside the loop ... infinite loop.
cjm
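
The difference described in the comments above can be seen without risking an infinite loop by running the match a fixed number of times. A small sketch, assuming the same sample string; the array names @no_g and @with_g are just for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = "GGGFFEEIIEIIIIEEEIIIETTGGG";

# Without /g, every match starts again from the beginning of the string.
my @no_g;
for (1 .. 2)
{
  push @no_g, sprintf("%d-%d", $-[0], $+[0]) if $str =~ m/E+/;
}
print "without /g: @no_g\n";    # prints "without /g: 5-7 5-7"

# With /g in scalar context, each match resumes where the previous one ended.
my @with_g;
for (1 .. 2)
{
  push @with_g, sprintf("%d-%d", $-[0], $+[0]) if $str =~ m/E+/g;
}
print "with /g: @with_g\n";     # prints "with /g: 5-7 9-10"
```

Without /g the first match is found again on every attempt, which is why a while loop with an unmodified $str never terminates.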