I have some code that grabs the text "between" markers; specifically, between one `foo $someword` and the next `foo $someword`.

However, it gets stuck on the first "between": the loop keeps repeating and the internal string position never advances.

The input data is a text file with newlines here and there: they are rather irrelevant, but make printing easier.

my $component = qr'foo (\w+?)\s*?{';

while($text =~ /$component/sg)
{
    push @baz, $1; #grab the $someword
}

my $list = join( "|", @baz);
my $re = qr/$list/; #create a list of $somewords

# Try to grab everything between the foo $somewords,
# or if there's no next foo $someword, grab what's left.

while($text=~/($re)(.+?)foo ($re|\z|\Z)/ms)   
#if I take out s, it doesn't repeat, but nothing gets grabbed.
{
#   print pos($text), "\n";   #this is undef...that's a clue I'm certain.
    print $1, ":", $2; #prints the someword and what was grabbed.
    print "\n", '-' x 20, "\n";
}
+4  A: 

Update: this version also deals with 'foo' occurring inside the text you want to extract:

use strict;
use warnings;

use File::Slurp;

my $text = read_file \*DATA;

my $marker = 'foo';
# The literal brace is escaped; newer perls may warn about an unescaped one.
my $marker_re = qr/$marker\s+\w+\s*?\{/;

while ( $text =~ /$marker_re(.+?)($marker_re|\Z)/gs ) {
    print "---\n$1\n";
    pos $text -= length $2;
}

__DATA__
foo one {
one1
one2
one3

foo two
{ two1 two2
two3 two4 }

that was the second one

foo three { 3
foo 3 foo 3
foo 3
foo foo

foo four{}

Output:

---

one1
one2
one3


---
 two1 two2
two3 two4 }

that was the second one


---
 3
foo 3 foo 3
foo 3
foo foo


---
}
Sinan Ünür
About, yes. I'm looking for everything after the { and before the next foo.
Paul Nathan
That works. Without the `pos $text -= 3`, it returns the first and the last. I'm afraid I'm pretty confused about *why* your solution worked and what was wrong with mine. Thoughts?
Paul Nathan
Matching the trailing `(?:foo|\Z)` advances `pos $text` past `foo` when there is one. The next match would therefore start after that `foo` and skip its block, unless `pos $text` is moved back by three characters (the length of `foo`) to just before it. If the match has already reached the end of the string, this does not matter.
Sinan Ünür
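To make the explanation above concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and are not the asker's data. It also shows what most likely went wrong in the original loop: that `while` has no `/g` modifier, and a scalar-context match without `/g` always starts at the beginning of the string and leaves `pos` undefined, so the same first match is found on every iteration.

```perl
use strict;
use warnings;

# Hypothetical sample text, not the asker's real input.
my $text = "foo a { 1 foo b { 2 foo c { 3";

# Without /g, the match starts at offset 0 every time and pos() stays undef.
# Wrapping this in a while loop would therefore never terminate.
$text =~ /foo (\w+)/;
my $pos_without_g = pos $text;

# With /g, each successful match records where it ended, and the next
# attempt resumes from that position.
my @seen;
while ( $text =~ /foo (\w+) \{(.+?)(foo|\z)/gs ) {
    push @seen, "$1 at " . pos($text);

    # The trailing alternative consumed the next marker. Step back by its
    # length (0 at end of string) so the next iteration can match it.
    pos($text) -= length $3;
}

print defined $pos_without_g ? "pos set\n" : "pos undef without /g\n";
print "$_\n" for @seen;
```

Commenting out the `pos($text) -= length $3;` line makes the loop skip every other marker, which matches the "first and the last" behaviour described in the comment above.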
@Sinan: I noticed that if the text contains something like `my $foomatic`, the regex matches the 'foo' inside it, so I changed my regex to use `\bfoo\b`. :-) Thanks a lot for the help.
Paul Nathan
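A small sketch of the `\bfoo\b` point, for anyone who hits the same thing. The input string here is invented for the example. `\b` matches only at a boundary between a word character and a non-word character, so `foo` inside `foomatic` no longer counts as a marker.

```perl
use strict;
use warnings;

# Hypothetical input: "$foomatic" contains the marker text but is not a marker.
# Single quotes keep Perl from interpolating $foomatic.
my $text = 'my $foomatic; foo one { body }';

# The "= () =" idiom forces list context and counts the matches.
my $loose_count  = () = $text =~ /foo/g;        # also hits "foo" in "foomatic"
my $strict_count = () = $text =~ /\bfoo\b/g;    # whole-word "foo" only

print "loose: $loose_count, strict: $strict_count\n";
```

Note that `_` is a word character, so `\bfoo\b` would not match inside something like `foo_bar` either.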