I have a situation where something can appear in a format as follows:

---id-H--
Header: data
Another Header: more data
Message: sdasdasdasd
Message: asdasdasdasd
Message: asdasdasd

There may be many messages, or just a couple. I'd prefer not to step outside of RegEx, because I am already using a RegEx to parse the header information above the messages, and the headers and messages are both part of the text I am parsing.

I would also like to use named capture groups, so something like

Message: (?<Message[index of match]>.+)

where the group matches as many times as it can, with the index filled in for each match. Does anything like this exist in RegEx? (I will eventually be using this in Perl.)

+3  A: 

Assuming each group is separated by an empty line, this might get you closer:

use strict;
use warnings;

# use a blank line (two consecutive newlines) as the record separator
local $/ = "\n\n";

while (my $line = <DATA>)
{
    my ($id) = ($line =~ /^---id-(\d+)--$/m);
    my @messages = ($line =~ /^Message: (.*)$/mg);

    print "On line $id, found these messages: ", join(', ', @messages), "\n";
}
__DATA__
---id-1--
Header: data
Another Header: more data
Message: sdasdasdasd
Message: asdasdasdasd
Message: asdasdasd

---id-2--
Header: data2
Another Header: stuff
Message: more message
Message: another message
Message: YAM

Running that gives:

On line 1, found these messages: sdasdasdasd, asdasdasdasd, asdasdasd
On line 2, found these messages: more message, another message, YAM
Ether
+2  A: 

Perl's named capture buffer syntax, (?<name>...), is an alternative to numbered captures such as /(pattern1(pattern2))/, where it can be ambiguous which capture buffer is which.

You could get the match in hashed form by writing (?<name>pattern) and then reading the special hashes %+ and %-. See perlre for the named capture buffer syntax and perlvar for examples of %+ and %- with named captures.
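
As a rough sketch of how %+ might be used for this (assuming Perl 5.10 or later; the variable $text, the sample data, and the group name msg are made up for illustration): a named group cannot carry an index in its name, but matching with /g in a loop refreshes %+ on every pass, so each value can be pushed onto an array and indexed from there.

```perl
use strict;
use warnings;

# Hypothetical sample input; $text and the group name "msg" are illustrative only.
my $text = <<'END';
---id-1--
Header: data
Another Header: more data
Message: sdasdasdasd
Message: asdasdasdasd
Message: asdasdasd
END

my @messages;

# In scalar context, /g resumes after the previous match on each pass,
# and %+ holds the named captures of the most recent successful match.
while ($text =~ /^Message: (?<msg>.*)$/mg) {
    push @messages, $+{msg};
}

# The array index plays the role of the "index of match" from the question.
printf "Message[%d]: %s\n", $_, $messages[$_] for 0 .. $#messages;
```

The array position stands in for the index the question wanted inside the group name.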

There are much simpler solutions in Perl, however. A global match in list context returns a list of everything it captured, so you can capture all the messages into an array and then operate on that.

Here are samples:

foreach my $message ($text=~/^Message: (.*)/gm) {
   # Process messages...
}

or

my @messages = ($text=~/^Message: (.*)/gm);
print "The first message is $messages[0]\n";

There are many more ways, but those 2 are common and Perly.
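
One further variation, as a sketch only: since the question mentions parsing the headers along with the messages, the same list-context global match could pull out every "Name: value" line at once, keeping single-valued headers in a hash and the repeated Message lines in an array. The variable names and the sample record here are assumptions for illustration, not part of the original question.

```perl
use strict;
use warnings;

# Hypothetical record; the variable names below are illustrative only.
my $record = <<'END';
---id-2--
Header: data2
Another Header: stuff
Message: more message
Message: another message
Message: YAM
END

# The id line, captured once.
my ($id) = $record =~ /^---id-(\d+)--$/m;

# Every "Name: value" line; in list context /g returns all captures
# as one flat list of (name, value, name, value, ...).
my @pairs = $record =~ /^([^:\n]+): (.*)$/mg;

my (%header, @messages);
while (my ($name, $value) = splice(@pairs, 0, 2)) {
    if ($name eq 'Message') {
        push @messages, $value;     # repeated field: keep every value, in order
    } else {
        $header{$name} = $value;    # single-valued field
    }
}

printf "id %s: %s / %s\n", $id, $header{'Header'}, $header{'Another Header'};
printf "messages: %s\n", join(', ', @messages);
```

This keeps everything inside regex matches while leaving the indexing to an ordinary array.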

Best of luck.

drewk