ansaurus

Question

Answer 1

A:

You are trying to parse a complex expression with a regex, which is an insufficient tool for the job. Recall that regular expressions, in the formal sense, cannot parse context-free or higher grammars. For intuition, any expression that can be nested to arbitrary depth cannot be parsed with a pure regular expression.

Yuval A 2010-06-17 22:36:57

Perl's regexes are not strictly regular. You can use `(??{blah})`, though it's not exactly recommended practice.

sreservoir 2010-06-17 22:38:39

Perl's regex engine also supports recursion (for example `(?R)` and `(?1)`, available since Perl 5.10), which allows it to match nested constructs easily.

Eric Strom 2010-06-17 22:44:30

True. Many regex implementations can actually parse more than the set of regular languages, but this is not consistent across implementations. If you need to parse a grammar, use a proper grammar parser.

Yuval A 2010-06-17 22:45:56
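As a rough sketch of the recursion mentioned in the comments above (assuming Perl 5.10 or later), the pattern below matches one balanced parenthesized group by recursing into its own capture group with `(?1)`. The sample strings and the names `$balanced`, `@texts` and `@results` are made up for illustration. Note that `(?1)` refers to group 1 of the whole pattern, so the numbering would shift if this were interpolated into a larger regex.

```perl
use strict;
use warnings;

# Matches one balanced parenthesized group by recursing into
# capture group 1 with (?1). Requires Perl 5.10 or later.
my $balanced = qr/
    (                   # group 1: one balanced group
        \(
        (?:
            [^()]++     # a run of non-parenthesis characters
          |
            (?1)        # or a nested balanced group
        )*
        \)
    )
/x;

# Hypothetical inputs: balanced, unbalanced outer group, no parens.
my @texts = ('f(a(b)c)', 'f(a(b)c', 'no parens');

# For each input, keep the first balanced group found, or undef.
my @results = map { $_ =~ $balanced ? $1 : undef } @texts;

for my $i (0 .. $#texts) {
    my $found = defined $results[$i] ? $results[$i] : '(none)';
    print "$texts[$i] => $found\n";
}
```

The pattern is unanchored, so for an input whose outer group is never closed it falls back to the first inner group that is balanced.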

Answer 2

+6 A:

In list context, a regex match returns a list of the substrings captured by its parentheses.

So all you have to do is:

my @matches = $string =~ /regex (with) (parens)/;

And assuming that it matched, @matches will be an array of the two capturing groups.

So using your regex:

my @subs = $data =~ /^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/;
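To try this out, here is a minimal sketch. The sample record in `$data` is an assumption, made up only so that it fits the pattern; the real input format is not shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample record, invented to fit the pattern.
my $data = '"42",foo:--:ABC12345,ID=7';

# In list context the match returns the five captured substrings,
# or an empty list if the pattern does not match.
my @subs = $data =~ /^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/;

if (@subs) {
    print scalar(@subs), " captures: @subs\n";
} else {
    print "no match\n";
}
```

Testing `@subs` for emptiness works here because the pattern contains capturing groups, so a successful match always returns at least one element.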

Also, for long regexes, Perl has the `x` modifier, which goes after the closing regex delimiter. It makes the engine ignore unescaped whitespace and newlines inside the pattern and allows `#` comments, which improves readability.
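For illustration, the same pattern could be laid out with the x modifier roughly like this. The comments describing each field are guesses based on the pattern alone, and the sample record is made up.

```perl
use strict;
use warnings;

# Same pattern as above, spread over several lines with the x modifier.
my $re = qr/
    ^ "([0-9]+)" ,          # quoted numeric field
    ([^:]*) :               # anything up to the first colon
    (\W+) :                 # one or more non-word characters
    ([A-Z]{3}[0-9]{5}) ,    # three capitals followed by five digits
    ID= ([0-9]+)            # numeric ID
/x;

my $data = '"42",foo:--:ABC12345,ID=7';   # hypothetical sample record
my @subs = $data =~ $re;

print join('|', @subs), "\n";
```

Keep in mind that a comment inside the pattern must not contain the regex delimiter, and that a literal space or `#` has to be escaped once the modifier is on.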

If you are worried about the capturing groups that might be zero length, you can pass the matches through `@subs = grep {length} @subs` to filter them out.
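A small sketch of what that filter does, using a made-up pattern with one group that does not take part in the match and one that matches the empty string. The explicit `defined` check is an added precaution, not part of the answer: as far as I know, `length(undef)` warns on Perls older than 5.12.

```perl
use strict;
use warnings;

# (x)? does not take part in the match, so its capture is undef;
# (\d*) does take part but matches nothing, so its capture is "".
my @subs = 'ab' =~ /(a)(x)?(b)(\d*)/;

# Drops only the undef capture.
my @defined = grep { defined($_) } @subs;

# Drops both the undef capture and the empty string.
my @non_empty = grep { defined($_) && length($_) } @subs;

print scalar(@subs), " captures, ",
      scalar(@defined), " defined, ",
      scalar(@non_empty), " non-empty\n";
```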

Eric Strom 2010-06-17 22:37:25

If you don't know whether the regex has parens or not, and want to return nothing if it does not (instead of the default, which is the list `(1)` for a plain match or the entire matched string with `/g`), add an extra set: `$string =~ /(regex)/` and discard it from the results.

ysth 2010-06-18 01:30:54
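A possible sketch of that extra-parentheses trick. The helper name `captures` is hypothetical, and the approach assumes the caller's pattern does not use numbered backreferences such as `\1`, since the added outer group shifts the numbering by one.

```perl
use strict;
use warnings;

# captures() is a hypothetical helper: it returns only the groups the
# caller's pattern defines, and an empty list if there are none.
sub captures {
    my ($string, $re) = @_;
    my @all = $string =~ /($re)/;   # the extra outer group is always first
    shift @all;                     # discard the outer group
    return @all;
}

my @none = captures('abc', qr/b/);        # pattern without groups
my @two  = captures('abc', qr/(b)(c)/);   # pattern with two groups

print scalar(@none), " and ", scalar(@two), " captures\n";
```

With this approach a failed match and a successful match without groups both yield an empty list, so the caller cannot tell them apart from the return value alone.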

That grep will filter out parens not actually used in the match, but not zero-length ones (which will be defined and "")

ysth 2010-06-18 01:32:19

@ysth => you're right, fixed.

Eric Strom 2010-06-18 13:17:45

Thank you! I've been doing Perl for years, how did I never know that you can return matches in list context? Might have to go back and re-read my books.

coding_hero 2010-06-21 00:47:37

Answer 3

A:

When you want to find text inside pairs of parentheses, you want to use Text::Balanced.

But, that is not what you want to do, so it will not help you.

Kevin Panko 2010-06-17 22:58:57

Despite the title of the question, it doesn't seem like the OP is actually looking to match nested parens, just to use a regex that could have any number of sequential capturing groups.

Eric Strom 2010-06-17 23:00:30

Sorry, I should have said 'parenthetical groupings' instead of 'parentheses'.

coding_hero 2010-06-21 16:59:27

Answer 4

+1 A:

You wrote that you would call it like this:

@subs = parse($data, 
          '^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)');

Instead, pass a compiled regex object built with qr//:

parse($data, 
    qr/^"([0-9]+)",([^:]*):(\W+):([A-Z]{3}[0-9]{5}),ID=([0-9]+)/);

Further, your task would be made simpler if you can use named captures (i.e. Perl 5.10 and later). Here is an example:

#!/usr/bin/perl

use strict; use warnings;

my %re = (
    id => '(?<id> [0-9]+ )',
    name => '(?<name> \w+ )',
    value => '(?<value> [0-9]+ )',
);

my @this = (
    '123,one:12',
    '456,two:21',
);

my @that = (
    'one:[12],123',
    'two:[21],456',
);

my $this_re = qr/$re{id}   ,   $re{name}    : $re{value}/x;
my $that_re = qr/$re{name} : \[$re{value}\] , $re{id}   /x;

use YAML;

for my $d ( @this ) {
    print Dump [ parse($d, $this_re) ];
}

for my $d ( @that ) {
    print Dump [ parse($d, $that_re) ];
}

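sub parse {
    my ($d, $re) = @_;
    # Return an empty list when the pattern does not match.
    return unless $d =~ $re;
    # %+ holds the named captures of the last successful match;
    # the hash slice returns them in a fixed order.
    return my @result = @+{qw(id name value)};
}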

Output:

---
- 123
- one
- 12
---
- 456
- two
- 21
---
- 123
- one
- 12
---
- 456
- two
- 21

Sinan Ünür 2010-06-18 14:58:22

Thank you for this, it is good to know!

coding_hero 2010-06-21 21:55:27

Matching n parentheses in perl regex
