I've got a regular expression (JavaScript) that looks something like this:

/(x)(y)+(z)/gi

The problem is that I always get exactly three captures from those parentheses, even if (y)+ matched multiple times. When it does match multiple times, the group holds only the last repetition. I have no way of knowing ahead of time how many times y will match on any given run, but I want to capture every repetition.
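To make the behaviour concrete, here is a minimal sketch of it. The sample string and the sub-patterns (a, b followed by a digit, c) are made up for illustration and stand in for x, y and z; the digits only make it visible which repetition survives.

```javascript
// Hypothetical input: "b1", "b2", "b3" are three repetitions of the middle group.
const input = "ab1b2b3c";
const m = /(a)(b\d)+(c)/i.exec(input);

// m[0] is the whole match, m[1]..m[3] are the three capture groups.
console.log(m.length); // 4
console.log(m[2]);     // "b3": only the last repetition is kept
```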

Any ideas?

+3  A: 

I take it you can't use

/(x)((?:y)+)(z)/gi

because this is part of a "larger regex"?
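As a rough sketch of what that pattern gives you, assuming a made-up input where a, b plus a digit, and c stand in for x, y and z: the inner (?:...) is not numbered, so there are still exactly three groups, and the second one now holds the whole run of repetitions.

```javascript
// Hypothetical input; the inner (?:b\d) group is non-capturing.
const sample = "ab1b2b3c";
const groups = /(a)((?:b\d)+)(c)/i.exec(sample);

console.log(groups.length); // 4 (whole match plus three groups)
console.log(groups[2]);     // "b1b2b3": every repetition, as one string
```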

harpo
Wouldn't it be (?:y) to prevent the group from being named/numbered?
strager
It would in .NET... I don't believe that JavaScript supports non-capturing groups.
harpo
I stand corrected: http://www.kryogenix.org/days/2008/06/17/non-capturing-groups-in-a-regexp
harpo
+2  A: 

Move the + inside of the parentheses and then split y into its individual parts. The following is Perl, but it should give you an idea:

#!/usr/bin/perl

use strict;
use warnings;

my $s = "abcbcbcd";

my ($x, $y, $z) = $s =~ /(a)((?:bc)+)(d)/;

my @y = $y =~ /(bc)/g;

print "x is $x\n",
    "y is ", join(", ", @y), "\n",
    "z is $z\n";

And here is some crappy JavaScript I hacked together (I don't really know JavaScript):

<html>
<body>

<script type="text/javascript">
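// Stage 1: capture the whole run of "bc" repetitions as a single group.
var str = "abcbcbcd";
var matches = str.match(/(a)((?:bc)+)(d)/);
var x = matches[1];
// Stage 2: a global match splits that run into its individual parts.
var y = matches[2].match(/(bc)/g);
var z = matches[3];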

document.write(
    "x is ", x, "<br />",
    "y is ", y.join(", "), "<br />",
    "z is ", z
);
</script>

</body>
</html>
Chas. Owens
+4  A: 

I would use

/(x)(y+)(z)/gi

then take the text that matched the second group and parse it further.
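Here is one way the "parse it further" step might look when y is a longer pattern with its own sub-captures. Everything in it is an assumption for illustration: the function name parseRepeats, the bracketed key=value input format, and both patterns. It also relies on String.prototype.matchAll, so it needs an ES2020-capable engine.

```javascript
// Hypothetical helper: two-stage parse of a repeated group.
function parseRepeats(text) {
  // Stage 1: the middle group spans the whole run of "key=value;" items.
  const outer = /(\[)((?:\w+=\d+;)+)(\])/.exec(text);
  if (outer === null) return null; // no match at all

  // Stage 2: re-scan only the run, keeping the captures of each repetition.
  const items = Array.from(
    outer[2].matchAll(/(\w+)=(\d+);/g),
    (m) => [m[1], m[2]]
  );
  return { open: outer[1], items: items, close: outer[3] };
}

const parsed = parseRepeats("[a=1;b=22;c=3;]");
console.log(parsed.items); // [["a", "1"], ["b", "22"], ["c", "3"]]
```

The second pass only ever sees the text that the first pass already accepted, so the inner pattern can stay simple.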

David Zaslavsky
This works only if y is indeed a single character... I took these letters to be surrogates for longer patterns.
harpo
Right, but this gets the point across. I figured the OP would be able to insert non-grouping parentheses as needed.
David Zaslavsky
Thanks. Kind of a 'duh' moment when you put it that way. This regex is just breaking up a huge parsing problem into more manageable chunks to be further regexed anyways, so this fits nicely. I'm rather surprised there's no native way to do this, though. Guess it would throw off cardinality.
Asmor