For example, for the regexp /(\w+) (?:\+) (\w)/ the count of capture groups must be 2.

I must apologize for the incomplete question. Here's the problem: the input is an XML file (though the format does not really matter :) that sets rules for strings. At the moment it looks like this:

<string svars="3">(?:total ?|)(\d{1,2}(?:[\.,]\d{1,2}|))\/(\d{1,2}(?:[\.,]\d{1,2}|))\/(\d{1,2}(?:[\.,]\d{1,2}|))\s?sq\.\s?m\.?</string> 

I need to get rid of the svars attribute and instead count the number of substitution variables in the regex programmatically.

+1  A: 

This will find all capturing subexpressions in a regex represented as a string.

my @matches = '/(\w+) (?:\+) (\w)/' =~ /(\((?!\?).*?\))/g;
print "@matches\n";            # All matches
print scalar(@matches), "\n";  # Number of matches (2 in this case)

The regex uses negative lookahead ((?!...)) to make sure that the subexpression does not start with a ? as all non-capturing subexpressions do.

From KennyTM's comment I understand that this won't work if there are escaped parentheses in the expression. To fix this we use negative lookbehind ((?<!...)). A new regex is born.

 /((?<!\\)\((?!\?).*?\))/g # It looks horrible.
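To see what this pattern does and does not handle, here is a small sketch. The sub name naive_count and the test patterns are made up for illustration. It wraps the regex above and runs it on the example pattern, on a pattern with escaped parentheses, and on a pattern that hides the parentheses in character classes:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the regex above: counts every "(" that is
# not preceded by a backslash and not followed by "?".
sub naive_count {
    my ($pattern_text) = @_;
    my @groups = $pattern_text =~ /((?<!\\)\((?!\?).*?\))/g;
    return scalar @groups;
}

print naive_count('(\w+) (?:\+) (\w)'), "\n";  # 2, the expected answer
print naive_count('\(literal\)'), "\n";        # 0, escaped parentheses are skipped
print naive_count('[(]literal[)]'), "\n";      # 1, although there is no capture group
```

The last case is miscounted because the regex knows nothing about character classes, so treat the result as a rough text-level guess only.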

Perl regular expressions reference and tutorial, always handy to have when working with regular expressions!

adamse
`/\\(bang\\)/`
KennyTM
This won't work. The number of capturing parens in the pattern isn't the same as the number of captures that you will get. The branch reset, `(?|pattern)`, can renumber the captures inside an alternation, for instance.
brian d foy
This shows that regular expressions are not fit to parse regular expressions! I will let my answer remain for those who might be interested...
adamse
Additionally, this doesn't work for patterns using the /x modifier that might use () in a comment.
brian d foy
Let your answer remain for people interested in what? Doing it wrong? How about editing your answer to show why it is wrong so people don't get the wrong idea. Deleting the wrong information is better, though.
brian d foy
`/[(]nicetry[)]/`
KennyTM
A: 

Just in case you are doing this to find out how many captures a given match returns, you can put the regex in list context and it will return all of the captures:

my @captures = $string =~ /(\w+) (?:\+) (\w)/;

You can then loop over them:

for my $capture (@captures) {
    print "$capture\n";
}
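
As a complete, runnable sketch of the same idea (the input string is a made-up example), the size of the array gives the number of captures returned for that match:

```perl
use strict;
use warnings;

my $string = 'foo + b';    # hypothetical input
my @captures = $string =~ /(\w+) (?:\+) (\w)/;

print scalar(@captures), "\n";       # number of captures returned for this string
print join(',', @captures), "\n";    # the captured substrings
```

Note that a successful match against a pattern with no capture groups returns the list (1) in list context, so the array size is only meaningful when the pattern has at least one group.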
Chas. Owens
This reports the number of captures for that string, not the total possible capture groups in the regex. Furthermore, you trigger any side effects from the regex, which might include running code.
brian d foy
@brian d foy I am assuming the reason he or she wants to know the number of captures in a regex would be to cobble together string evals or symbolic references to get the results. Assigning to an array is a much cleaner solution.
Chas. Owens
We don't know what the cleaner solution is until we know what the problem is. :)
brian d foy
+2  A: 

I think you are looking for YAPE::Regex:

#!/usr/bin/perl

use strict; use warnings;
use YAPE::Regex;

my $yape = YAPE::Regex->new( qr/(\w+) (?:\+) (\w)/ );
my $extor = $yape->extract;
my $captures = 0;

$captures++ while $extor->();

print "Number of capture groups: $captures\n";
Sinan Ünür
This will count the literal number of capture groups, but not the number of captures a pattern produces. I noticed it doesn't handle some 5.10 patterns.
brian d foy
+1  A: 

You admitted that you're working with XML. The regex stuff is probably the wrong answer to your problem. You have an XY problem where you're fixated on a solution instead of the problem.


What are you really trying to discover? It's practically impossible to give a good answer to a question such as this if you don't tell us what you are trying to do and why you are trying to do it.

There's a difference between the number of capture groups in the pattern and the number of captures a pattern will produce.

  • The total number of literal capture groups in the regex.

The following pattern produces one capture although it literally contains three capture groups. The branch reset grouping renumbers the captures so that each alternative captures into the same variables:

 (?|(abc)|(def)|(ghi))

Do you want to count that as three capture groups or as the single capture it will produce?

Even without the branch reset, how do you want to count this one?

 (abc)|(def)(ghi)|(jkl)

There are four capture groups, but at most only two of them will capture anything.
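
If what you want is the number of capture groups that perl itself allocates for a pattern, one possible sketch is to let the regex engine do the counting through the @+ array, since $#+ gives the number of groups in the last successful match. The helper name count_groups is hypothetical, and this assumes the pattern compiles on your perl:

```perl
use strict;
use warnings;

# Hypothetical helper: count the capture groups perl allocates for a pattern.
# The empty first alternative matches the empty string right away, so the
# pattern under test is compiled but its alternatives should not need to run.
sub count_groups {
    my ($re) = @_;
    '' =~ /|$re/ or die "unexpected match failure";
    return $#+;    # number of capture groups in the last successful match
}

print count_groups(qr/(\w+) (?:\+) (\w)/), "\n";       # 2
print count_groups(qr/(?|(abc)|(def)|(ghi))/), "\n";   # 1, branch reset reuses group 1
print count_groups(qr/(abc)|(def)(ghi)|(jkl)/), "\n";  # 4
```

This reports the groups as numbered by the engine, which is not the same thing as the number of groups that capture something for a given string.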

  • The total number of captures that the regex will produce for a particular string.

Besides the previous examples, some capture groups might never capture anything. The number of captures depends on the string you match, as in these examples:

 (abc)? 
 (abc)*
 (abc){0,5}

  • The maximum number of captures that a regex might produce. That is, for the string that triggers the largest number of captures, what is that number?
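
To make the string-dependent case concrete, here is a sketch that counts how many groups actually took part in a match. The helper name defined_captures and the sample strings are assumptions for illustration. It relies on $-[n] being undef for a group that did not participate:

```perl
use strict;
use warnings;

# Hypothetical helper: how many groups captured something for this string.
sub defined_captures {
    my ($re, $string) = @_;
    return 0 unless $string =~ $re;
    # $#+ is the number of groups in the pattern; $-[$n] is undef for a
    # group that did not take part in the match.
    my $count = grep { defined $-[$_] } 1 .. $#+;
    return $count;
}

print defined_captures(qr/(abc)|(def)(ghi)|(jkl)/, 'abc'), "\n";     # 1
print defined_captures(qr/(abc)|(def)(ghi)|(jkl)/, 'defghi'), "\n";  # 2
print defined_captures(qr/(abc)?/, 'xyz'), "\n";                     # 0, matches the empty string
print defined_captures(qr/(abc)*/, 'abcabc'), "\n";                  # 1, the group keeps its last iteration
```

Unlike a purely static count, this runs the pattern against the string, so any side effects in the regex are triggered.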
brian d foy