I am interested in the power of PCRE (Perl Compatible Regular Expressions) and wonder whether it is likely to become a de facto standard in all major languages (I am particularly interested in Java). I am prepared to use a library if necessary.

I also could not find a good page on SO describing the pros and cons of PCRE, so if one does not exist it could be useful to include this in answers.

EDIT: I am interested in features beyond Java 1.6 regex, particularly named capture groups.

A:

It seems that more mainstream languages ship their own implementation of "Perl-like" regexes than use libpcre. Languages in this class include (at the very least) Java, JavaScript, and Python.

Java's java.util.regex library uses a syntax very heavily based on Perl's (approx. version 5.8) regexes, including the escaping rules, the \p and \P Unicode classes, non-greedy and "possessive" quantifiers, backreferences, \Q..\E quoting, and several of the (?...) constructs, including non-capturing groups, zero-width lookahead/lookbehind, and non-backtracking (atomic) groups. In fact, Java regexes seem to have more in common with Perl regexes than libpcre does. :)
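The constructs listed above can be tried out with a short, self-contained sketch against java.util.regex. It uses only Pattern and Matcher methods that exist in Java 1.6; the class name PerlStyleJavaRegex and the helper firstMatch are illustrative, not part of any library. The trailing comment on each line is what it prints.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Demo of Perl-derived constructs accepted by java.util.regex (Java 1.6 and later).
public class PerlStyleJavaRegex {

    // Hypothetical helper: returns the text of `group` from the first find(), or null if nothing matches.
    static String firstMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        // Non-greedy quantifier: .*? stops at the first '>'.
        System.out.println(firstMatch("<(.*?)>", "<a><b>", 1));               // a
        // Possessive quantifier: a*+ never gives an 'a' back, so the final 'a' cannot match.
        System.out.println(Pattern.matches("a*+a", "aaaa"));                  // false
        // Atomic (non-backtracking) group: behaves like the possessive form above.
        System.out.println(Pattern.matches("(?>a*)a", "aaaa"));               // false
        // Backreference: \1 must repeat whatever group 1 captured.
        System.out.println(Pattern.matches("(\\w)\\1", "zz"));                // true
        // Quoting with \Q...\E: '+' and '=' are matched literally.
        System.out.println(Pattern.matches("\\Q1+1=2\\E", "1+1=2"));          // true
        // Zero-width lookbehind and lookahead around a non-capturing group.
        System.out.println(firstMatch("(?<=\\$)(?:\\d+)(?=\\.)", "cost: $42.00", 0)); // 42
        // Unicode property class: \p{Lu} matches any uppercase letter, not only A-Z.
        System.out.println(Pattern.matches("\\p{Lu}+", "\u00C4BC"));          // true
    }
}
```

The possessive and atomic versions fail precisely because they refuse to backtrack; the plain pattern a*a would match "aaaa".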

JavaScript also uses regexes derived from Perl's; Unicode classes, lookbehind, possessive quantifiers, non-backtracking groups, and \Q..\E quoting are absent, but the rest of what I mentioned for Java is present in JS as well.

Python's regex syntax is also based on Perl 5's, with non-greedy quantifiers, most of the (?...) constructs (including non-capturing groups, lookahead/lookbehind, and conditional patterns), and named capture groups, written with Python's (?P<name>...) syntax rather than the (?<name>...) form native to Perl and PCRE (both of which also accept the Python form). Non-backtracking groups and "possessive" quantifiers are (as far as I can see) absent, as are the \p and \P Unicode character classes, although the standard \d, \s, and \w classes are Unicode-aware if requested.

hobbs
Thank you. I have clarified my question to show that I am interested in features that Java 1.6 does not support.
peter.murray.rust
Perl, Python, .NET, libpcre. Those are the only implementations I know of that support named capture groups.
hobbs
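Since the question is specifically about named capture groups: java.util.regex gained them in Java 7 (so they are not available in the Java 1.6 discussed here), using Perl's (?<name>...) syntax, lookup via Matcher.group(String), and \k<name> backreferences. A minimal sketch, assuming Java 7 or later; the class name NamedGroupsDemo and the date format are only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Named capture groups: supported by java.util.regex from Java 7 onward (not in Java 1.6).
public class NamedGroupsDemo {

    // Splits a date like "2009-11-05" into year, month and day via named groups.
    // Returns null if the whole input does not match the pattern.
    static String[] parseDate(String input) {
        Pattern date = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");
        Matcher m = date.matcher(input);
        if (!m.matches()) {
            return null;
        }
        // Groups are retrieved by name instead of by position.
        return new String[] { m.group("year"), m.group("month"), m.group("day") };
    }

    public static void main(String[] args) {
        String[] parts = parseDate("2009-11-05");
        System.out.println(parts[0] + "/" + parts[1] + "/" + parts[2]);       // 2009/11/05
        // \k<w> is a named backreference: this matches a doubled word.
        System.out.println(Pattern.matches("(?<w>\\w+) \\k<w>", "bye bye"));  // true
    }
}
```

On Java 1.6 itself, the usual fallback is to keep track of numbered groups yourself or use a third-party regex library.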