Both languages claim to use Perl style regular expressions. If I have one language test a regular expression for validity, will it work in the other? Where do the regular expression syntaxes differ?

The use case here is a C# (.NET) UI talking to an eventual Java back end implementation that will use the regex to match data.

Note that I only need to worry about matching, not about extracting portions of the matched data.

+4  A: 

C# regex has its own convention for named groups: (?<name>). I don't know of any other differences.

Rex M
Are named groups used for matching, or for extracting the matched portions after the match?
TREE
A: 

I find RegexBuddy invaluable when dealing with regex on multiple systems. Not an answer specifically, but you can convert between flavors easily and see the differences yourself.

hometoast
This particular RegexBuddy link is a 404.
TREE
+2  A: 

.NET Regex supports counting, so you can match nested parentheses, which is something you normally cannot do with a regular expression. According to Mastering Regular Expressions, it is one of the few implementations that can do that, so that could be a difference.
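Since java.util.regex has no counting construct, a Java back end would probably have to check nesting in ordinary code rather than in the pattern. A minimal sketch of that fallback (this is a plain loop, not a regex, and the class and method names are made up for illustration):

```java
class BalancedParens {

    // Returns true if every '(' has a matching ')' and no ')' comes too early.
    // Hypothetical helper: it only looks at parentheses and ignores all other characters.
    static boolean isBalanced(String s) {
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closing parenthesis with nothing open
                }
            }
        }
        return depth == 0; // anything still open means unbalanced
    }

    public static void main(String[] args) {
        System.out.println(isBalanced("(a(b)c)"));
        System.out.println(isBalanced("(a(b)"));
    }
}
```

If the UI lets users enter patterns that rely on .NET counting, the two sides cannot share the same expression, so a check like this would have to live outside the regex on the Java side.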

Brian Rasmussen
I think you mean, so you can match NESTED parentheses (as well as other nested structures). No, Java's built-in regex flavor has no equivalent for that.
Alan Moore
@Alan - yup that is what I meant. Thanks.
Brian Rasmussen
+1  A: 

Java uses standard Perl-style regex as well as POSIX regex. Looking at the C# documentation on regexes, it looks like Java has all of the C# regex syntax, but not the other way around.

Compare them yourself: Java: C#:

EDIT: Currently, no other regex flavor supports Microsoft's version of named capture.

WolfmanDragon
No, .Net has several features Java lacks, as well as vice-versa. In fact, when it comes to cool features, I'd say .Net has a clear lead. But I think they made a big mistake leaving out possessive quantifiers.
Alan Moore
+4  A: 

Check out: http://www.regular-expressions.info/refflavors.html Plenty of regex info on that site, and there's a nice chart that details the differences between Java and .NET.

Seth
+1 good info. If anyone wants to pull out the high-level data from here (named groups, full string v. partial matches, etc) I'll mark that as the answer.
TREE
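On the "full string v. partial matches" point: in Java, Matcher.matches() succeeds only if the whole input matches, while Matcher.find() looks for a match anywhere in the input. .NET's Regex.IsMatch looks anywhere unless the pattern is anchored, so the same pattern can give different answers depending on which Java call the back end uses. A small sketch of the Java side (the class and method names are only for illustration):

```java
import java.util.regex.Pattern;

class WholeVersusPartialMatch {

    // True only if the entire input is matched by the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if the pattern matches some substring of the input.
    static boolean matchesAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "\\d+";
        String input = "abc123";
        System.out.println("whole:    " + matchesWhole(regex, input));
        System.out.println("anywhere: " + matchesAnywhere(regex, input));
    }
}
```

Anchoring the pattern explicitly with ^ and $ on both sides is one way to make the intent independent of which method is called.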
+19  A: 

Differences are (from this site):

  1. \Q...\E escapes a string of metacharacters
    • .NET NO
    • Java YES
  2. \Q...\E escapes a string of character class metacharacters (in a character set)
    • .NET NO
    • Java YES
  3. (?n) (explicit capture modifier)
    • .NET YES
    • Java NO
  4. ?+, *+, ++ and {m,n}+ (possessive quantifiers)
    • .NET NO
    • Java YES
  5. (?<=text) (positive lookbehind)
    • .NET Full regex
    • Java Finite length
  6. (?<!text) (negative lookbehind)
    • .NET Full regex
    • Java Finite length
  7. Conditionals of form (?(?=regex)then|else), (?(regex)then|else), (?(1)then|else) or (?(group)then|else)
    • .NET YES
    • Java NO
  8. (?#comment) comments
    • .NET YES
    • Java NO
  9. Character class is a single token (Free-spacing syntax)
    • .NET YES
    • Java NO
  10. \pL through \pC or \p{IsL} through \p{IsC} (Unicode properties)
    • .NET NO
    • Java YES
  11. \p{IsLu} through \p{IsCn} (Unicode property)
    • .NET NO
    • Java YES
  12. \p{InBasicLatin} through \p{InSpecials} or \p{IsBasicLatin} through \p{IsSpecials} (Unicode block)
    • .NET YES
    • Java NO
  13. Spaces, hyphens and underscores allowed in all long names listed above (e.g. BasicLatin can be written as Basic-Latin or Basic_Latin or Basic Latin)
    • .NET NO
    • Java YES (Java 5)
  14. Named captures of style (?<name>regex), (?'name'regex), \k<name> or \k'name'
    • .NET YES
    • Java NO
  15. Multiple capturing groups can have the same name
    • .NET YES
    • Java N/A (does not have named capture groups)
  16. XML character classes subtraction [abc-[abc]]
    • .NET YES (2.0)
    • Java NO
  17. \p{Alpha} POSIX character class
    • .NET NO
    • Java YES (ASCII)
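
For the use case in the question, one way to see how the Java side treats a pattern is simply to try compiling it. The sketch below assumes the back end uses java.util.regex; the helper name compilesInJava and the sample patterns are illustrative, picked from a few items in the list above:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class JavaRegexPortabilityCheck {

    // Returns true if java.util.regex accepts the pattern's syntax.
    // This says nothing about whether .NET accepts it or gives it the same meaning.
    static boolean compilesInJava(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "a++b",         // possessive quantifier (item 4)
            "\\Q.*\\E",     // \Q...\E quoting (item 1)
            "(?#note)abc",  // inline comment (item 8)
            "(?(1)a|b)",    // conditional (item 7)
            "(?n)(a)"       // explicit capture modifier (item 3)
        };
        for (String s : samples) {
            System.out.println(s + " -> " + (compilesInJava(s) ? "accepted" : "rejected"));
        }
    }
}
```

Going by the list, the first two samples should be accepted by Java and the last three rejected. A pattern that compiles on both sides can still behave differently, so this only catches syntax that one flavor refuses outright.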
Drew Noakes
that's going the extra mile. ;)
TREE
Typo (I assume) in #15: "(does not have NAMED capturing groups)"
Alan Moore
Thanks Alan, I've updated the answer.
Drew Noakes
"\pL through \pC or \p{IsL} through \p{IsC} (Unicode properties)": You're wrong, .NET does have it, but it's called \p{L} instead of \pL or \p{IsL}.
Timwi