Which regular expression engine does Java use?

In a tool like RegexBuddy, if I use

[a-z&&[^bc]]

the expression is valid in Java, but RegexBuddy does not understand it.

In fact it reports:

Match a single character present in the list below [a-z&&[^bc]

  • A character in the range between a and z : a-z
  • One of the characters &[^bc : &&[^bc
  • Match the character ] literally : ]

But I want to match a character between a and z that is not b or c, that is, the intersection of [a-z] with [^bc].

+4  A: 

Java uses its own regular expression engine, whose behaviour is defined in the Pattern class.

You can test it with an Eclipse plugin or online.

Riduidel
Thanks, Aaron, for the added link!
Riduidel
Good link, thanks!
xdevel2000
+8  A: 

Like most regex flavors, java.util.regex.Pattern has its own specific features with syntax that may not be fully compatible with others; this includes character class union, intersection and subtraction:

  • [a-d[m-p]] : a through d, or m through p: [a-dm-p] (union)
  • [a-z&&[def]] : d, e, or f (intersection)
  • [a-z&&[^bc]] : a through z, except for b and c: [ad-z] (subtraction)
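To check the three forms listed above against java.util.regex directly, here is a minimal sketch. The class name CharClassDemo and the helper inClass are made up for illustration; only Pattern.matches comes from the JDK.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class CharClassDemo {
    // Returns true if the whole input (here a single character)
    // is accepted by the given character class.
    static boolean inClass(String charClass, String input) {
        return Pattern.matches(charClass, input);
    }

    public static void main(String[] args) {
        // Union: a through d, or m through p
        System.out.println(inClass("[a-d[m-p]]", "n"));    // true
        System.out.println(inClass("[a-d[m-p]]", "e"));    // false

        // Intersection: only d, e, or f
        System.out.println(inClass("[a-z&&[def]]", "e"));  // true
        System.out.println(inClass("[a-z&&[def]]", "g"));  // false

        // Subtraction: a through z, except b and c
        System.out.println(inClass("[a-z&&[^bc]]", "a"));  // true
        System.out.println(inClass("[a-z&&[^bc]]", "b"));  // false
    }
}
```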

The most important "caveat" of Java regex is that the matches method attempts to match a pattern against the whole string. This is atypical of most engines, and can be a source of confusion at times.
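A short sketch of the difference between whole-string matching and searching. The class and method names (MatchesVsFind, wholeMatch, partialMatch) are hypothetical; Matcher.matches, Matcher.find and String.matches are the standard JDK calls.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class MatchesVsFind {
    // True only if the entire input matches the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // True if some substring of the input matches the pattern.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String regex = "[a-z&&[^bc]]";
        System.out.println(wholeMatch(regex, "a"));      // true
        System.out.println(wholeMatch(regex, "abc"));    // false: "bc" is left over
        System.out.println(partialMatch(regex, "abc"));  // true: finds "a"
        // String.matches has the same whole-string behaviour
        System.out.println("abc".matches(regex));        // false
    }
}
```

If you want "contains a match" semantics, use find() rather than matches().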

On character class subtraction

Subtraction allows you to define, for example, "all consonants" in Java as [a-z&&[^aeiou]].

This syntax is specific to Java. In XML Schema, .NET, JGSoft and RegexBuddy, it's [a-z-[aeiou]]. Other flavors may not support this feature at all.
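For the Java form, here is a small sketch that pulls the consonants out of a string. ConsonantDemo and consonantsOf are names invented for this example, and the class treats y as a consonant because only a, e, i, o, u are subtracted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of any library.
class ConsonantDemo {
    // Lowercase ASCII letters minus the five vowels (Java-specific syntax).
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // Collects every character of the input that matches the consonant class.
    static String consonantsOf(String input) {
        StringBuilder sb = new StringBuilder();
        Matcher m = CONSONANT.matcher(input);
        while (m.find()) {
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(consonantsOf("regexbuddy")); // rgxbddy
        System.out.println(consonantsOf("java"));       // jv
    }
}
```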

polygenelubricants
I can't test this since I don't own RegexBuddy, but regular-expressions.info claims that JGSoft supports the same subtraction syntax as .NET, and RegexBuddy v3+ extends JGSoft (or something like that), so if someone can confirm that the `[a-z-[aeiou]]` subtraction syntax indeed works in RegexBuddy, I can add that into my answer.
polygenelubricants
Alan Moore
@Alan Moore: the omissions from both RegexBuddy and regular-expressions.info suggest that perhaps Jan isn't aware of this Java-specific syntax. In any case, I've taken the initiative to e-mail him about these issues. Who knows, he may even give me a free copy of RegexBuddy if he's feeling generous =)
polygenelubricants
I was going to say the subtraction syntax also conforms to the Unicode regex standard (UTS #18), but it had been a while since I last looked at that standard. It's turned into a monster! http://unicode.org/reports/tr18/#Subtraction_and_Intersection
Alan Moore
+1  A: 

RegexBuddy does not yet support the character class union, intersection, and subtraction syntax that is unique to the Java regular expression flavor. This is the only part of the Java regex syntax that RegexBuddy does not yet support. We're planning to implement this in a future version of RegexBuddy. It has been postponed because no other regular expression flavor supports this syntax.

P.S.: If you have a question about RegexBuddy in particular, please add the "regexbuddy" tag to your question. Then the question automatically shows up in my RSS reader. I don't follow the "regex" tag because far too many questions use that tag, and most are already answered by the time I see them.

Jan Goyvaerts