Java's java.util.regex.Pattern class supports the following character class:

[a-z&&[def]]

which matches "d, e, or f" and is called an intersection.

Functionally this is no different from:

[def]

which is simpler to read and understand in a big RE. So my question is, what use are intersections, other than specifying complete support for CSG-like operations on character classes?

(Please note, I understand the utility of subtractions like [a-z&&[^bc]] and [a-z&&[^m-p]], I am asking specifically about intersections as presented above.)
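A quick way to check the equivalence claim is to run each class against every ASCII character and collect what matches. The sketch below does that with java.util.regex.Pattern. The class name IntersectionDemo and the helper matches() are illustrative only, and the check is limited to ASCII.

```java
import java.util.regex.Pattern;

public class IntersectionDemo {
    // Returns every ASCII character (0-127) that the given character class matches.
    static String matches(String regex) {
        Pattern p = Pattern.compile(regex);
        StringBuilder sb = new StringBuilder();
        for (char c = 0; c < 128; c++) {
            if (p.matcher(String.valueOf(c)).matches()) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(matches("[a-z&&[def]]")); // def
        System.out.println(matches("[def]"));        // def
        System.out.println(matches("[a-z&&[^bc]]")); // adefghijklmnopqrstuvwxyz
    }
}
```

The intersection and [def] both produce def, which backs up the point that for a literal set like this the intersection adds nothing. The subtraction produces a followed by d through z.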

+1  A: 

I believe that particular sample is just a "proof of concept." Two intersected character classes match only a character that matches both character sets individually. The subtractions you mentioned are the real practical applications of the operator.

Simply put, there is no hidden meaning.

Blixt
+1  A: 

You can build a regex that matches the intersection of two sets programmatically:

String regex = String.format("[%s&&[%s]]", characterClass, whiteList);
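Here is a self-contained version of that line, as a sketch only. It assumes both arguments are the contents of a character class with no surrounding brackets. The names intersect, ClassIntersection and the sample whitelist are hypothetical.

```java
import java.util.regex.Pattern;

public class ClassIntersection {
    // Builds "[<characterClass>&&[<whiteList>]]", which matches only characters
    // that belong to both sets. Both arguments are class *contents* (no brackets),
    // e.g. "a-z" or "\\p{L}". No escaping is performed on either argument.
    static Pattern intersect(String characterClass, String whiteList) {
        String regex = String.format("[%s&&[%s]]", characterClass, whiteList);
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: any Unicode letter, restricted by a whitelist.
        Pattern p = intersect("\\p{L}", "a-fxyz0-9");
        System.out.println(p.pattern());              // [\p{L}&&[a-fxyz0-9]]
        System.out.println(p.matcher("c").matches()); // true
        System.out.println(p.matcher("7").matches()); // false: whitelisted, but not a letter
        System.out.println(p.matcher("g").matches()); // false: a letter, but not whitelisted
    }
}
```

If the whitelist came from user input, characters such as ], \, ^, - and & would need escaping before being spliced into the class. The sketch does not handle that.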
dfa
This makes some sense from a theoretical perspective, but what's a practical example where you'd ever use this?
Christopher
I don't have any practical example right now :-(
dfa
+4  A: 

Though I've never had the need to do so, I could imagine a use with predefined character classes that aren't subsets of each other (thus making the intersection produce something different from either of the two original character classes). E.g. matching only lowercase Latin characters:

[\p{Ll}&&\p{InBasicLatin}]
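To see what the intersection removes, this sketch compares it against plain \p{Ll} on a few sample characters. The class name and the samples are illustrative. Non-ASCII characters are written as \u escapes so the source stays ASCII.

```java
import java.util.regex.Pattern;

public class LatinLowercase {
    // Lowercase letter (general category Ll) AND inside the Basic Latin block.
    static final Pattern LOWER_BASIC_LATIN = Pattern.compile("[\\p{Ll}&&\\p{InBasicLatin}]");
    // Any Unicode lowercase letter, regardless of block.
    static final Pattern ANY_LOWER = Pattern.compile("\\p{Ll}");

    public static void main(String[] args) {
        // Samples: 'a', e-acute (U+00E9), Greek small alpha (U+03B1), 'A'
        String[] samples = {"a", "\u00e9", "\u03b1", "A"};
        for (String s : samples) {
            System.out.printf("U+%04X  Ll=%b  Ll&&BasicLatin=%b%n",
                    s.codePointAt(0),
                    ANY_LOWER.matcher(s).matches(),
                    LOWER_BASIC_LATIN.matcher(s).matches());
        }
    }
}
```

Only a passes the intersection here. U+00E9 and U+03B1 are lowercase letters outside Basic Latin, and A fails the Ll test. Within Basic Latin, the intersection comes down to exactly a through z.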
iammichael
Perhaps, but the result of the RE above is no different from [\p{Lower}], since \p{Lower} by definition is only the lowercase letters in your default alphabet. If your default alphabet is not Latin, that RE will in fact match nothing.
Christopher
Changed \p{Lower} to \p{Ll} to avoid the default alphabet issue.
iammichael
+1  A: 

Intersection is useful when neither class is a subset of the other. There are many predefined character classes (a partial list is given in the Javadoc), in particular the various Unicode blocks. Suppose that there is a defined block for all the characters used in Chinese and one for all of the characters used in Japanese. There is a good amount of overlap, but it isn't complete on either side (I'm not sure whether the Unicode block classes reflect this). If you wanted to match only the characters that occur in both languages, you could use an intersection of the two.
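Java has no per-language classes like the hypothetical ones above. It does have two independent classifications that overlap without either containing the other: Unicode scripts (\p{IsHan}, Java 7 and later) and Unicode blocks (\p{In...}). As a stand-in, this sketch intersects the Han script with the CJK Unified Ideographs block. The choice of classes and sample characters is illustrative only.

```java
import java.util.regex.Pattern;

public class HanInMainBlock {
    // Script (Is...) and block (In...) are separate classifications, so neither is
    // guaranteed to contain the other; && keeps only the characters in both.
    static final Pattern HAN_SCRIPT = Pattern.compile("\\p{IsHan}");
    static final Pattern CJK_BLOCK = Pattern.compile("\\p{InCJKUnifiedIdeographs}");
    static final Pattern BOTH = Pattern.compile("[\\p{IsHan}&&\\p{InCJKUnifiedIdeographs}]");

    public static void main(String[] args) {
        // Samples: U+4E2D (a common Han ideograph), U+3005 (ideographic iteration
        // mark), U+30A2 (katakana letter A)
        String[] samples = {"\u4e2d", "\u3005", "\u30a2"};
        for (String s : samples) {
            System.out.printf("U+%04X  script=%b  block=%b  both=%b%n",
                    s.codePointAt(0),
                    HAN_SCRIPT.matcher(s).matches(),
                    CJK_BLOCK.matcher(s).matches(),
                    BOTH.matcher(s).matches());
        }
    }
}
```

U+4E2D is Han and lies inside the block, so it passes. U+3005 belongs to the Han script but sits in the CJK Symbols and Punctuation block, so the intersection rejects it. The katakana U+30A2 fails both tests.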

kwatford