ansaurus

Question

What is the point behind character class intersections in Java's Regex?

Answer 1

+1 A:

I believe that particular sample is just a "proof of concept." An intersection of two character classes matches only characters that belong to both classes. The subtractions you mentioned are the real practical applications of the operator.

Simply put, there is no hidden meaning.
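To make the two behaviours concrete, here is a minimal sketch of a plain intersection and of a subtraction (an intersection with a negated class). The class and method names are made up for illustration; the patterns follow the style of the examples in the java.util.regex.Pattern javadoc.

```java
import java.util.regex.Pattern;

public class IntersectionDemo {
    // Intersection: characters that are in a-z AND in m-z (so effectively m-z).
    static final Pattern BOTH = Pattern.compile("[a-z&&[m-z]]");

    // Subtraction: characters that are in a-z AND NOT vowels.
    static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    static boolean matchesBoth(String s) {
        return BOTH.matcher(s).matches();
    }

    static boolean isConsonant(String s) {
        return CONSONANT.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesBoth("q"));  // true: q is in both ranges
        System.out.println(matchesBoth("c"));  // false: c is only in a-z
        System.out.println(isConsonant("t"));  // true: lower-case, not a vowel
        System.out.println(isConsonant("e"));  // false: vowels are subtracted
    }
}
```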

Blixt 2009-07-09 21:03:03

Answer 2

+1 A:

You can programmatically build a regexp that matches the intersection of two sets:

String regex = String.format("[%s&&[%s]]", characterClass, whiteList);
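A runnable sketch of that one-liner follows. The sample values for characterClass and whiteList are hypothetical, and the helper name intersect is invented here. Note that both strings are pasted into the pattern unescaped, so this assumes the caller passes valid character-class bodies.

```java
import java.util.regex.Pattern;

public class WhiteListDemo {
    // Builds a character class matching only characters present in BOTH inputs.
    // Assumption: each argument is already a valid class body such as "a-z0-9".
    static Pattern intersect(String characterClass, String whiteList) {
        String regex = String.format("[%s&&[%s]]", characterClass, whiteList);
        return Pattern.compile(regex);
    }

    public static void main(String[] args) {
        // Hypothetical inputs: an allowed alphabet and a narrower white list.
        Pattern p = intersect("a-z0-9", "aeiou0-4");

        System.out.println(p.pattern());               // [a-z0-9&&[aeiou0-4]]
        System.out.println(p.matcher("a").matches());  // true
        System.out.println(p.matcher("b").matches());  // false
        System.out.println(p.matcher("3").matches());  // true
        System.out.println(p.matcher("7").matches());  // false
    }
}
```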

dfa 2009-07-09 21:04:23

This makes some sense from a theoretical perspective, but what's a practical example where you'd ever use this?

Christopher 2009-07-09 21:20:01

I don't have any practical example right now :-(

dfa 2009-07-09 21:51:29

Answer 3

+4 A:

Though I've never had the need to do so, I could imagine a use with pre-defined character classes that aren't proper subsets of each other (thus making the intersection produce something different than the original two character classes). E.g. matching only lower case Latin characters:

[\p{Ll}&&\p{InBasicLatin}]
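As a quick check of how that pattern behaves, a small sketch (the class and method names are only for illustration) tests a few single characters against it. Non-ASCII characters are written as Unicode escapes to keep the source encoding-independent.

```java
import java.util.regex.Pattern;

public class LowerLatinDemo {
    // Lower-case letters (general category Ll) that also lie in the Basic Latin block.
    static final Pattern LOWER_BASIC_LATIN =
            Pattern.compile("[\\p{Ll}&&\\p{InBasicLatin}]");

    static boolean isLowerBasicLatin(String s) {
        return LOWER_BASIC_LATIN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isLowerBasicLatin("a"));       // true
        System.out.println(isLowerBasicLatin("A"));       // false: not lower case
        System.out.println(isLowerBasicLatin("\u00e9"));  // false: e-acute is outside Basic Latin
        System.out.println(isLowerBasicLatin("\u03b1"));  // false: Greek alpha is outside Basic Latin
    }
}
```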

iammichael 2009-07-09 21:13:41

Perhaps, but the result of the RE above is no different from [\p{Lower}], since \p{Lower} by definition is only the lower-case letters in your default alphabet. If your default alphabet is not Latin, that RE will in fact match nothing.

Christopher 2009-07-09 21:25:14

Changed \p{Lower} to \p{Ll} to avoid the default alphabet issue.
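For reference, the java.util.regex.Pattern javadoc lists \p{Lower} as the POSIX class [a-z] (US-ASCII only, unless the UNICODE_CHARACTER_CLASS flag is set), while \p{Ll} is the Unicode lower-case letter category. A small sketch comparing the two on a non-ASCII letter, with illustrative names:

```java
import java.util.regex.Pattern;

public class LowerVsLlDemo {
    static boolean posixLower(String s) {
        // Default mode: \p{Lower} is the ASCII range a-z.
        return Pattern.matches("\\p{Lower}", s);
    }

    static boolean unicodeLl(String s) {
        // \p{Ll} covers lower-case letters from any script.
        return Pattern.matches("\\p{Ll}", s);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE

        System.out.println(posixLower("a"));     // true
        System.out.println(posixLower(eAcute));  // false
        System.out.println(unicodeLl(eAcute));   // true
    }
}
```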

iammichael 2009-07-09 21:36:27

Answer 4

+1 A:

Intersection is useful when neither class is a subset of the other. There are many predefined character classes (a partial list is given in the javadoc), in particular the various blocks of Unicode. Suppose that there is a defined block for all the characters used in Chinese and one for all of the characters used in Japanese. There is a good amount of overlap, but it isn't complete on either side (I'm not sure if the Unicode block classes reflect this). If you wanted to match only the characters that occur in both languages, you might use an intersection of the two.
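One caveat on the hypothetical above: Unicode blocks are disjoint code point ranges, so two block classes never overlap with each other. The same idea does work when a block is intersected with a class of a different kind, such as a general category. A minimal sketch, assuming the Greek block and the upper-case letter category as the two sets (names are illustrative):

```java
import java.util.regex.Pattern;

public class GreekUpperDemo {
    // Characters in the Greek block that are also upper-case letters (category Lu).
    static final Pattern GREEK_UPPER = Pattern.compile("[\\p{InGreek}&&\\p{Lu}]");

    static boolean isGreekUpper(String s) {
        return GREEK_UPPER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isGreekUpper("\u03a9"));  // true: capital omega
        System.out.println(isGreekUpper("\u03c9"));  // false: small omega is not Lu
        System.out.println(isGreekUpper("A"));       // false: Lu, but not in the Greek block
    }
}
```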

kwatford 2009-07-09 21:14:06
