ansaurus

Question

What's the difference between \z and \Z in a regular expression, and when and how do I use each?

Answer 1

+1 A:

"Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will match at the position before that line break, rather than at the very end of the string. This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match joe.

If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n. \z matches after the line break, which is not matched by the character class."

http://www.regular-expressions.info/anchors.html
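The quoted behaviour can be checked with a small Java sketch. The class and helper names below are made up for illustration, and find() is used so that only the anchors decide the outcome:

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and helper are illustrative only.
class EndAnchorDemo {
    // True if the pattern is found somewhere in the input (Matcher.find()).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "joe\n"; // a line as read from a file, newline included

        System.out.println(found("^[a-z]+$", line));     // true: $ matches before the final newline
        System.out.println(found("\\A[a-z]+\\Z", line)); // true: \Z matches before the final newline
        System.out.println(found("\\A[a-z]+\\z", line)); // false: \z only matches at the very end
    }
}
```

So \Z is the tolerant choice for input that may carry a trailing line break, while \z is the strict one.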

The way I read this, "StackOverflow\n".matches("StackOverflow\\z") should return false, because your pattern does not include the newline.

"StackOverflow\n".matches("StackOverflow\\z\\n") => false
"StackOverflow\n".matches("StackOverflow\\Z\\n") => true

Jakob Kruse 2010-04-25 11:01:01

Answer 2

+1 A:

Just checked it. It looks like when Matcher.matches() is invoked (as happens behind the scenes in your code), \Z behaves like \z. However, when Matcher.find() is invoked, they behave differently, as expected. The following returns true:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

Pattern p = Pattern.compile("StackOverflow\\Z");
Matcher m = p.matcher("StackOverflow\n");
System.out.println(m.find());

and if you replace \Z with \z it returns false.

I find this a little surprising...

Eyal Schneider 2010-04-25 11:01:36

Answer 3

A:

Like Eyal said, it works for find() but not for matches().

This actually makes sense. The \Z anchor itself does match at the position right before the final line terminator, but matches() requires the regular expression to match the entire input, and nothing in the pattern consumes the terminator, so the match as a whole fails. (\Z matching at the position before the terminator is not the same as matching the terminator itself.)

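If you did "StackOverflow\n".matches("(?s)StackOverflow\\Z.*") you should be ok. The (?s) flag (DOTALL) is needed because, by default, . does not match line terminators, so a plain .* would leave the newline unconsumed.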
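The difference between find() and matches() here can be seen in a minimal sketch. The class and method names are hypothetical, chosen only for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class MatchesVsFindDemo {
    static final String INPUT = "StackOverflow\n"; // 13 letters plus a newline at index 13

    // Matcher.find(): the pattern may match any part of the input.
    static boolean finds(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    // Matcher.matches(): the pattern must consume the entire input.
    static boolean matchesAll(String regex) {
        return Pattern.compile(regex).matcher(INPUT).matches();
    }

    // Index just past the first match found by find(), or -1 if there is none.
    static int endOfFirstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        System.out.println(finds("StackOverflow\\Z"));           // true
        System.out.println(endOfFirstMatch("StackOverflow\\Z")); // 13: the newline is not consumed
        System.out.println(matchesAll("StackOverflow\\Z"));      // false: the newline is left over
        System.out.println(matchesAll("StackOverflow\\Z\\n"));   // true: the newline is consumed explicitly
    }
}
```

The match found by find() ends at index 13, one short of the input length, which is exactly why matches() rejects the same pattern.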

Avi 2010-04-25 15:43:17

\z (lowercase z) does not match before the newline; it matches at the very end, after the newline.

Jakob Kruse 2010-04-25 16:07:42

@Jakob: You are right. I meant \Z, of course - that is the one with the special meaning. I was confused by the wording in the question. Fixed now.

Avi 2010-04-25 16:55:45

\Z (uppercase) actually does match right before the final newline, as defined by the Javadoc. The Perl docs (http://perldoc.perl.org/perlre.html) make it even clearer: "\Z Match only at end of string, or before newline at the end"

Avi 2010-04-25 16:56:52
