From http://java.sun.com/j2se/1.5.0/docs/api/java/util/regex/Pattern.html:

\Z  The end of the input but for the final terminator, if any
\z  The end of the input

But what does it mean in practice? Can you give me an example of when I would use either \Z or \z?

In my test I expected "StackOverflow\n".matches("StackOverflow\\z") to return true and "StackOverflow\n".matches("StackOverflow\\Z") to return false. But actually both return false. Where is the mistake?

+1  A: 

"Even though \Z and $ only match at the end of the string (when the option for the caret and dollar to match at embedded line breaks is off), there is one exception. If the string ends with a line break, then \Z and $ will match at the position before that line break, rather than at the very end of the string. This "enhancement" was introduced by Perl, and is copied by many regex flavors, including Java, .NET and PCRE. In Perl, when reading a line from a file, the resulting string will end with a line break. Reading a line from a file with the text "joe" results in the string joe\n. When applied to this string, both ^[a-z]+$ and \A[a-z]+\Z will match joe.

If you only want a match at the absolute very end of the string, use \z (lower case z instead of upper case Z). \A[a-z]+\z does not match joe\n. \z matches after the line break, which is not matched by the character class."

http://www.regular-expressions.info/anchors.html
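
To check the quoted behaviour in Java, here is a minimal sketch. The class name and the found() helper are made up for illustration; it uses find() rather than matches(), so the pattern is not forced to cover the whole input.

```java
import java.util.regex.Pattern;

public class EndAnchorsQuoteDemo {
    // Hypothetical helper: true if the regex is found somewhere in the input.
    public static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String line = "joe\n"; // a line that still carries its trailing line break

        // $ and \Z both accept the position just before the final line break.
        System.out.println(found("^[a-z]+$", line));      // true
        System.out.println(found("\\A[a-z]+\\Z", line));  // true

        // \z only accepts the very end of the input, which lies after the \n.
        System.out.println(found("\\A[a-z]+\\z", line));  // false
        System.out.println(found("\\A[a-z]+\\z", "joe")); // true
    }
}
```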

The way I read this, "StackOverflow\n".matches("StackOverflow\\z") should return false because your pattern does not include the newline.

"StackOverflow\n".matches("StackOverflow\\z\\n") => false
"StackOverflow\n".matches("StackOverflow\\Z\\n") => true
Jakob Kruse
+1  A: 

Just checked it. It looks like when Matcher.matches() is invoked (as happens behind the scenes in your code), \Z behaves like \z. However, when Matcher.find() is invoked, they behave differently, as expected. The following prints true:

Pattern p = Pattern.compile("StackOverflow\\Z");
Matcher m = p.matcher("StackOverflow\n");
System.out.println(m.find());

and if you replace \Z with \z it returns false.

I find this a little surprising...

Eyal Schneider
A: 

Like Eyal said, it works for find() but not for matches().

This actually makes sense. The \Z anchor itself does match the position right before the final line terminator, but the regular expression as a whole does not match: matches() requires the pattern to match the entire input, and nothing in the pattern consumes the terminator. (\Z matches the position right before the terminator, which is not the same thing as matching the terminator itself.)
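
A minimal sketch of that reasoning, with a made-up class and helper names: the \Z match ends at index 13, one short of the input length of 14, so find() and lookingAt() succeed while matches() does not.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindVersusMatchesDemo {
    public static final String INPUT = "StackOverflow\n"; // 13 letters plus the newline

    // Hypothetical helpers that wrap the three Matcher entry points.
    public static boolean find(String regex) {
        return Pattern.compile(regex).matcher(INPUT).find();
    }

    public static boolean lookingAt(String regex) {
        return Pattern.compile(regex).matcher(INPUT).lookingAt();
    }

    public static boolean matches(String regex) {
        return Pattern.compile(regex).matcher(INPUT).matches();
    }

    // End index of the first match, or -1 if there is none.
    public static int endOfFirstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(INPUT);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String regex = "StackOverflow\\Z";
        System.out.println(find(regex));            // true
        System.out.println(endOfFirstMatch(regex)); // 13, the newline is left over
        System.out.println(INPUT.length());         // 14
        System.out.println(lookingAt(regex));       // true: only the start is anchored
        System.out.println(matches(regex));         // false: the whole input must be covered

        // Once the pattern also consumes the terminator, matches() succeeds.
        System.out.println(matches("StackOverflow\\Z\\n")); // true
    }
}
```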

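If you did "StackOverflow\n".matches("(?s)StackOverflow\\Z.*") you should be OK. The (?s) flag is needed because, by default, . does not match line terminators, so a plain .* would leave the trailing \n unmatched and matches() would still return false.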

Avi
\z (lowercase z) does not match before the newline, it matches at the very end, after the newline.
Jakob Kruse
@Jakob: You are right. I meant \Z, of course - that is the one with the special meaning. I was confused by the wording in the question. Fixed now.
Avi
\Z (uppercase) actually does match right before the final newline, as defined by the javadocs. The Perl docs (http://perldoc.perl.org/perlre.html) make it even clearer: "\Z Match only at end of string, or before newline at the end"
Avi