The end-of-line anchor $ matches even when there is an extra trailing \n in the matched string, so we use \Z instead of $.

For example:

^\w+$ will match the string abcd\n, but ^\w+\Z will not.
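
The difference described above can be checked with a minimal sketch. It assumes Python's `re` module, where \Z matches only at the very end of the string; in Perl, \Z would still match before a final newline, so the second result would differ there. The variable names are illustrative only.

```python
import re

text = "abcd\n"  # note the trailing newline

# $ also matches just before a newline at the very end of the string
dollar_match = re.search(r"^\w+$", text)

# In Python's re, \Z matches only at the absolute end of the string
strict_match = re.search(r"^\w+\Z", text)

print(dollar_match is not None)  # True
print(strict_match is not None)  # False
```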

What about \A, and when should it be used?

+5  A: 

Most often it's used when multi-line matching is also enabled. Since \A matches only at the beginning of the ENTIRE text, as opposed to the beginning of a line, ^ and \A behave differently in regexes that can match across lines.
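
As a rough illustration of that difference, here is a small sketch assuming Python's `re` module, with `re.MULTILINE` standing in for the multi-line mode; the sample text is made up.

```python
import re

text = "first line\nsecond line"

# With re.MULTILINE, ^ matches at the start of every line
caret_hits = re.findall(r"^\w+", text, flags=re.MULTILINE)

# \A still matches only at the start of the whole string
anchor_hits = re.findall(r"\A\w+", text, flags=re.MULTILINE)

print(caret_hits)   # ['first', 'second']
print(anchor_hits)  # ['first']
```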

Amber
+1, thanks, but do people really use it? Isn't removing `/m` and using `^` the same behavior?
S.Mark
Yes, I use it. Think about searching a string that YOU expect to have no embedded newlines, but a user put them in. You expect to search the entire string, but ^ and $ get confused and only scan the first part of it. You could end up injecting some evil code into a query or storing it in a database. Yeah, it's confusing, but /m, \A, \z, ^ and $ all have their uses, so you need to understand when and where to use them. Maybe that's not a good example, but it can be really important. Maybe someone can add some real-world examples.
Greg
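
One possible way to picture the validation problem described in this comment, sketched in Python's `re` module. The validator names and the sample input are hypothetical, and the sketch assumes Python semantics, where \Z matches only at the very end of the string.

```python
import re

# Hypothetical validators that should accept only word characters
LOOSE = re.compile(r"^\w+$", re.MULTILINE)  # line anchors
STRICT = re.compile(r"\A\w+\Z")             # whole-string anchors


def is_valid_loose(value: str) -> bool:
    # Succeeds as soon as any single line consists of word characters
    return LOOSE.search(value) is not None


def is_valid_strict(value: str) -> bool:
    # Succeeds only if the entire string consists of word characters
    return STRICT.search(value) is not None


user_input = "alice\n; DROP TABLE users"  # embedded newline from the user

print(is_valid_loose(user_input))   # True: only the first line was checked
print(is_valid_strict(user_input))  # False: the whole string was checked
```

Even without `re.MULTILINE`, `^\w+$` would still accept `"alice\n"`, because $ tolerates one trailing newline.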
If you're trying to, say, match a particular pair of lines at the beginning of a logfile within a set of logs, you'd need to have multi-line matching enabled, but couldn't just use `^` (since you're wanting to match lines at the beginning of the logfile, not in the middle). That would be a potential use case.
Amber
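
The logfile case could look roughly like the following sketch in Python's `re` module. The two-line header format (`# service:` and `# version:`) is invented for illustration and is not taken from any real log format.

```python
import re

# Hypothetical two-line header that must open the logfile
HEADER = re.compile(r"\A# service: (\w+)$\n^# version: (\d+)$", re.MULTILINE)

# Same pattern with ^ in place of \A, for comparison
LOOSE_HEADER = re.compile(r"^# service: (\w+)$\n^# version: (\d+)$", re.MULTILINE)

log_with_header = "# service: api\n# version: 2\nrequest handled\n"
log_header_in_middle = "request handled\n# service: api\n# version: 2\n"

at_start = HEADER.search(log_with_header)
in_middle = HEADER.search(log_header_in_middle)
loose_in_middle = LOOSE_HEADER.search(log_header_in_middle)

print(at_start.groups())            # ('api', '2')
print(in_middle)                    # None
print(loose_in_middle is not None)  # True: ^ also matched mid-file
```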
+3  A: 

As with any regex feature, you use it when it more exactly describes what you need as opposed to any more general feature. If you know that you want to match exactly at the start of a string (instead of logical lines), use the regex feature that describes that. Don't use regex features that could possibly match in situations that you don't want.

For Perl, see the perlre docs for details about the zero-width assertions:

\b  Match a word boundary
\B  Match except at a word boundary
\A  Match only at beginning of string
\Z  Match only at end of string, or before newline at the end
\z  Match only at end of string
\G  Match only at pos() (e.g. at the end-of-match position
    of prior m//g)
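
The table above describes Perl. For comparison only, here is a sketch in Python's `re` module, where \Z behaves like the \z entry in the table (end of string only). The lookahead used to approximate Perl's \Z is one possible workaround, not something taken from perlre.

```python
import re

text = "abcd\n"

# Python's \Z: only at the very end (what the table lists as \z in Perl)
python_end = re.search(r"\w+\Z", text)

# Approximation of Perl's \Z: end of string, or before a final newline
perl_like_end = re.search(r"\w+(?=\n?\Z)", text)

print(python_end)             # None
print(perl_like_end.group())  # abcd
```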
brian d foy
+1  A: 

Not directly relevant to your question according to the tags you used, but there is at least one language (Ruby) where ^ and $ always mean start/end-of-line, so if you want to match start/end-of-string you have to use \A and \Z or \z.

If you want to keep your regexes portable, it's good practice to explicitly state what you want them to do instead of relying on the availability of mode modifiers like /m or Regex.MULTILINE etc.

On the other hand, JavaScript, POSIX and XML do not support \A and \Z. This is where tools like RegexBuddy, which translate regexes from one flavor to another for you, come in handy.

Tim Pietzcker
Thanks for the info about Ruby regex, +1
S.Mark
+2  A: 

If the regex flavor you're working with supports \A, then I recommend you always use it instead of ^. In every flavor that supports it, \A matches only at the start of the string. There is no issue with line breaks.

^ may match only at the start of the string, or at the start of any line, depending on the regex flavor and regex options.

By using \A you reduce the potential for confusion when somebody else has to maintain your code.

Jan Goyvaerts
Thanks for the suggestion, Jan Goyvaerts, that makes sense
S.Mark