Hi,
I want to know how this ambiguous pattern is resolved in Perl (and more generally in everything that uses libpcre):
/(\r\n|\r|\n)/
When the pattern sees \r\n, will it match once or twice?
And what are the rules for this situation?
Thanks
It will match \r\n once, because Perl uses a regex-directed engine, which evaluates alternations eagerly. See here.
You can easily find out whether the regex flavor you intend to use has a text-directed or regex-directed engine. If backreferences and/or lazy quantifiers are available, you can be certain the engine is regex-directed. You can do the test by applying the regex
regex|regex not
to the string "regex not". If the resulting match is only "regex", the engine is regex-directed. If the result is "regex not", then it is text-directed. The reason behind this is that the regex-directed engine is "eager".
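The test described in that quote can be tried directly. Here is a minimal sketch in Python, whose re module is a backtracking engine; Python is chosen only for illustration, and the variable names are my own.

```python
import re

# The test pattern and subject string from the quoted passage.
pattern = re.compile(r"regex|regex not")
subject = "regex not"

match = pattern.search(subject)
matched_text = match.group(0)

# A regex-directed engine stops at the first alternative that succeeds,
# so it reports "regex" and never tries the longer "regex not".
engine_kind = "regex-directed" if matched_text == "regex" else "text-directed"
print(repr(matched_text), engine_kind)
```

With Python's re module this reports 'regex', i.e. a regex-directed engine.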
It will try to match the pipe-separated alternatives in order from left to right. Thus the first alternative will match the entire string "\r\n", and there will be only one match. There's no ambiguity here.
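To see the left-to-right behaviour on the pattern from the question, here is a small sketch using Python's re module, which tries alternatives in order just as this answer describes for Perl. The sample text is made up for the demonstration.

```python
import re

# The pattern from the question: CRLF first, then lone CR, then lone LF.
newline = re.compile(r"(\r\n|\r|\n)")

# Hypothetical sample text mixing all three line-ending styles.
text = "one\r\ntwo\rthree\nfour"

# findall returns the captured group for each non-overlapping match.
matches = newline.findall(text)
crlf_only = newline.findall("\r\n")

print(matches)    # the CRLF pair is reported as a single match
print(crlf_only)
```

Because the first alternative consumes both characters of the CRLF pair, the scan resumes after the \n and the pair is never counted twice.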
It'll match it once. More here: http://technocage.com/~caskey/dos2unix/
...perl (more generally everything that use libpcre)
Possible misconception here: Perl does not "use libpcre". The PCRE library is a separate project that came along after Perl, and mimics much of Perl's regex functionality. PHP and ActionScript use libpcre, but most "Perl-derived" flavors (like Python, Java, and .NET) implement their regex support natively.
But they all share the trait in question here: they settle for the first alternative that works, rather than hold out for the longest match as a text-directed engine would.
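A quick way to see that the first working alternative wins, rather than the longest one, is to reorder the alternatives. This is a sketch using Python's re module as a stand-in for the regex-directed flavors listed above; the variable names are illustrative.

```python
import re

crlf = "\r\n"

# Longest alternative listed first: the pair is consumed as one match.
longest_first = re.findall(r"\r\n|\r|\n", crlf)

# Same alternatives, but with \r\n listed last: \r succeeds first, so the
# engine never reaches the \r\n alternative and the pair splits in two.
shortest_first = re.findall(r"\r|\n|\r\n", crlf)

print(len(longest_first), len(shortest_first))
```

So the order in the original pattern matters: putting \r\n first is what keeps a CRLF pair together.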