Hi,

I want to know how this ambiguous pattern is resolved in Perl (and, more generally, in everything that uses libpcre):

/(\r\n|\r|\n)/

When the pattern sees \r\n, will it match once or twice? And what are the rules in this situation?

Thanks

+7  A: 

It will match \r\n once, because Perl uses a regex-directed engine, which tries the alternatives from left to right and eagerly settles for the first one that matches. See here.

You can easily find out whether the regex flavor you intend to use has a text-directed or regex-directed engine. If backreferences and/or lazy quantifiers are available, you can be certain the engine is regex-directed. You can do the test by applying the regex `regex|regex not` to the string `regex not`. If the resulting match is only `regex`, the engine is regex-directed. If the result is `regex not`, then it is text-directed. The reason behind this is that the regex-directed engine is "eager".
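As a quick sketch of that test, here it is run with Python's `re` module instead of Perl. The assumption is that `re`, being a backtracking (regex-directed) engine like Perl's, behaves the same way on this input:

```python
import re

# Apply the test from the quote: the pattern regex|regex not against "regex not".
# A regex-directed engine accepts the first alternative that matches ("regex")
# and never goes on to try the longer "regex not".
match = re.search(r"regex|regex not", "regex not")
print(match.group())
```

A regex-directed engine reports `regex` here, while a text-directed engine would report `regex not`.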

Mark Byers
If I understand correctly, that means /(\r|\r\n|\n)/ has a different meaning.
mathk
Yes, `/(\r|\r\n|\n)/` would match `\r`, then `\n`.
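A small illustration of the reordered pattern, again using Python's `re` as a stand-in for Perl (assumed to follow the same first-alternative rule). With `\r` listed first, the one-character alternative wins at the `\r`, and the leftover `\n` becomes a separate match:

```python
import re

text = "line1\r\nline2"

# \r is tried first and succeeds, so \r\n is never attempted at that position;
# the next match attempt then starts at the \n and matches it on its own.
reordered = re.findall(r"(\r|\r\n|\n)", text)
print(reordered)
```

This yields two matches, `'\r'` and then `'\n'`, instead of a single `'\r\n'`.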
Alan Moore
+1  A: 

It will try to match the pipe-separated alternatives in order, from left to right. Thus the first alternative will match the entire string "\r\n", and there will be only one match. There's no ambiguity here.
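Here is a minimal sketch of the original pattern applied to text that mixes all three line-break styles. It uses Python's `re` in place of Perl, on the assumption that the left-to-right rule is the same:

```python
import re

text = "first\r\nsecond\rthird\nfourth"

# At each position the alternatives are tried left to right, so a \r\n pair
# is consumed whole before the lone \r alternative is ever considered.
breaks = re.findall(r"(\r\n|\r|\n)", text)
print(breaks)

# The same ordering makes the pattern usable for splitting on any line ending.
lines = re.split(r"\r\n|\r|\n", text)
print(lines)
```

The `\r\n` pair is reported as one break, so the split produces four lines with no empty string between `first` and `second`.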

bcat
A: 

It'll match it once. More here: http://technocage.com/~caskey/dos2unix/

NinjaCat
+1  A: 

...perl (more generally everything that use libpcre)

Possible misconception here: Perl does not "use libpcre". The PCRE library is a separate project that came along after Perl, and mimics much of Perl's regex functionality. PHP and ActionScript use libpcre, but most "Perl-derived" flavors (like Python, Java, and .NET) implement their regex support natively.

But they all share the trait in question here: they settle for the first alternative that works, rather than hold out for the longest match as a text-directed engine would.
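To make the contrast concrete, here is a toy sketch that only handles literal alternatives tried at a single position. `first_alternative` and `longest_alternative` are hypothetical helpers written for illustration, not any library's API. The first mimics a backtracking engine settling for the first alternative that works. The second mimics the longest-match choice of a text-directed (POSIX-style) engine:

```python
import re

def first_alternative(alternatives, text, pos=0):
    """Backtracking-style choice: first alternative, in pattern order, matching at pos."""
    for alt in alternatives:
        if text.startswith(alt, pos):
            return alt
    return None

def longest_alternative(alternatives, text, pos=0):
    """Longest-match choice: the longest alternative matching at pos."""
    candidates = [alt for alt in alternatives if text.startswith(alt, pos)]
    return max(candidates, key=len) if candidates else None

alts = ["\r", "\r\n", "\n"]  # the reordered alternation from the comments
text = "\r\n"

print(repr(first_alternative(alts, text)))    # what an eager engine picks
print(repr(longest_alternative(alts, text)))  # what a longest-match engine picks

# Python's real engine follows the "first alternative that works" rule.
print(repr(re.match(r"\r|\r\n|\n", text).group()))
```

With the reordered alternation, only the longest-match rule recovers `'\r\n'`. The eager rule (and Python's actual `re`) stops at `'\r'`, which is why alternative order matters in Perl-style flavors.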

Alan Moore