Sorry for such a long post. The sample outputs have been edited for clarity.
The Perl regex engine takes shortcuts, so my first run didn't output anything very helpful: the optimizer rejected the match before the engine proper ever ran.
perl -Mre=debug -e' "abcdefghijklm" =~ /[^<]*<[?]/; '
Compiling REx "[^<]*<[?]"
Final program:
1: STAR (13)
2: ANYOF[\0-;=-\377{unicode_all}] (0)
13: EXACT <<?> (17)
17: END (0)
floating "<?" at 0..2147483647 (checking floating) minlen 2
Guessing start of match in sv for REx "[^<]*<[?]" against "abcdefghijklm"
Did not find floating substr "<?"...
Match rejected by optimizer
Freeing REx: "[^<]*<[?]"
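So to get it to output something useful, I have to trick the regex engine into thinking it might succeed. The string below contains "<?", which gets it past the optimizer, while the added (?!<) directly in front of the "<" guarantees the match can never actually succeed.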
perl -Mre=debug -e' "ab<?" =~ /[^<]*(?!<)<[?]/; '
Compiling REx "[^<]*(?!<)<[?]"
Final program:
1: STAR (13)
2: ANYOF[\0-;=-\377{unicode_all}] (0)
13: UNLESSM[0] (19)
15: EXACT <<> (17)
17: SUCCEED (0)
18: TAIL (19)
19: EXACT <<?> (23)
23: END (0)
floating "<?" at 0..2147483647 (checking floating) minlen 2
Guessing start of match in sv for REx "[^<]*(?!<)<[?]" against "ab<?"
Found floating substr "<?" at offset 2...
Guessed: match at offset 0
Matching REx "[^<]*(?!<)<[?]" against "ab<?"
# Start at first pos()
# |
# V
0 <> <ab<?> | 1:STAR(13)
ANYOF[\0-;=-\377{unicode_all}] can match 2 times out of 2147483647...
2 <ab> <<?> | 13: UNLESSM[0](19)
2 <ab> <<?> | 15: EXACT <<>(17)
3 <ab<> <?> | 17: SUCCEED(0)
subpattern success...
failed...
# try with one fewer [^<]*
1 <a> <b<?> | 13: UNLESSM[0](19)
1 <a> <b<?> | 15: EXACT <<>(17)
failed...
# the lookahead passed this time, so try '<?' at the same position
1 <a> <b<?> | 19: EXACT <<?>(23)
failed...
# try with zero [^<]*
0 <> <ab<?> | 13: UNLESSM[0](19)
0 <> <ab<?> | 15: EXACT <<>(17)
failed...
0 <> <ab<?> | 19: EXACT <<?>(23)
failed...
failed...
# Start at second pos()
# |
# V
1 <a> <b<?> | 1:STAR(13)
ANYOF[\0-;=-\377{unicode_all}] can match 1 times out of 2147483647...
2 <ab> <<?> | 13: UNLESSM[0](19)
2 <ab> <<?> | 15: EXACT <<>(17)
3 <ab<> <?> | 17: SUCCEED(0)
subpattern success...
failed...
1 <a> <b<?> | 13: UNLESSM[0](19)
1 <a> <b<?> | 15: EXACT <<>(17)
failed...
1 <a> <b<?> | 19: EXACT <<?>(23)
failed...
failed...
# Start at third and final pos()
# |
# V
2 <ab> <<?> | 1:STAR(13)
ANYOF[\0-;=-\377{unicode_all}] can match 0 times out of 2147483647...
2 <ab> <<?> | 13: UNLESSM[0](19)
2 <ab> <<?> | 15: EXACT <<>(17)
3 <ab<> <?> | 17: SUCCEED(0)
subpattern success...
failed...
failed...
Match failed
Freeing REx: "[^<]*(?!<)<[?]"
In case you missed it, it tries to match '[^<]*' every possible way it can before failing. Just imagine if you tried to run this match against a large file, only to find out that the final two characters weren't '<?'.
A better idea is to use possessive matching (the '*+' quantifier, which never gives back what it has matched) together with the beginning-of-line zero-width assertion. '^' shows up as BOL (beginning of line) in the following output.
perl -Mre=debug -e' "abcdefghijklm<?" =~ /^[^<]*+(?!<)<[?]/; '
Compiling REx "^[^<]*+(?!<)<[?]"
Final program:
1: BOL (2)
2: SUSPEND (18)
4: STAR (16)
5: ANYOF[\0-;=-\377{unicode_all}] (0)
16: SUCCEED (0)
17: TAIL (18)
18: UNLESSM[0] (24)
20: EXACT <<> (22)
22: SUCCEED (0)
23: TAIL (24)
24: EXACT <<?> (28)
28: END (0)
floating "<?" at 0..2147483647 (checking floating) anchored(BOL) minlen 2
Guessing start of match in sv for REx "^[^<]*+(?!<)<[?]" against "abcdefghijklm<?"
Found floating substr "<?" at offset 13...
Guessed: match at offset 0
Matching REx "^[^<]*+(?!<)<[?]" against "abcdefghijklm<?"
0 <> <abcdefghij> | 1:BOL(2)
0 <> <abcdefghij> | 2:SUSPEND(18)
0 <> <abcdefghij> | 4: STAR(16)
ANYOF[\0-;=-\377{unicode_all}] can match 13 times out of 2147483647...
13 <defghijklm> <<?> | 16: SUCCEED(0)
subpattern success...
13 <defghijklm> <<?> | 18:UNLESSM[0](24)
13 <defghijklm> <<?> | 20: EXACT <<>(22)
14 <defghijklm<> <?> | 22: SUCCEED(0)
subpattern success...
failed...
Match failed
Freeing REx: "^[^<]*+(?!<)<[?]"
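Note that this failed much more quickly than the previous example: the anchor limits it to a single starting position, and the possessive quantifier leaves nothing to backtrack into.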