I'm trying to write a very simple regular expression that matches any file name that doesn't end in .php. I came up with the following...

(.*?)(?!\.php)$

...however this matches all filenames. If someone could point me in the right direction I'd be very grateful.

+2  A: 

You are at the end of the string and looking ahead. What you want is a look behind instead:

(.*)$(?<!\.php)

Note that not all regular expression engines support lookbehind assertions.
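As a rough illustration, here is how that pattern behaves in Python's `re` module, which supports fixed-width lookbehind. The helper name `not_php` and the sample file names are made up for this sketch and are not part of the question.

```python
import re

# Pattern from the answer above: consume the whole name, then assert at the
# end of the string that the four preceding characters are not ".php".
NOT_PHP = re.compile(r"(.*)$(?<!\.php)")

def not_php(filename: str) -> bool:
    """Hypothetical helper: True if filename does not end in .php."""
    return NOT_PHP.match(filename) is not None

for name in ["index.html", "file.php", "a", "php"]:
    print(name, not_php(name))
```

Names shorter than four characters still match, because the lookbehind simply has nothing to find there.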

Mark Byers
I know you just copied it from the OP's regex, but that reluctant quantifier makes no sense. You're forcing the regex to evaluate the lookbehind once for each character in the string, when you know it only needs to be applied at the end. In fact, I would put the lookbehind *after* the anchor: `.*$(?<!\.php)`
Alan Moore
@Alan Moore: Yes, you're probably right about the performance of placing the lookbehind after the anchor. Though I'm not sure whether in some regular expression engines the `$` anchor could match before a trailing newline character, which would give a different result. This is probably not going to be an issue when parsing URLs in Apache though.
Mark Byers
+3  A: 

Instead of using negative lookahead, sometimes it's easier to use the negation outside the regex at the hosting language level. In many languages, the boolean complement operator is the unary !.

So you can write something like this:

! str.hasMatch(/\.php$/)

Depending on language, you can also skip regex altogether and use something like (e.g. Java):

! str.endsWith(".php")
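In Python, for instance, the same idea might look like the sketch below. The function names are invented for illustration; `re.search` and `str.endswith` are the standard library calls.

```python
import re

def is_not_php(filename: str) -> bool:
    # Positive match for a ".php" suffix, negated outside the regex.
    return not re.search(r"\.php$", filename)

def is_not_php_plain(filename: str) -> bool:
    # Same check without a regex. The two agree for names that have no
    # trailing newline (in Python, `$` also matches just before one).
    return not filename.endswith(".php")

for name in ["index.html", "file.php"]:
    print(name, is_not_php(name), is_not_php_plain(name))
```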

As for the problem with the original pattern itself:

(.*?)(?!\.php)$   // original pattern, doesn't work!

This matches, say, file.php, because the (.*?) can capture file.php, and looking ahead, you can't match \.php, but you can match a $, so altogether it's a match! You may want to use look behind, or if it's not supported, you can lookahead at the start of the string.

^(?!.*\.php$).*$  // negative lookahead, works

This will match all strings that do not end with ".php" using negative lookahead.
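To see the difference between the two patterns, a quick check in Python could look like this. It is only a sketch: the constant names and sample file names are assumptions, and other engines may differ in details such as how `$` treats a trailing newline.

```python
import re

ORIGINAL = re.compile(r"(.*?)(?!\.php)$")    # pattern from the question
LOOKAHEAD = re.compile(r"^(?!.*\.php$).*$")  # lookahead at the start

for name in ["file.php", "index.html"]:
    # The original pattern matches both names; the lookahead version
    # rejects the one that ends in ".php".
    print(name, bool(ORIGINAL.search(name)), bool(LOOKAHEAD.search(name)))
```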

polygenelubricants
An example pattern that's a bit more specific, capturing prefix part and extension part separately. http://www.rubular.com/r/xHizeFtbXb - has lots of room for improvement, but that would require more precise specification.
polygenelubricants
thanks poly, I should have mentioned in my OP that this is for Apache proxy stuff. Normally I would do the negation in code, but obviously we don't have the option in this case. Thanks for your reply in any case.
Simon Stevens
+2  A: 

Almost:

.*(?!\.php)....$

The last four dots make sure that there is something to look ahead at, when the look-ahead is checked.

The outer parentheses are unnecessary since you are interested in the entire match.

The reluctant .*? is unnecessary, since backtracking four steps is more efficient than checking the following condition with every step.
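A small Python sketch of how this pattern behaves, next to the corrected pattern that appears in the comments below. The constant names and sample file names are assumptions for illustration. Because `....` needs four characters to consume, the first pattern cannot match a name shorter than four characters, which the second pattern handles with an extra alternative.

```python
import re

FOUR_DOTS = re.compile(r".*(?!\.php)....$")             # pattern from this answer
WITH_SHORT = re.compile(r"^(.{1,3}|.*(?!\.php)....)$")  # correction from the comments

for name in ["index.html", "file.php", "i.php", "a.c"]:
    # Prints the name, then whether each pattern matches it.
    print(name, bool(FOUR_DOTS.search(name)), bool(WITH_SHORT.search(name)))
```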

Tomalak
Many thanks Tomalak, that works perfectly. For anybody looking at this at a later date, the final code we used was `ProxyPassMatch ^(.*)\.((?!php)...)*$ http://127.0.0.1:12345/$1.$2`
Simon Stevens
Minor pedant: This won't match files with filenames less than 4 characters long.
Mark Byers
`ProxyPassMatch ^(.*\.(?!php).*)$ http://127.0.0.1:12345/$1` is the final version of what we used, for future reference. It also fixes the filename length issue (tested with filename i.php).
Simon Stevens
@Mark Byers: You are right, I did not think of this case. Correct would be `^(.{1,3}|.*(?!\.php)....)$`.
Tomalak