Is there a way to rewrite this regex so that it does not include a lookahead for "/js"?

Is this even something that I should worry about in terms of performance? It is being used to filter HTTP requests.

\.(asmx(?!/js)|aspx|htm)

Edit: To be clear: I'd like to specifically prevent ".asmx/js" but allow all other .asmx requests through.

BAD: Portal.asmx/js
GOOD: Portal.asmx/UpdateProduct
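Here is a small Python sketch showing what the pattern in the question does with these two example paths. Python is only an assumption for illustration, since the question doesn't say which regex engine is in use. Treating a match as "let the request through" is also an assumption, and the allowed helper is hypothetical.

```python
import re

# The pattern from the question, unchanged: a literal dot followed by
# "asmx" (unless the next characters are "/js"), "aspx" or "htm".
ORIGINAL = re.compile(r"\.(asmx(?!/js)|aspx|htm)")


def allowed(path: str) -> bool:
    # Assumption: a request passes the filter when the pattern is found
    # anywhere in the path (re.search, not a full-string match).
    return ORIGINAL.search(path) is not None


for path in ["Portal.asmx/js", "Portal.asmx/UpdateProduct"]:
    print(path, "allowed" if allowed(path) else "blocked")
```

Under these assumptions, Portal.asmx/js is blocked and Portal.asmx/UpdateProduct is allowed.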
A:

If you want to block Portal.asmx/js but allow Portal.asmx/UpdateProduct, there are two ways to handle it: a whitelist pattern listing all the accepted values, or a negative lookahead for the unwanted matches.

A negative lookahead is almost certainly going to perform better than listing all the acceptable values.
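The two approaches can be compared in a short Python sketch (the host language is an assumption). GetProducts and DeleteProduct are hypothetical method names; only UpdateProduct comes from the question. The sketch also shows a maintenance difference: the whitelist has to be edited every time a service method is added, while the lookahead only describes the one path to reject.

```python
import re

# Whitelist: every accepted .asmx method is listed explicitly.
# GetProducts is a hypothetical method name used for illustration.
WHITELIST = re.compile(r"\.asmx/(UpdateProduct|GetProducts)$")

# Negative lookahead (the question's form): any .asmx request except /js.
LOOKAHEAD = re.compile(r"\.asmx(?!/js)")


def passes(pattern: re.Pattern, path: str) -> bool:
    # A request passes when the pattern is found anywhere in the path.
    return pattern.search(path) is not None


# DeleteProduct is another hypothetical method that was never added to the whitelist.
for path in ["Portal.asmx/js", "Portal.asmx/UpdateProduct", "Portal.asmx/DeleteProduct"]:
    print(f"{path:28} whitelist={passes(WHITELIST, path)} lookahead={passes(LOOKAHEAD, path)}")
```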

However, simply using your existing expression will not match exactly what you want. For example, it would block Portal.asmx/json and allow Portal.asmx/js.aspx. These might not be likely URLs, but they highlight what needs fixing.
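You can reproduce both cases against the question's original pattern with a few lines of Python. This again assumes that a match means the request is allowed.

```python
import re

# The question's original pattern.
ORIGINAL = re.compile(r"\.(asmx(?!/js)|aspx|htm)")

for path in ["Portal.asmx/json", "Portal.asmx/js.aspx"]:
    match = ORIGINAL.search(path)
    # group(1) shows which alternative matched, if any.
    print(path, "allowed via " + match.group(1) if match else "blocked")
```

Portal.asmx/json is blocked because "/js" follows ".asmx", while Portal.asmx/js.aspx gets through on the aspx alternative.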

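This expression (adapted from eyelidlessness's answer) handles the /js and /json cases correctly:

\.(asmx(?!/js(?:/|$))|aspx$|html?$)


It's worth explaining that the (?:/|$) group matches either / or the end of the string. An end-of-string anchor such as \z can't go inside a character class, because a class always consumes exactly one character. Depending on the regex flavor, [/\z] is either rejected as an invalid escape or treated as "/ or a literal z".
(In many flavors $ also differs from \z in two ways: $ can match just before a trailing newline, and in multiline mode it matches at every line end. Neither is likely to matter for URL filtering.)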
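Below is a Python sketch for checking the expression against a handful of paths. It writes the "slash or end of string" test as (?:/|$), which Python's re module accepts. The sample paths not taken from the question are made up for illustration, and a match is again assumed to mean "allowed".

```python
import re

# "/js" is rejected only when followed by "/" or the end of the string.
FIXED = re.compile(r"\.(asmx(?!/js(?:/|$))|aspx$|html?$)")

# Expected result for each sample path: True = allowed, False = blocked.
cases = {
    "Portal.asmx/js": False,
    "Portal.asmx/js/": False,
    "Portal.asmx/json": True,
    "Portal.asmx/UpdateProduct": True,
    "Default.aspx": True,
    "index.html": True,
}

for path, expected in cases.items():
    result = FIXED.search(path) is not None
    print(f"{path:28} {'allowed' if result else 'blocked'}")
    assert result == expected
```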


In general, don't worry about performance unless you have a measurable performance problem (otherwise, how will you know whether your change made any difference?).
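If you do want a number, a rough measurement is easy to take. This Python sketch uses the standard timeit module. The sample paths are hypothetical, and timings depend on the machine and the regex engine, so measure with your own engine and realistic request paths.

```python
import re
import timeit

# The question's original pattern and the anchored version.
ORIGINAL = re.compile(r"\.(asmx(?!/js)|aspx|htm)")
FIXED = re.compile(r"\.(asmx(?!/js(?:/|$))|aspx$|html?$)")

# Hypothetical sample of request paths; substitute real traffic when measuring.
PATHS = ["Portal.asmx/js", "Portal.asmx/UpdateProduct", "Default.aspx", "index.html"]


def run(pattern: re.Pattern) -> int:
    # Count how many sample paths the pattern lets through.
    return sum(1 for p in PATHS if pattern.search(p))


def seconds_per_batch(pattern: re.Pattern, number: int = 10_000) -> float:
    # Average wall-clock time for one pass over all of PATHS.
    return timeit.timeit(lambda: run(pattern), number=number) / number


for name, pattern in [("original", ORIGINAL), ("fixed", FIXED)]:
    print(f"{name}: {seconds_per_batch(pattern) * 1e6:.2f} microseconds per batch")
```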

Peter Boughton
-1: The question asks for a regex that can handle a path after asmx/, just not asmx/js
eyelidlessness
The original question *didn't* ask that, it's a new addition added an hour after I answered.
Peter Boughton
I just explained why I gave a -1, because the answer was incorrect. I think it's important to explain downvotes. I've removed the -1 now that you've edited.
eyelidlessness
Of course now SO won't let me upvote.
eyelidlessness
Fair enough. :) Would've preferred a notification of the original question change, but that's an SO problem, not you.
Peter Boughton
A:

Don't worry about performance of such a simple lookahead. Your regex is fine.

Edit: However, it may produce false positives (e.g. Portal.asmx/jssomething), so you might try something like:

\.(asmx(?!/js(?:/|$))|aspx$|html?$)
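This Python sketch shows how the two patterns treat the false-positive example (Python is an assumption, since the question doesn't name a language). is_allowed is a hypothetical filter hook. The slash-or-end test is written as (?:/|$) so that Python's re module accepts it.

```python
import re

# The question's pattern and the anchored version.
ORIGINAL = re.compile(r"\.(asmx(?!/js)|aspx|htm)")
# (?:/|$) means "a slash or the end of the string".
ALLOWED = re.compile(r"\.(asmx(?!/js(?:/|$))|aspx$|html?$)")


def is_allowed(path: str) -> bool:
    """Hypothetical filter hook: True if the request path should pass."""
    # The pattern is compiled once at import time and reused per request.
    return ALLOWED.search(path) is not None


path = "Portal.asmx/jssomething"
print(ORIGINAL.search(path) is not None)  # False: the plain (?!/js) rejects it
print(is_allowed(path))                   # True: "/js" isn't followed by "/" or end
print(is_allowed("Portal.asmx/js"))       # False: still blocked
```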
eyelidlessness