tags:

views:

185

answers:

3

I'm trying to use this regular expression, but I'm having trouble testing it, because for whatever reason it is crashing the only two regex testers I use.

^([a-zA-Z-]+/?)+$

If I'm not mistaken, that should match any URI that follows this format:

some-uri-string/some-uri-string/some-uri-string

However, I don't want it to match:

some-uri-string/some-uri-string//some-uri-string

Notice the double slash. Essentially, it is category/sub-category/sub-sub-category/page

With any number of sub-categories being allowed.

+2  A: 

Look up catastrophic backtracking... You need to define it a little better, or write one that chomps the URI string one part at a time... This one is recursive, and why it crashes on you... Or, you may change it from Greedy to lazy.. I didn't actually DEBUG this regex, but that's what I see...

LarryF
+3  A: 

The problem is that, for a non-matching string, your pattern is a pathological case for a backtracking regular expression engine (which is true of those in most scripting languages, like Perl, Python, and anything based on PCRE).

More information:

You could rewrite it as:

^([a-zA-Z-]+/)*[a-zA-Z-]+$

Or your RE implementation may have other facilities available to avoid that scenario.

Miles
+1 I thought of exactly the same regex
David Zaslavsky
I made one small change. ^([a-zA-Z-]+/)*[a-zA-Z-]+/?$Thanks for the help!
hal10001
A: 

For PCRE, you can make the group atomic to avoid catastrophic backtracking. Just add ?> like this:

^(?>[a-zA-Z-]+/?)+$

See the PCRE manual under "Atomic Grouping and Possessive Quantifiers".

Bennett McElwee