Hello,

Let's say that I have these two URLs:

site.com/hello-world/test.html
site.com/hello-world/test/test.html

When I go to the first URL, I match it with this regex:

^.*/([a-z0-9,-]+)/([a-z0-9,-]+).html$

But the second URL is also valid against the regex. How can I tell the regex that only the first URL should be valid, and not the second?

A: 

I think the problem is the use of the greedy match-all .* at the beginning of your expression.

Cheat a little:

^.*(com|org|edu|net|gov)/([a-z0-9,-]+)/([a-z0-9,-]+).html$
David Andres
A: 

.* matches "site.com/hello-world" in the second case. You have to be more specific for the domain part.

stefanw
A: 

In the second case, .* is matching more than you would expect.

Perhaps replace it with the non-greedy quantifier:

^.*?/([a-z0-9,-]+)/([a-z0-9,-]+).html$
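
A quick way to check this suggestion, sketched with Python's `re` module (the language, the `URLS` list and the `captured` helper are assumptions for illustration, not part of the question). A lazy quantifier changes the order in which the engine tries prefix lengths, not which strings can match, so it is worth testing both URLs:

```python
import re

# Test inputs taken from the question.
URLS = [
    "site.com/hello-world/test.html",
    "site.com/hello-world/test/test.html",
]

# The pattern above, with the lazy quantifier .*? at the start.
LAZY = re.compile(r"^.*?/([a-z0-9,-]+)/([a-z0-9,-]+).html$")

def captured(url):
    """Return the two captured groups, or None if the URL does not match."""
    m = LAZY.match(url)
    return m.groups() if m else None

for url in URLS:
    print(url, "->", captured(url))
```

In this sketch the second URL still matches: once the shorter attempts fail, `.*?` is free to grow past the first slash, so laziness alone does not rule it out.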
spender
+2  A: 

For the first URL the .* part of the pattern matches "site.com", but for the second URL it matches "site.com/hello-world".

If you don't want to allow more than one folder, you can disallow slash characters in the part of the pattern that matches the domain name:

^[^/]*/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$

(Note that I put a backslash before the period before the html extension. A period matches any character, while \. matches only a period.)
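
As a rough check of the pattern above, here is a minimal sketch using Python's `re` module (the choice of Python and the `parse` helper name are assumptions for illustration). It tries both URLs from the question, plus a made-up third string to show the effect of the escaped period:

```python
import re

# [^/]* keeps the domain part from crossing a slash; \. matches a literal period.
ONE_FOLDER = re.compile(r"^[^/]*/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$")

def parse(url):
    """Return (folder, page) for a one-folder URL, or None otherwise."""
    m = ONE_FOLDER.match(url)
    return m.groups() if m else None

print(parse("site.com/hello-world/test.html"))       # one folder: accepted
print(parse("site.com/hello-world/test/test.html"))  # two folders: rejected
print(parse("site.com/hello-world/testxhtml"))       # no literal ".html": rejected
```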

Edit:
If you want to allow both URLs and use "hello-world/test" as folder for the second one, allow slashes in the folder part:

^[^/]*/([a-z0-9,/-]+)/([a-z0-9,-]+)\.html$

If you want to use "hello-world" as folder and "test/test" as page, allow slashes in the file name part:

^[^/]*/([a-z0-9,-]+)/([a-z0-9,/-]+)\.html$
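
To see how the two alternatives split the second URL differently, here is a small sketch in Python (the language and the names `DEEP_FOLDER`, `DEEP_PAGE` and `split_url` are assumptions for illustration). The character classes are written with the hyphen last so that it is taken literally rather than as part of a range:

```python
import re

# A hyphen at the end of a character class is literal; written as ",-/" it
# would instead form a range from "," to "/", which also admits ".".
DEEP_FOLDER = re.compile(r"^[^/]*/([a-z0-9,/-]+)/([a-z0-9,-]+)\.html$")
DEEP_PAGE = re.compile(r"^[^/]*/([a-z0-9,-]+)/([a-z0-9,/-]+)\.html$")

def split_url(pattern, url):
    """Return (folder, page) as captured by the given pattern, or None."""
    m = pattern.match(url)
    return m.groups() if m else None

url = "site.com/hello-world/test/test.html"
print(split_url(DEEP_FOLDER, url))  # slashes allowed in the folder part
print(split_url(DEEP_PAGE, url))    # slashes allowed in the page part
```

Both patterns also still accept the first URL, splitting it into "hello-world" and "test".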
Guffa
I want to allow site.com/hello-world/test.html and site.com/hello-world/test/test.html, but they are two different pages.
Frozzare
@Frozzare: You specifically asked that the second URL should not be valid... I added some alternatives in the answer.
Guffa
Yes, because only the first URL should work.
Frozzare
@Frozzare: I don't understand what you want, you seem to contradict yourself over and over... I have given you alternatives both for matching only the first URL and for matching both URLs, something should match your requirements...
Guffa
+3  A: 

Of course the second string is also valid against your regex:

sub-expression        result
-----------------------------------------------------------------------
^.*                   matches:   "site.com/hello-world/test/test.html"
/                     backtrack: "site.com/hello-world/test/"
([a-z0-9,-]+)         matches:   "site.com/hello-world/test/test" 
/                     backtrack: "site.com/hello-world/test/"
([a-z0-9,-]+).html$   matches:   "site.com/hello-world/test/test.html"
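
The end result of that backtracking can be inspected with a short sketch in Python (an assumption; the question does not name a language). The only change to the original regex is an extra capture group around the leading `.*`, added here purely so its final match is visible; `prefix_and_parts` is a hypothetical helper name:

```python
import re

# The question's regex with one change: the leading .* is wrapped in an extra
# group (an addition for this sketch) so its final match can be inspected.
TRACED = re.compile(r"^(.*)/([a-z0-9,-]+)/([a-z0-9,-]+).html$")

def prefix_and_parts(url):
    """Return (prefix, folder, page) as captured, or None if there is no match."""
    m = TRACED.match(url)
    return m.groups() if m else None

for url in ("site.com/hello-world/test.html",
            "site.com/hello-world/test/test.html"):
    print(url, "->", prefix_and_parts(url))
```

For the second URL the prefix group ends up holding "site.com/hello-world", which is why the string is accepted.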

Better:

sub-expression        result
-----------------------------------------------------------------------
^[^/]+                matches:   "site.com"
/                     matches:   "site.com/"
([a-z0-9,-]+)         matches:   "site.com/hello-world" 
/                     matches:   "site.com/hello-world/"
([a-z0-9,-]+)\.html$  fails (which is the expected result)

So you should use:

^[^/]+/([a-z0-9,-]+)/([a-z0-9,-]+)\.html$
Tomalak
^[^/]./([a-z0-9,-]+)/([a-z0-9,-]+).html$ fails?
Frozzare
That is what you seem to want - the second string should not match, in regex terms that is "the regex fails for this string".
Tomalak
Yes, the second URL should fail, not the first one.
Frozzare
A: 

Not a solution, just a suggestion: there are lots of excellent tools that let you experiment with regular expressions and actually help you write them.
I'm particularly fond of Expresso; apparently The Regulator is also a very good one.

Paolo Tedesco