I could use some help writing a regular expression. In my Django application, users can hit the following URL:

http://www.example.com/A1/B2/C3

I'd like to create a regular expression that accepts any of the following as a valid URL:

http://www.example.com/A1  
http://www.example.com/A1/B2  
http://www.example.com/A1/B2/C3

I'm guessing I need to use the "OR" conditional, but I'm having trouble getting my regex to validate. Any thoughts?

UPDATE: Here is the regex so far. Note that I have not included the "http://www.example.com" portion, since Django handles that for me. I'm only concerned with validating 1, 2, or 3 subdirectories.

^(\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20}))$
+1  A: 
http://www\.example\.com/A1(/B2(/C3)?)?
James Curran
+5  A: 

Skip the |, use the ? and ()

http://www\.example\.com/A1(/B2(/C3)?)?

And if you replace the A1-C3 with a pattern:

http://www\.example\.com/[^/]*(/[^/]*(/[^/]*)?)?

Explanation:

  • It matches every string that starts with http://www.example.com/A1.
  • It can match an additional /B2 and then an additional /C3, but /C3 is matched only when there is a /B2.
  • [^/]* matches as many non-slash characters as possible.
  • If you need A1 through C3 in separate capture groups, you can use this:

http://www\.example\.com/([^/]*)(/([^/]*)(/([^/]*))?)?

This will give (group number: content):

0: (http://www.example.com/dir1/dir2/dir3)
1: (dir1)
2: (/dir2/dir3)
3: (dir2)
4: (/dir3)
5: (dir3)
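
The group numbering can be checked with Python's `re` module. This is only a sketch: the helper name `segments` is made up for illustration, and `fullmatch` is used here as an assumption so that URLs with more than three levels are rejected (the pattern itself carries no anchors).

```python
import re

# Pattern from the answer above, with one capture group per path segment.
PATTERN = re.compile(r'http://www\.example\.com/([^/]*)(/([^/]*)(/([^/]*))?)?')

def segments(url):
    """Return capture groups 1..5 for url, or None if the whole URL does not match."""
    m = PATTERN.fullmatch(url)
    return m.groups() if m else None

print(segments('http://www.example.com/dir1/dir2/dir3'))
# ('dir1', '/dir2/dir3', 'dir2', '/dir3', 'dir3')
print(segments('http://www.example.com/dir1'))
# ('dir1', None, None, None, None)
```

Groups belonging to an optional part that did not take part in the match come back as `None`, so the caller can tell how deep the URL went.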

You can check it out online here or get this tool (yes it's free, and it's even written in Lisp...).

Andre Bossard
+1  A: 
 ^(\w{1,20})(/\w{1,20})*$

This allows as many subdirectories as you like. If you only want up to two more levels:

 ^(\w{1,20})(/\w{1,20}){0,2}$
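
A quick way to try the bounded-repetition idea in plain Python. This sketch assumes the pattern ends with `$`, so that a fourth level is rejected instead of being matched as a prefix; the name `UP_TO_THREE` is just illustrative.

```python
import re

# Assumption: a trailing $ forces the whole path to match, not just a prefix.
UP_TO_THREE = re.compile(r'^(\w{1,20})(/\w{1,20}){0,2}$')

for path in ['A1', 'A1/B2', 'A1/B2/C3', 'A1/B2/C3/D4']:
    # Prints True for the first three paths and False for the four-level one.
    print(path, bool(UP_TO_THREE.match(path)))
```

Note that with `{0,2}` the repeated group only keeps the last segment it matched, so this form suits validation better than extracting each level.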
Epaga
+1  A: 

If I'm understanding, I think you just need another set of parens around the whole OR statement:

^((\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20})))$
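
To see why the extra parentheses matter: `|` has the lowest precedence, so in the ungrouped version `^` applies only to the first alternative and `$` only to the last. A small sketch in Python (the names `UNGROUPED` and `GROUPED` are only for illustration):

```python
import re

# The question's pattern: ^ binds only to the first branch, $ only to the last.
UNGROUPED = re.compile(
    r'^(\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20}))$')
# Same branches wrapped in one group, so both anchors apply to every branch.
GROUPED = re.compile(
    r'^((\w{1,20})|((\w{1,20})/(\w{1,20}))|((\w{1,20})/(\w{1,20})/(\w{1,20})))$')

path = 'A1/B2/C3/D4'  # four levels, which should be rejected
print(bool(UNGROUPED.search(path)))  # True: the first branch matches just "A1"
print(bool(GROUPED.search(path)))    # False: no branch spans the whole string
```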
Lucas Oman
+1  A: 

Be aware that Django's reverse URL matching (permalinks, reverse() and {% url %}) can handle a limited subset of regular expressions. To be able to use them, it's sometimes necessary to split complex regexes into separate URL dispatcher rules.

akaihola
+2  A: 

There's a much more Django-like way to do this:

# Imports for the older Django releases that still ship patterns() and direct_to_template
from django.conf.urls.defaults import patterns, url
from django.views.generic.simple import direct_to_template

urlpatterns = patterns('',
    url(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})/(?P<object_slug3>\w{2})$', direct_to_template, {"template": "two_levels_deep.html"}, name="two_deep"),
    url(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})$', direct_to_template, {"template": "one_level_deep.html"}, name="one_deep"),
    url(r'^(?P<object_slug1>\w{2})$', direct_to_template, {"template": "homepage.html"}, name="home"),
)

The other methods don't take advantage of Django's power to pass variables.
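
Django passes named groups from the matching URL pattern to the view as keyword arguments. The following is a rough stand-alone imitation of that idea in plain Python, not Django's actual resolver: `show`, `ROUTES`, and `dispatch` are hypothetical names, and the patterns are the three above written with balanced group parentheses.

```python
import re

# Hypothetical view; a real Django view would also receive the request object.
def show(object_slug1, object_slug2=None, object_slug3=None):
    return [s for s in (object_slug1, object_slug2, object_slug3) if s is not None]

# One pattern per depth, most specific first.
ROUTES = [
    re.compile(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})/(?P<object_slug3>\w{2})$'),
    re.compile(r'^(?P<object_slug1>\w{2})/(?P<object_slug2>\w{2})$'),
    re.compile(r'^(?P<object_slug1>\w{2})$'),
]

def dispatch(path):
    """Call show() with the named groups of the first matching route."""
    for route in ROUTES:
        m = route.search(path)
        if m:
            return show(**m.groupdict())
    return None  # no route matched

print(dispatch('A1/B2/C3'))  # ['A1', 'B2', 'C3']
print(dispatch('A1'))        # ['A1']
```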

Edit: I switched the order of the urlpatterns so the most specific pattern comes first, which makes the matching more obvious since Django tries the patterns from top to bottom.

Adam Nelson