I'm matching URLs so I can route requests to controllers/views. A few of the URLs have several possible actions, only one of which may be followed by anything further in the URL, and I need whatever follows to be available as a named group.

Examples:

  • /admin/something #match
  • /admin/something/new #match
  • /admin/something/new/id #fail
  • /admin/something/edit #fail
  • /admin/something/edit/id #match

There are many other possibilities, but that's good enough for an example. Basically, if the URL ends in 'new', nothing can follow, while if it ends in 'edit' it must also have an id to edit.

The regex I've been using so far:

^/admin/something(?:/(?P<action>new|edit(?:/(?P<id>\d{1,5}))))?$

A whitespace-exploded version:

^/admin/something(?:/
    (?P<action>
        new|        # create a new something
        edit(?:/    # edit an old something
                (?P<id>\d{1,5})    # id to edit
            )
        )
    )?    # actions on something are optional
$

But then if the URL is '/admin/something/edit/id', the 'action' group is 'edit/id'. I've been using a bit of string manipulation in the controller to cut the action down to just the action itself, but I feel like a positive lookahead would be much cleaner. I just haven't been able to get that to work.
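To make the symptom concrete, here is a minimal sketch in Python, whose `re` module accepts the same `(?P<name>...)` syntax. The thread itself is about PHP, so treat this as an illustration rather than the asker's code. The helper name `parse` and the id `42` are assumptions made up for the example, and the `split` call stands in for the string manipulation described above.

```python
import re

# The question's original pattern, unchanged.
ORIGINAL = re.compile(
    r"^/admin/something(?:/(?P<action>new|edit(?:/(?P<id>\d{1,5}))))?$"
)

def parse(url):
    """Return (action, id) for a matching URL, or None if it does not match."""
    m = ORIGINAL.match(url)
    if m is None:
        return None
    action = m.group("action")
    if action is not None:
        # Workaround: keep only the text before the first slash,
        # because the raw group for an edit URL is "edit/<id>".
        action = action.split("/", 1)[0]
    return action, m.group("id")

# The raw capture shows the problem: the id leaks into the action group.
raw_action = ORIGINAL.match("/admin/something/edit/42").group("action")
```

Here `raw_action` is `"edit/42"`, while `parse("/admin/something/edit/42")` gives `("edit", "42")` after trimming.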

The lookahead regex I've been working at: (will match 'new', but not 'edit' [with or without an id])

^/admin/something(?:/(?P<action>new|edit(?=(?:/(?P<id>\d{1,5})))))?$

Any tips/suggestions would be much appreciated.

+2  A: 

Your problem lies with the $ at the end. This is a zero-width assertion that the regex matches to the end of the line. However, your lookahead is also a zero-width assertion (that id follows edit). The reason it's called a lookahead is because it matches within the lookahead, and then skips back to the beginning of that match. So it's failing on ...edit/id because it's trying to assert both that /id follows edit and /edit is the end of the line. It fails on ...edit alone because it's trying to assert that /id follows edit.
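The conflict between the lookahead and the trailing `$` can be reproduced with a short sketch. It uses Python's `re` rather than PHP, on the assumption that both engines treat this pattern the same way, and the sample id `42` is made up.

```python
import re

# The question's lookahead attempt, unchanged.
LOOKAHEAD = re.compile(
    r"^/admin/something(?:/(?P<action>new|edit(?=(?:/(?P<id>\d{1,5})))))?$"
)

urls = (
    "/admin/something/new",      # matches: "new" really is at the end
    "/admin/something/edit",     # fails: the lookahead finds no "/<id>"
    "/admin/something/edit/42",  # fails: lookahead passes, but "$" is then
                                 # tested right after "edit", where "/42" follows
)
results = {url: bool(LOOKAHEAD.match(url)) for url in urls}
```

Only the `new` URL ends up `True` in `results`, which is the behaviour described in the question.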

There are two potential solutions. The first is to simply take out the $. This may not be desirable because then it could match .../edit/id/gobbledygook. The second solution is to use your regex language's method of reusing captured groups. I can't help you there because I don't know what regex you're using. I don't recognize the P<name> syntax for named capturing. You would put whatever you need for that after the <action> group.
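As a rough sketch of the first option, here is the same pattern with the `$` removed, again in Python as a stand-in for PHP. It shows both the gain and the cost: the edit URL now matches with clean groups, but trailing junk is accepted as well. The name `NO_ANCHOR` and the sample URLs are illustrative only.

```python
import re

# The lookahead pattern with the trailing "$" dropped.
NO_ANCHOR = re.compile(
    r"^/admin/something(?:/(?P<action>new|edit(?=(?:/(?P<id>\d{1,5})))))?"
)

def groups(url):
    """Return (action, id) if the pattern matches at the start, else None."""
    m = NO_ANCHOR.match(url)
    return None if m is None else (m.group("action"), m.group("id"))

good = groups("/admin/something/edit/42")
junk = groups("/admin/something/edit/42/gobbledygook")
bare_edit = groups("/admin/something/edit")
```

`good` and `junk` are both `("edit", "42")`, so the over-matching concern is real. `bare_edit` is `(None, None)`: the optional group is skipped and the pattern still matches the `/admin/something` prefix, so a URL that should fail is accepted too.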

Sean Nyman
It's php, but I think the P<name> comes from Python originally. Thanks for the suggestion, sounds like a good place to start mucking around.
nilamo
Source found: http://www.regular-expressions.info/named.html : Python's regex module was the first to offer a solution: named capture. I like using named groups a whole lot. The regex can change, and the code behind it can stay the same, since it references a name instead of a position within the regex. Plus, it makes the code a lot cleaner.
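A small sketch of that point about names versus positions, in Python. The two patterns are hypothetical variants invented for this example and do not come from the thread: the second adds a capturing group at the front, which shifts every positional index while leaving the names untouched.

```python
import re

# Hypothetical "before" and "after" versions of one route pattern.
V1 = re.compile(r"^/admin/(?P<section>\w+)/edit/(?P<id>\d{1,5})$")
V2 = re.compile(r"^/(admin|staff)/(?P<section>\w+)/edit/(?P<id>\d{1,5})$")

url = "/admin/something/edit/42"

# Lookup by name is unaffected by the extra group in V2.
by_name = [p.match(url).group("id") for p in (V1, V2)]

# Lookup by position silently returns a different group for V2.
by_position = [p.match(url).group(2) for p in (V1, V2)]
```

`by_name` is `["42", "42"]`, whereas `by_position` is `["42", "something"]`.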
nilamo
+1  A: 
^/admin/something
(
    $               |
    /new$           |
    /edit/(\d{1,5})$
)
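For anyone wanting to try this layout, the sketch below compiles it in Python with `re.VERBOSE`, so the whitespace and comments are ignored (in PHP the `x` modifier should play the same role). Two things are assumptions on my part rather than part of the answer: the id is given a name, and it uses the question's `\d{1,5}` width.

```python
import re

ALTERNATION = re.compile(
    r"""
    ^/admin/something
    (?:
        $                        |   # bare resource, nothing after it
        /new$                    |   # "new" must end the URL
        /edit/(?P<id>\d{1,5})$       # "edit" must be followed by an id
    )
    """,
    re.VERBOSE,
)

samples = [
    "/admin/something",
    "/admin/something/new",
    "/admin/something/new/42",
    "/admin/something/edit",
    "/admin/something/edit/42",
]
matched = [bool(ALTERNATION.match(u)) for u in samples]
edit_id = ALTERNATION.match("/admin/something/edit/42").group("id")
```

`matched` is `[True, True, False, False, True]`, in line with the question's examples. The trade-off is that the action is no longer captured, only the id.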
FM
A: 

The answer I came to uses parts from both of the above answers to create a regex with lookahead that also stores all the values I want in named groups, without extra clutter such as forward slashes. It matches everything I want it to, and fails everything else. Perfect.

^/admin/something(?:(?:/
                        (?P<action>
                            new$|
                            edit(?=/(?P<id>\d{1,5})$)
                        )
                    )|$)
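A quick way to check this pattern against the examples from the question, sketched in Python with `re.VERBOSE` (an assumption, since the real code is PHP, where the `x` modifier would be the counterpart). The helper name `route` is made up for the illustration.

```python
import re

FINAL = re.compile(
    r"""
    ^/admin/something(?:(?:/
        (?P<action>
            new$ |                      # "new" must end the URL
            edit(?=/(?P<id>\d{1,5})$)   # "edit" needs "/<id>" up to the end
        )
    )|$)
    """,
    re.VERBOSE,
)

def route(url):
    """Return (action, id) for a matching URL, or None otherwise."""
    m = FINAL.match(url)
    return None if m is None else (m.group("action"), m.group("id"))
```

With this, `route("/admin/something/edit/42")` gives `("edit", "42")`, so the action group no longer carries the id. Because the id is only looked at and never consumed, the overall matched text stops at `/admin/something/edit`.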

I wish I could mark more than one as the answer, since they both helped me find the one true path.

nilamo
One thing I just realized, if you want edit/id to match to the end of the line (this answer only has new or just /admin/something doing that), you can just put the $ within the lookahead, like `(?=/(?P<id>\d{1,5})$)`. HTH.
Sean Nyman
@Darth Eru: wonderful, thank you. The above example has been edited to reflect this.
nilamo
A: 

A non-regex way:

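$str = "/admin/something";
$s = explode("/", $str);
$last = end($s);    // final path segment; leaves the array pointer at the end
// "something" or "new" must be the final segment
if ( $last == "something" || $last == "new" ){
    print "ok\n";
}
// "edit" must sit immediately before an all-digit id
if ( prev($s) === "edit" && ctype_digit($last) ){
    print "ok\n";
}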
ghostdog74
The example I was using really is very simplified. If I do it the non-regex way, it would actually be more confusing than a regex. Specifically, there's a third option to just 'new' and 'edit': 'list'. There are many options for list, such as how many results to show per page, which page to show, which column to order the results by, and whether to order those results by ascending or descending order. With whitespace in the regex and comments, the whole shebang is 13 lines. Non regex would almost certainly be longer and more confusing.
nilamo
But if it was less complex, then this is what I would have done from the start.
nilamo
Think about it: if you have many other things to add to your list, it would be the same, wouldn't it? Your regex is going to get longer and more unreadable. It's still clearer when you break down what you want to do into portions you can understand. Remember, your code is not only for you. You would also want whoever maintains it after you to be able to read what you wrote.
ghostdog74