I have a URL and I'm trying to match it against a regular expression to pull out some groups. The problem I'm having is that the URL can either end, or continue with a "/" and more URL text. I'd like to match URLs like this:

But not match something like this:

So, I thought my best bet was something like this:

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)[/$]

where the character class at the end contained either the "/" or the end-of-line. The character class doesn't seem to be happy with the "$" in there though. How can I best discriminate between these urls while still pulling back the correct groups?

+9  A: 

/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$
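For anyone who wants to try this pattern out, here is a minimal sketch in Python. The language choice, the `parse` helper, and the sample paths are all assumptions for illustration; the question does not say which regex flavor is in use, and the paths are not the ones from the question.

```python
import re

# Pattern from this answer; the trailing "/..." group is optional.
PATTERN = re.compile(r"/(.+)/(\d{4}-\d{2}-\d{2})-(\d+)(/.*)?$")

def parse(url_path):
    """Return (prefix, date, number), or None if the path does not match."""
    m = PATTERN.search(url_path)
    return m.groups()[:3] if m else None

# Hypothetical sample paths, made up for this sketch.
print(parse("/blog/2009-01-15-42"))            # path ends after the number
print(parse("/blog/2009-01-15-42/more-text"))  # path continues with "/"
print(parse("/blog/2009-01-15-42-extra"))      # number followed by other text
```

The first two calls should give the same three groups, and the third should give None, because after the digits the pattern only allows a "/" section or the end of the string.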

Adam Tegen
+2  A: 

To match either / or end of content, use (/|\z)

This only applies if you are not using multi-line matching (i.e. you're matching a single URL, not a newline-delimited list of URLs).


To put that with an updated version of what you had:

/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\z)

Note that I've changed the start to be a non-greedy match for non-whitespace ( \S+? ) rather than matching anything and everything ( .+ )
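A rough sketch of how this could look in Python, as one possible flavor (an assumption, since the question does not name a language). Python's re module spells the end-of-string anchor \Z rather than \z, so the pattern is adjusted accordingly. The `extract` helper and the sample paths are hypothetical.

```python
import re

# Same idea as the pattern above, with Python's \Z standing in for \z.
PATTERN = re.compile(r"/(\S+?)/(\d{4}-\d{2}-\d{2})-(\d+)(/|\Z)")

def extract(url_path):
    """Return the first three groups, or None when the path does not match."""
    m = PATTERN.search(url_path)
    if m is None:
        return None
    prefix, date, number, _terminator = m.groups()
    return prefix, date, number

# Hypothetical sample paths, made up for this sketch.
print(extract("/blog/2009-01-15-42"))            # ends right after the number
print(extract("/blog/2009-01-15-42/more-text"))  # continues with "/"
print(extract("/blog/2009-01-15-42-extra"))      # neither "/" nor end follows
```

The fourth group only records which terminator was hit ("/" or the empty end-of-string match), so the helper discards it.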

Peter Boughton
+1  A: 

I can't really comment on the regex itself, but I wanted to let you know about a very useful tool, Expresso, which will help you construct and test your regexes before you try and code for them.

Ian Jacobs
+7  A: 

You've got a couple of regexes now that will do what you want, so that's adequately covered. What hasn't been mentioned is why your attempt won't work: inside a character class, $ has no special meaning (the same goes for . and /, and for ^ anywhere except the first position, where it negates the class). So [/$] matches either a literal / or a literal $ rather than terminating the regex (/) or matching end-of-line ($).
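A quick way to see this for yourself, sketched in Python (chosen only for illustration; the names `char_class` and `alternation` are made up here):

```python
import re

# Inside a character class, "$" is a literal dollar sign, not an anchor.
char_class = re.compile(r"[/$]")

print(bool(char_class.search("a$b")))  # True: the literal "$" is matched
print(bool(char_class.search("a/b")))  # True: the literal "/" is matched
print(bool(char_class.search("ab")))   # False: end of string does not count

# An alternation keeps "$" outside any class, so it still acts as an anchor.
alternation = re.compile(r"(?:/|$)")

print(bool(alternation.search("ab")))  # True: matches at the end of the string
```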

Dave Sherohman
This is something that is frequently forgotten and not mentioned enough in the regex docs.
Steve Dunn