Hello all,

I need a regex (to be used in .htaccess) with the following attributes, capturing the four-digit number and the text following it:

Match:
/9876/text_text_more_text_still_more_text
/8765/1234_text_text_text

Do not match:
/2010/08/01/text_text_more_text_still_more_text
/2010/08/01/text-text-more-text-still-more-text

So far I have:
/([0-9]+)/([^-/]+)

This unfortunately matches the do-not-match pattern. I'm definitely a neophyte at regexes but I think if I had a pointer in the right direction I could figure it out.

Thanks in advance.

+1  A: 

Use this regex:

^/\d{4}/[^/]*?$
Gopi
The important point here is the start and end anchors, `^` and `$`. Without them, the regex can match `/2010/08/` **`01/text_text`**.
Kobi
As modified to capture the two pieces I need: `^/(\d{4})/([^/]*?)$` Thank you!
AndrewRich
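For anyone who wants to see the effect of the anchors, here is a minimal sketch in Python's `re` module. It is only an illustration: Apache uses PCRE, not Python, and the helper name `groups_or_none` is made up for this example. It runs the original unanchored pattern and the anchored one from this answer against the four sample paths in the question.

```python
import re

# The four sample paths from the question.
paths = [
    "/9876/text_text_more_text_still_more_text",        # should match
    "/8765/1234_text_text_text",                        # should match
    "/2010/08/01/text_text_more_text_still_more_text",  # should not match
    "/2010/08/01/text-text-more-text-still-more-text",  # should not match
]

# Original attempt from the question: no anchors.
unanchored = re.compile(r"/([0-9]+)/([^-/]+)")
# Anchored pattern from this answer, with the two capture groups added.
anchored = re.compile(r"^/(\d{4})/([^/]*?)$")


def groups_or_none(pattern, path):
    """Hypothetical helper: return the captured groups, or None if no match."""
    m = pattern.search(path)
    return m.groups() if m else None


for p in paths:
    print(p)
    print("  unanchored:", groups_or_none(unanchored, p))
    print("  anchored:  ", groups_or_none(anchored, p))
```

Without `^` and `$` the pattern is free to match a fragment of the date-style paths (here `/2010/08`), which is why they slip through. With the anchors, the second group has to run to the end of the string without crossing a slash, so those paths are rejected.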
+1  A: 

Try this:

^/(\d{4})/(\w+)$
fuwaneko
This worked, thank you!
AndrewRich
I didn't know \w included numbers, but this definitely seems to work. Nice simple pattern.
Dunderklumpen
\w includes any "word" character, i.e. digits, letters, and the underscore. However, implementations vary, and some include other characters.
fuwaneko
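A small sketch of what `\w` does and does not cover, again using Python's `re` module purely for illustration (the function name `capture` is hypothetical, and Apache's PCRE engine may treat non-ASCII characters differently from Python). The last two checks show one way implementations can differ: the same `\w` accepts an accented letter in Python's default Unicode mode but not with the `re.ASCII` flag.

```python
import re

# Pattern from this answer.
pattern = re.compile(r"^/(\d{4})/(\w+)$")


def capture(path):
    """Hypothetical helper: return the two captured pieces, or None."""
    m = pattern.match(path)
    return m.groups() if m else None


# Digits, letters and underscores are all "word" characters.
print(capture("/8765/1234_text_text_text"))   # ('8765', '1234_text_text_text')

# A slash is not a word character, so date-style paths fail.
print(capture("/2010/08/01/text_text"))       # None

# A hyphen is not a word character either.
print(capture("/9876/text-text"))             # None

# Implementation differences: the same \w, two different answers.
unicode_ok = bool(re.fullmatch(r"\w+", "caf\u00e9"))          # default Unicode mode
ascii_ok = bool(re.fullmatch(r"\w+", "caf\u00e9", re.ASCII))  # ASCII-only mode
print(unicode_ok, ascii_ok)                   # True False
```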
A: 

If the entire string is only the URL, you could try the following pattern:

^/[0-9]+/(([0-9]{4}_)?[^/]+)$

A quick explanation of my approach:

  • ^ marks the beginning of the string and $ the end of the string.
  • /[0-9]+/ matches the first number in the URL (one or more digits).
  • ([0-9]{4}_)? matches the optional four-digit number at the start of the text.
  • [^/]+ matches any run of characters other than a forward slash, up to the end of the string.
Dunderklumpen
I don't think the OP is looking to capture the second number, so `([0-9]{4}_)?` may be redundant.
Kobi
Actually the parens here are merely to allow the '0 or 1' operator on the four digits. As a side effect they could be captured, but can also be ignored.
Dunderklumpen
This one did not work for me. Matching was as expected but the second part was not captured.
AndrewRich
I did leave out the second capture. `^/[0-9]+/(([0-9]{4}_)?[^/]+)$` will do it.
Dunderklumpen
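To make the capture-group discussion above concrete, here is a rough sketch in Python's `re` module (an illustration only, not Apache itself). `THREAD_PATTERN` is the pattern from this answer. `VARIANT` is not from the thread: it is an assumed alternative that captures the leading number, restricts it to four digits, and uses a non-capturing group `(?:...)` so the optional prefix does not take up a group number.

```python
import re

# Pattern from this answer: group 1 is the text, group 2 the optional prefix.
THREAD_PATTERN = re.compile(r"^/[0-9]+/(([0-9]{4}_)?[^/]+)$")

# Assumed variant (not from the thread): capture the leading four-digit
# number and keep the optional prefix out of the numbered groups.
VARIANT = re.compile(r"^/([0-9]{4})/((?:[0-9]{4}_)?[^/]+)$")


def groups_or_none(pattern, path):
    """Hypothetical helper: return the captured groups, or None if no match."""
    m = pattern.match(path)
    return m.groups() if m else None


# The leading number is matched but not captured by the thread's pattern.
print(groups_or_none(THREAD_PATTERN, "/8765/1234_text_text_text"))
# ('1234_text_text_text', '1234_')

# When the optional prefix is absent, its group is None.
print(groups_or_none(THREAD_PATTERN, "/9876/text_text_more_text"))
# ('text_text_more_text', None)

# [0-9]+ accepts any number of digits, not only four.
print(groups_or_none(THREAD_PATTERN, "/12/abc"))
# ('abc', None)

# The variant yields exactly the two pieces asked for in the question.
print(groups_or_none(VARIANT, "/8765/1234_text_text_text"))
# ('8765', '1234_text_text_text')
print(groups_or_none(VARIANT, "/2010/08/01/text_text"))
# None
```

In a rewrite rule the numbered groups become `$1`, `$2`, and so on, so which parentheses capture decides which backreference holds which piece.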