Take for example the following regex match.

preg_match('!^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(january|february|march|april|may|june|july|august|september|october|november|december):([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)(/page-[0-9]+)?$!', 'publisher/news/1/2010-march:03-23/test_title/1/page-1', $matches); 
print_r($matches);

It produces the following:

Array
(
    [0] => publisher/news/1/2010-march:03-23/test_title/1/page-1
    [1] => news
    [2] => 1
    [3] => 2010
    [4] => march
    [5] => 03
    [6] => 23
    [7] => test_title
    [8] => 1
    [9] => /page-1
)

However, because the last group is optional, the pattern also matches "publisher/news/1/2010-march:03-23/test_title/1". My problem is that I want to match (/page-[0-9]+) if it exists, but capture only the page number, so that "publisher/news/1/2010-march:03-23/test_title/1/page-1" would produce:

Array
(
    [0] => publisher/news/1/2010-march:03-23/test_title/1/page-1
    [1] => news
    [2] => 1
    [3] => 2010
    [4] => march
    [5] => 03
    [6] => 23
    [7] => test_title
    [8] => 1
    [9] => 1
)

I've tried the following regex:

'!^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(january|february|march|april|may|june|july|august|september|october|november|december):([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)/?p?a?g?e?-?([0-9]+)?$!'

This works; however, it also matches "publisher/news/1/2010-march:03-23/test_title/1/1". I have no idea how to match something without having it come back in the matches. Is it possible in a single regex?
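A quick way to see the problem is to run the attempted pattern against a few URLs. In this sketch, `$months`, `$prefix` and `$base` are illustrative helper variables (not from the question) that rebuild the shared part of the pattern. Because each of `/`, `p`, `a`, `g`, `e` and `-` is optional on its own, any subsequence of "/page-" in front of the number is accepted.

```php
<?php
// Illustrative helpers (not from the question): the shared part of the
// pattern, up to and including the item number (groups 1-8).
$months = 'january|february|march|april|may|june|july|august|september|october|november|december';
$prefix = '^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(' . $months . '):'
        . '([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)';

// The attempted pattern: every character of "/page-" is individually optional.
$attempt = '!' . $prefix . '/?p?a?g?e?-?([0-9]+)?$!';

$base = 'publisher/news/1/2010-march:03-23/test_title/1';

$withPage = preg_match($attempt, $base . '/page-1', $m1); // 1, $m1[9] === '1'
$bare     = preg_match($attempt, $base . '/1', $m2);      // also 1: the false positive
$mangled  = preg_match($attempt, $base . '/pg-1', $m3);   // also 1: "/pg-" is a subsequence of "/page-"

echo "page-1: $withPage, /1: $bare, /pg-1: $mangled\n";
```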

A: 

Maybe like this:

'!^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(january|february|march|april|may|june|july|august|september|october|november|december):([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)(/page-([0-9]+))?$!'
mathroc
No, because that would then capture both "/page-1" and "1". I only want it to capture the "1". It's being used in an automated URL routing system, and the regex matches are substituted in via placeholders, so the number of returned matches has to equal the number of placeholders.
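To illustrate the comment's point: with an ordinary capturing group around the optional part, the array grows to ten groups, "/page-1" lands at index 9 and the bare number moves to index 10. `$months` and `$prefix` are illustrative helpers that rebuild the question's pattern up to the item number.

```php
<?php
// Illustrative helpers: the question's pattern up to the item number (groups 1-8).
$months = 'january|february|march|april|may|june|july|august|september|october|november|december';
$prefix = '^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(' . $months . '):'
        . '([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)';

// The suggested pattern: the outer (...) is an ordinary, capturing group.
$nested = '!' . $prefix . '(/page-([0-9]+))?$!';

preg_match($nested, 'publisher/news/1/2010-march:03-23/test_title/1/page-1', $m);

echo count($m), "\n"; // 11: the full match plus ten groups
echo $m[9], "\n";     // /page-1
echo $m[10], "\n";    // 1
```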
buggedcom
A:

To reject publisher/news/1/2010-march:03-23/test_title/1/whatever outright:

!^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(january|february|march|april|may|june|july|august|september|october|november|december):([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)(?:/page-([0-9]+))?$!
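A minimal check of the first pattern, using illustrative helper variables `$months`, `$prefix` and `$base` to rebuild it. It captures only the page number, rejects the ".../1/1" false positive from the question, and rejects "/whatever".

```php
<?php
// Illustrative helpers: the question's pattern up to the item number (groups 1-8).
$months = 'january|february|march|april|may|june|july|august|september|october|november|december';
$prefix = '^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(' . $months . '):'
        . '([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)';

// (?:...) groups "/page-" with the number so the ? applies to both,
// while only the inner ([0-9]+) is captured (as group 9).
$strict = '!' . $prefix . '(?:/page-([0-9]+))?$!';

$base = 'publisher/news/1/2010-march:03-23/test_title/1';

$a = preg_match($strict, $base . '/page-1', $withPage);  // 1, $withPage[9] === '1'
$b = preg_match($strict, $base, $noPage);                // 1, group 9 did not take part
$c = preg_match($strict, $base . '/1', $bare);           // 0: the false positive is gone
$d = preg_match($strict, $base . '/whatever', $other);   // 0

echo count($withPage), ' ', count($noPage), "\n"; // 10 9
```

With preg_match's default flags, a trailing group that did not take part in the match is left out of the array entirely, so a router would see a missing index 9 (rather than an empty one) when there is no page.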

To still match publisher/news/1/2010-march:03-23/test_title/1/whatever but ignore the /whatever:

!^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(january|february|march|april|may|june|july|august|september|october|november|december):([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)(?:(?:/page-([0-9]+))|/.*)?$!
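The same kind of check for the second, more lenient pattern (again with illustrative helpers `$months`, `$prefix` and `$base`). "/page-N" still fills group 9, while any other trailing segment is accepted but leaves group 9 unset.

```php
<?php
// Illustrative helpers: the question's pattern up to the item number (groups 1-8).
$months = 'january|february|march|april|may|june|july|august|september|october|november|december';
$prefix = '^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(' . $months . '):'
        . '([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)';

// The first alternative captures the page number; the second swallows
// any other trailing "/..." without capturing it.
$lenient = '!' . $prefix . '(?:(?:/page-([0-9]+))|/.*)?$!';

$base = 'publisher/news/1/2010-march:03-23/test_title/1';

$a = preg_match($lenient, $base . '/page-1', $withPage);  // 1, $withPage[9] === '1'
$b = preg_match($lenient, $base . '/whatever', $other);   // 1, but no group 9
$c = preg_match($lenient, $base . '/1', $bare);           // 1 via "/.*", no group 9

echo $a, $b, $c, ' ', count($other), "\n"; // 111 9
```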
Matt Blaine
That's the ticket. Thanks. Does ?: mean "match only if it exists"?
buggedcom
?: makes the parentheses "non-capturing". So, in the array in your example, 0 is the whole string your pattern matched, and 1-9 are "captures": everything that you wrapped in () in your pattern. (?: ) groups "/page-" and "([0-9]+)" together so that the trailing ? makes the whole thing optional, but the outer group doesn't "capture" anything itself; only the inner ([0-9]+) does.
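A small illustration of the explanation above, using made-up toy patterns. A non-capturing group still has to match but is not reported, and it lets a quantifier apply to several tokens while an inner capturing group keeps working.

```php
<?php
// All three groups capture: full match plus three entries.
preg_match('/(a)(b)(c)/', 'abc', $capturing);

// The middle group is non-capturing: "b" must still match but is not reported.
preg_match('/(a)(?:b)(c)/', 'abc', $nonCapturing);

// (?:...) lets ? apply to "-" and the digits together; the inner (\d+) still captures.
preg_match('/^item(?:-(\d+))?$/', 'item-7', $withNumber);
preg_match('/^item(?:-(\d+))?$/', 'item', $withoutNumber);

echo implode(', ', $nonCapturing), "\n"; // abc, a, c
echo implode(', ', $withNumber), "\n";   // item-7, 7
```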
Matt Blaine
Ah, OK, cheers. Sorry I can't upvote; I don't have the 15 rep needed yet...
buggedcom
My explanation in my previous comment isn't the greatest; I'm sure you can find a better one. Don't worry about it, I'm glad I could help.
Matt Blaine
Doh! After all that, I've realized that when reverse-mapping these URLs from the param parts back to a URL, it's impossible to do correctly unless the whole /page- string is included in the param value, which would not work in case the param regex was changed by a user.
buggedcom
A: 

This is the regex you are looking for:

^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(january|february|march|april|may|june|july|august|september|october|november|december):([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)/(?:page-(\d+))?

You can test it in RegexBuddy. If "page-1" is absent, group 9 is left unset; otherwise it holds the page number.
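Running this pattern as posted (with `!` delimiters added so preg_match accepts it) shows the group 9 behaviour described above, plus two side effects that follow from its shape. It requires a slash after the item number, and because there is no `$` anchor it also accepts trailing text. `$months`, `$prefix` and `$base` are illustrative helpers.

```php
<?php
// Illustrative helpers: the question's pattern up to the item number (groups 1-8).
$months = 'january|february|march|april|may|june|july|august|september|october|november|december';
$prefix = '^publisher/([A-Za-z0-9\-\_]+)/([0-9]+)/([0-9]{4})-(' . $months . '):'
        . '([0-9]{1,2})-([0-9]{1,2})/([A-Za-z0-9\-\_]+)/([0-9]+)';

// The answer's tail, unchanged apart from the added delimiters.
$posted = '!' . $prefix . '/(?:page-(\d+))?!';

$base = 'publisher/news/1/2010-march:03-23/test_title/1';

$a = preg_match($posted, $base . '/page-1', $withPage);   // 1, $withPage[9] === '1'
$b = preg_match($posted, $base . '/', $slashOnly);        // 1, no group 9
$c = preg_match($posted, $base, $noSlash);                // 0: the "/" before "page-" is required
$d = preg_match($posted, $base . '/whatever', $other);    // 1: nothing anchors the end

echo $a, $b, $c, $d, "\n"; // 1101
```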

RJD22
Thanks, but Matt beat you to the answer. Is there really any advantage of (\d+) over ([0-9]+)?
buggedcom
I'm not sure there is any real difference in performance. \d is a shorthand class meant for digits, while [0-9] is just a character range, like [a-z].
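A minimal check, assuming the default (non-`u`) mode: on plain ASCII input the two classes accept exactly the same strings. Any difference would show up with the `u` modifier, where PHP's PCRE may treat `\d` as any Unicode decimal digit while `[0-9]` stays ASCII-only. That is worth verifying on your own PHP version rather than taking on faith.

```php
<?php
// Compare \d with [0-9] on a few ASCII sample strings (no /u modifier).
$samples = ['0', '42', '2010', '3a', 'x', ''];

$mismatches = [];
foreach ($samples as $s) {
    // preg_match returns 1 on a match and 0 otherwise.
    if (preg_match('/^\d+$/', $s) !== preg_match('/^[0-9]+$/', $s)) {
        $mismatches[] = $s;
    }
}

echo count($mismatches), "\n"; // 0
```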
RJD22