I've delved into regular expressions for one of the first times in order to parse a URL. Without going into too much depth: I want friendly URLs, and I'm saving each permalink in the database. Because the same page can appear in several languages and across several numbered pages, I only want to save one permalink and parse the language and page number out of the URL. So if I'm getting something like this:
http://domain.com/lang/fr/category/9/category_title/page/3.html
All I really want is this bit, "category/9/category_title", to know which page I'm on. I've come up with this code:
$return = array();
$string = 'http://domain.com/lang/fr/category/9/category_title/page/3.html';
//Remove domain and http
$string = preg_replace('@^(?:http://)?([^/]+)@i','',$string);
if(preg_match('/^\/lang\/([a-z]{2})/',$string,$langMatches)) {
$return['lang'] = $langMatches[1];
//Remove lang
$string = preg_replace('/^\/lang\/[a-z]{2}/','',$string);
} else {
$return['lang'] = 'en';
}
//Get extension
$bits = explode(".", strtolower($string));
$return['extension'] = end($bits);
//Remove extension
$string = preg_replace('/\.[^.]+$/','',$string);
if(preg_match('/page\/([0-9]+)$/',$string,$pageMatches)) {
$return['page'] = $pageMatches[1];
//Remove page
$string = preg_replace('/page\/[0-9]+$/','',$string);
} else {
$return['page'] = 1;
}
//Remove additional slashes from beginning and end
$string = preg_replace('#^(/?)|(/?)$#', '', $string);
$return['permalink'] = $string;
print_r($return);
Which returns this from the above example:
Array
(
[lang] => fr
[extension] => html
[page] => 3
[permalink] => category/9/category_title
)
This is perfect and exactly what I want. However, my question is: have I gone about using regular expressions correctly? Is there a better way to do this? For instance, could I strip the domain, the extension and the leading and trailing slashes with just one kick-ass expression?
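To make the "one expression" idea concrete, here is a rough sketch of the kind of thing I have in mind. It assumes the URL always follows the layout shown above (optional /lang/xx prefix, then the permalink, then an optional /page/N and an optional extension). The function name parsePermalink and the named groups are made up for illustration, and it leans on parse_url() to drop the scheme and domain rather than doing that part in the regex.

```php
<?php
// Hypothetical helper (not part of my current code): splits a friendly URL
// into lang, extension, page and permalink with a single pattern.
function parsePermalink(string $url): array
{
    // parse_url() returns only the path, so scheme and host never reach the regex.
    $path = (string) parse_url($url, PHP_URL_PATH);

    // The x modifier allows whitespace and # comments inside the pattern.
    $pattern = '~^
        (?:/lang/(?<lang>[a-z]{2}))?      # optional language prefix
        /(?<permalink>.+?)                # permalink, matched lazily
        (?:/page/(?<page>\d+))?           # optional page suffix
        (?:\.(?<extension>[a-z0-9]+))?    # optional file extension
        /?$~xi';

    // Fall back to defaults when the path does not fit the assumed layout.
    if (!preg_match($pattern, $path, $m)) {
        return ['lang' => 'en', 'extension' => '', 'page' => 1, 'permalink' => ''];
    }

    // Optional groups that did not match are empty or missing, hence !empty().
    return [
        'lang'      => !empty($m['lang']) ? strtolower($m['lang']) : 'en',
        'extension' => !empty($m['extension']) ? strtolower($m['extension']) : '',
        'page'      => !empty($m['page']) ? (int) $m['page'] : 1,
        'permalink' => trim($m['permalink'], '/'),
    ];
}

print_r(parsePermalink('http://domain.com/lang/fr/category/9/category_title/page/3.html'));
```

For the example URL this should give the same four values as my step-by-step version. I'm not sure whether one larger pattern like this is actually easier to maintain than several small ones, which is really part of what I'm asking.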