Hi all,

I am trying to implement a PHP script that runs on every request to my site, looks for a certain URL pattern, then explodes the URL and performs a redirect.

Basically, I want to run this on a new CMS to catch all incoming links from the old CMS and redirect them based on a mapping: say, an article ID stripped from the URL is mapped to the same article ID imported into the new CMS's DB.

I can do the implementation, the redirect etc, but I am lost on the regex.

I need to catch any occurrences of:

domain.com/content/view/*/34/ or domain.com/content/view/*/30/ (where * is a wildcard) and capture * and the 30 or 34 in a variable which I will then use in a DB query.

If the following is encountered:

domain.com/content/view/*/34/1/*/

I need to capture the first * and the second *.

I'd be very grateful to anyone who can give me a hand with this.

A: 

It's actually very simple. A more flexible and straightforward approach is to explode() the URL into an array called something like $segments, and then test against that. If you have a very small number of expected URLs, this kind of approach is probably easier to maintain and to read.

I wouldn't recommend doing this in the .htaccess file because of the performance overhead.

danp
+1  A: 

I'm not sure regular expressions are the way to go. I think it would probably be easier to use explode ('/' , $url) and check by looping over that array.

Here are the steps I would follow:

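$url = parse_url($url, PHP_URL_PATH); // keep only the path, e.g. /content/view/*/34/
$url = trim($url, '/');               // drop leading and trailing slashes
$parts = explode('/', $url);          // e.g. array('content', 'view', '*', '34')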

Then you can check if

($parts[0]=='content' && $parts[1]=='view' && $parts[3]=='34')

The wildcard value you want is then available in $parts[2].
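
To pull those steps together, here is a rough sketch of the check wrapped in a function. The function name parse_old_cms_path and the array keys are made up for illustration, and it assumes the "1" in the longer URL from the question is a literal segment, so adjust it to whatever your real URLs look like.

```php
<?php
// Hypothetical helper (not part of any CMS): returns the captured segments,
// or null when the URL is not one of the old-CMS links described above.
function parse_old_cms_path($url)
{
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return null; // no path, or the URL could not be parsed
    }

    $parts = explode('/', trim($path, '/'));

    // Need at least content/view/<wildcard>/<id>
    if (count($parts) < 4 || $parts[0] !== 'content' || $parts[1] !== 'view') {
        return null;
    }

    // Only the two IDs mentioned in the question are accepted
    if (!in_array($parts[3], array('30', '34'), true)) {
        return null;
    }

    $result = array('first' => $parts[2], 'section' => $parts[3], 'second' => null);

    // Longer form: content/view/<first>/34/1/<second>
    // Assumption: the "1" is a literal segment, as written in the question.
    if (count($parts) >= 6 && $parts[4] === '1') {
        $result['second'] = $parts[5];
    }

    return $result;
}
```

Calling it with 'http://domain.com/content/view/123/34/1/456/' should give you '123' and '456' as the two wildcard values, ready to drop into your DB query (after escaping them, of course).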

Green
Thanks - how would I go about using a check loop? I know what loops are, but is a check loop something specific, or do you just mean to loop through the exploded bits and check based on a numerical array? I am thinking parse_url, explode, then check loop?
Dan
The check loop was a typo. I edited my original post with more details.
Green
Thanks - I am trying a couple of different options based on execution time but that's very useful.
Dan
A: 

First, I would use the PHP function parse_url() to get the path, devoid of any protocol or hostname.

Once you have that, the following code should get you the info you need.

<?php

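$url = 'http://domain.com/content/view/*/34/'; // first example
$url = 'http://domain.com/content/view/*/34/1/*/'; // second example (overrides the first; comment it out to test the first)
$url_array = parse_url($url);

$path = $url_array['path'];

// Short form: $matches[1] is the wildcard, $matches[2] is the numeric ID.
// This pattern also matches the long form, since the long URL starts the same way.
if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\//i', $path, $matches)) {
    print_r($matches);
}

// Long form: $matches[1] and $matches[4] are the two wildcards.
if (preg_match('/content\/view\/([^\/]+)\/([0-9]+)\/([0-9]+)\/([^\/]+)/i', $path, $matches)) {
    print_r($matches);
}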

?>

([^\/]+) matches one or more characters other than a forward slash

([0-9]+) matches one or more digits

Though you can probably write a single regular expression to match most URL variants, consider using multiple regular expressions to check for different types of URLs. Depending on how much traffic you get, the speed hit won't be all that terrible.
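
If you do want to try the single-expression route, something along these lines might work as a starting point. This is only a sketch: the function name match_old_cms_path and the array keys are invented for the example, the pattern is restricted to the IDs 30 and 34 from the question, and it uses # as the delimiter so the forward slashes don't need escaping.

```php
<?php
// Hypothetical helper: matches both URL forms with one pattern.
// Returns the captured pieces, or null if the path does not match.
function match_old_cms_path($path)
{
    // '#' delimits the pattern, so '/' needs no backslash.
    // The (?: ... )? group makes the "/<number>/<second wildcard>" tail optional.
    $pattern = '#^/content/view/([^/]+)/(30|34)(?:/[0-9]+/([^/]+))?/?$#i';

    if (!preg_match($pattern, $path, $m)) {
        return null;
    }

    return array(
        'first'   => $m[1],
        'section' => $m[2],
        // $m[3] is only set when the longer form matched
        'second'  => isset($m[3]) ? $m[3] : null,
    );
}
```

Pass it the path from parse_url(), e.g. match_old_cms_path('/content/view/123/34/1/456/'). The trade-off is that one pattern is more compact but harder to read than two separate checks.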

Also, I recommend reading Mastering Regular Expressions, published by O'Reilly. A good knowledge of regular expressions will come in handy quite often.

http://www.regular-expressions.info/php.html

John Kramlich
Thanks - I seem to be having problems with an unknown modifier "v" when running a preg_match using this method?
Dan
I forgot to escape the forward slashes. preg_match() considers them special characters that delimit the regular expression. Please see my updated post with code samples. It has been tested with PHP 5.3 and should be backwards compatible.
John Kramlich
Perfect - I think I can finish from here! Thanks.
Dan