I'm parsing the links found on webpages, and I'm looking for a way to convert URLs like this:

http://www.site.com/./eng/.././disclaimer/index.htm

to the equivalent and more correct

http://www.site.com/disclaimer/index.htm

mainly for avoiding duplicates.

Thank you.

A: 

Exactly what makes you think those two URLs are equivalent?

If you can answer that question in detail, use a regexp or a parser to apply the rules that you know indicate the pages are equivalent.

chelmertz
Those two URLs are equivalent because they lead to the same address. If you type the "dirty" one into the address bar, the browser automatically simplifies it.
UVL
+2  A: 

Like this:

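// Collapse "." and ".." segments in a URL path.
function simplify($path) {
   $r = array();
   foreach (explode('/', $path) as $p) {
      if ($p == '..')
         array_pop($r);                     // ".." drops the previous segment
      else if ($p != '.' && strlen($p))
         $r[] = $p;                         // skip "." and empty segments
   }
   $r = implode('/', $r);
   // Restore the leading slash of an absolute path (safe for an empty $path).
   if (strlen($path) && $path[0] == '/') $r = "/$r";
   return $r;
}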

And this is how you use it:

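$u = parse_url($dirtyUrl);
// parse_url() omits 'path' for URLs such as http://www.site.com
$u['path'] = simplify(isset($u['path']) ? $u['path'] : '/');
// Note: this keeps only scheme, host and path (port and query string are dropped).
$clean_url = "{$u['scheme']}://{$u['host']}{$u['path']}";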
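The usage above rebuilds the URL from scheme, host and path only. If the links being parsed can carry a port or a query string, something along these lines might be closer to what is needed. This is only a sketch, assuming PHP 7.1 or later; normalize_url is a made-up name, not part of the snippet above or of PHP itself. It also drops the fragment and keeps a trailing slash, which are choices that suit duplicate detection but may not suit every case.

```php
<?php
// Hypothetical helper (name and behavior are assumptions, not from the answer):
// collapses "." and ".." path segments and rebuilds the URL, keeping the
// port and the query string. The fragment (#...) is deliberately dropped.
function normalize_url(string $url): ?string
{
    $u = parse_url($url);
    if ($u === false || !isset($u['scheme'], $u['host'])) {
        return null; // not an absolute URL that can be rebuilt
    }

    $orig = $u['path'] ?? '/';   // parse_url() omits 'path' when there is none
    $segments = [];
    foreach (explode('/', $orig) as $p) {
        if ($p === '..') {
            array_pop($segments);        // step up one level
        } elseif ($p !== '.' && $p !== '') {
            $segments[] = $p;            // keep real segments only
        }
    }
    $path = '/' . implode('/', $segments);

    // Keep a trailing slash: "/a/" and "/a" may be different resources.
    if ($path !== '/' && substr($orig, -1) === '/') {
        $path .= '/';
    }

    $port  = isset($u['port'])  ? ':' . $u['port']  : '';
    $query = isset($u['query']) ? '?' . $u['query'] : '';

    return $u['scheme'] . '://' . $u['host'] . $port . $path . $query;
}

// The URL from the question:
echo normalize_url('http://www.site.com/./eng/.././disclaimer/index.htm'), "\n";
// prints http://www.site.com/disclaimer/index.htm
```

Returning null for anything without a scheme and host means relative links have to be resolved against the page URL before they are passed in.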
stereofrog
Works! Had to read your snippet ten times before understanding how it works :) Thank you so much.
UVL