Hello, I am writing a small crawler that extracts links from 5 to 10 sites. While collecting the links, I get some URLs like this:

../tets/index.html

If it were /test/index.html, we could prepend the base URL to get http://www.example.com/test/index.html

What can I do with this kind of URL?

+1  A: 

URLs like these are relative URLs. ".." means "parent directory", whereas "." simply means "this directory", as in bash. For instance, if you are looking at the page http://www.someserver/test/foo/bar.html and it contains a URL like "../baz/foobar.html", that URL points to http://www.someserver/test/baz/foobar.html. You can test it to confirm.
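
Since the question is also tagged c++, here is a minimal C++ sketch of that resolution rule, written for this thread rather than taken from any library. The names `removeDotSegments` and `resolveUrl` are hypothetical, and the code assumes the base is an absolute URL of the form scheme://host/path. As a simplification it also collapses empty path segments.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// True if s ends with suffix.
static bool endsWith(const std::string& s, const std::string& suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Collapse "." and ".." segments in an absolute path such as "/a/b/../c".
// Simplification: empty segments ("//") are collapsed as well.
std::string removeDotSegments(const std::string& path) {
    std::vector<std::string> kept;
    std::istringstream in(path);
    std::string segment;
    while (std::getline(in, segment, '/')) {
        if (segment.empty() || segment == ".") continue;
        if (segment == "..") {
            if (!kept.empty()) kept.pop_back();  // step up to the parent directory
            continue;
        }
        kept.push_back(segment);
    }
    if (kept.empty()) return "/";
    std::string out;
    for (const auto& s : kept) out += "/" + s;
    // A path that ends in "/", "/." or "/.." names a directory.
    if (path.back() == '/' || endsWith(path, "/.") || endsWith(path, "/..")) out += "/";
    return out;
}

// Resolve a link found on the page at `base` (hypothetical helper).
std::string resolveUrl(const std::string& base, const std::string& ref) {
    const std::size_t npos = std::string::npos;
    if (ref.empty()) return base;

    // A "://" before any '/', '?' or '#' means the link is already absolute.
    const std::size_t refScheme = ref.find("://");
    if (refScheme != npos && ref.find_first_of("/?#") > refScheme) return ref;

    const std::size_t schemeEnd = base.find("://");
    if (schemeEnd == npos) return ref;  // base is not absolute: nothing to resolve against

    // "//host/path" keeps only the scheme of the base.
    if (ref.compare(0, 2, "//") == 0) return base.substr(0, schemeEnd + 1) + ref;

    const std::size_t pathStart = base.find('/', schemeEnd + 3);
    const std::string origin = base.substr(0, pathStart);  // scheme://host[:port]
    std::string basePath = pathStart == npos ? "/" : base.substr(pathStart);
    basePath = basePath.substr(0, basePath.find_first_of("?#"));  // drop query and fragment

    // Keep the link's own query or fragment untouched.
    const std::size_t suffixStart = ref.find_first_of("?#");
    const std::string refPath = ref.substr(0, suffixStart);
    const std::string suffix = suffixStart == npos ? "" : ref.substr(suffixStart);

    if (refPath.empty()) return origin + basePath + suffix;
    if (refPath[0] == '/') return origin + removeDotSegments(refPath) + suffix;

    // Relative path: replace everything after the last '/' of the base path.
    const std::string dir = basePath.substr(0, basePath.rfind('/') + 1);
    return origin + removeDotSegments(dir + refPath) + suffix;
}
```

With this sketch, `resolveUrl("http://www.someserver/test/foo/bar.html", "../baz/foobar.html")` yields `http://www.someserver/test/baz/foobar.html`. Note that a base without a trailing slash is treated as a file, so its last component is dropped before the link is appended.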

greg0ire
A: 

Use dirname() to get the base directory, strip the leading .. with substr(), and append the rest. Like this:

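<?php
// Relative link found on the page; this approach handles exactly one leading "..".
$url = "../tets/index.html";
// URL of the directory the link was found in.
$currentURL = "http://example.com/somedir/anotherdir";
// dirname() drops the last path component; substr() strips the leading "..".
echo dirname($currentURL) . substr($url, 2);
?>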

This outputs:

http://example.com/somedir/tets/index.html

shamittomar
The question is tagged with c++, so I don't think php code is relevant...
greg0ire
@greg0ire, the question is also tagged with `PHP`. Please take a look at the tags.
shamittomar
@shamittomar: Oops, true! This is strange... upvoting bjskishore123's comment
greg0ire
A: 

Take a look at the Wikipedia page on URL normalization.
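
For a crawler, normalization mostly matters for spotting the same page under different spellings. Below is a rough C++ sketch of a few common steps: lowercasing the scheme and host, removing the default port, adding the empty path "/", and dropping the fragment. The name `normalizeUrl` is hypothetical, the set of steps is my own selection, and the code assumes an absolute URL without a user:password@ part.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Normalize an absolute URL so that equivalent spellings compare equal.
// Assumption: no userinfo ("user:pass@") in the authority part.
std::string normalizeUrl(std::string url) {
    const std::size_t npos = std::string::npos;

    // The fragment is never sent to the server, so a crawler can drop it.
    url = url.substr(0, url.find('#'));

    const std::size_t schemeEnd = url.find("://");
    if (schemeEnd == npos) return url;  // not an absolute URL: leave it alone

    std::size_t hostEnd = url.find_first_of("/?", schemeEnd + 3);
    if (hostEnd == npos) hostEnd = url.size();

    // Scheme and host are case-insensitive; path and query are not.
    std::transform(url.begin(), url.begin() + hostEnd, url.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });

    std::string origin = url.substr(0, hostEnd);
    std::string rest = url.substr(hostEnd);

    // Remove the port when it is the default one for the scheme.
    const std::string scheme = origin.substr(0, schemeEnd);
    const std::string defaultPort =
        scheme == "http" ? ":80" : scheme == "https" ? ":443" : "";
    if (!defaultPort.empty() && origin.size() > defaultPort.size() &&
        origin.compare(origin.size() - defaultPort.size(),
                       defaultPort.size(), defaultPort) == 0) {
        origin.erase(origin.size() - defaultPort.size());
    }

    // An empty path is equivalent to "/".
    if (rest.empty() || rest[0] != '/') rest = "/" + rest;
    return origin + rest;
}
```

Storing the normalized form of every visited URL in a set is one simple way to avoid fetching the same page twice.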

Alix Axel