Hi everyone.

For example, I've got a string like this:

$html = '
            <a href="test.html">test</a>
            <a href="http://mydomain.com/test.html"&gt;test&lt;/a&gt;
            <a href="http://otherdomain.com/test.html"&gt;test&lt;/a&gt;
            <a href="someothertest/otherdir/hi.html">hi</a>
        ';

And I want to prepend the absolute URL to every href where no absolute domain is given:

$html = '
            <a href="http://mydomain.com/test.html"&gt;test&lt;/a&gt;
            <a href="http://mydomain.com/test.html"&gt;test&lt;/a&gt;
            <a href="http://otherdomain.com/test.html"&gt;test&lt;/a&gt;
            <a href="http://mydomain.com/someothertest/otherdir/hi.html"&gt;hi&lt;/a&gt;
        ';  

What's the best way to do that? I guess something with RegEx, but my RegEx skills are ** ;)

thanks in advance!

A: 
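$domain = 'http://mydomain';
// Collect the value of every href="..." attribute into $matches[1].
preg_match_all('/href\="(.*?)"/im', $html, $matches);
foreach($matches[1] as $n=>$link)
{
    // Links that do not start with "http" are treated as relative.
    if(substr($link, 0, 4) != 'http')
        $html = str_replace($matches[1][$n], $domain . $matches[1][$n], $html);
}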
Romka
Romka, I formatted your code for you so our eyes don't bleed when we read it.
John Conde
+1  A: 

The previous answer will cause problems with your first and fourth examples because it does not insert a forward slash between the domain and the page name. Admittedly, this can be fixed by simply appending a slash to $domain, but if you do that then href="/something.php" will end up with two slashes.

Just to give an alternative Regex solution you could go with something like this...

$pattern = '#(?<=href=")(.+?)(?=")#';
$output = preg_replace_callback($pattern, 'make_absolute', $input);

function make_absolute($link) {
    $domain = 'http://domain.com';
    if(strpos($link[1], 'http')!==0) {
        if(strpos($link[1], '/')!==0) {
            return $domain.'/'.$link[1];
        } else {
            return $domain.$link[1];
        }
    }
    return $link[1];
}

However, it is worth noting that a link such as href="example.html" is relative to the current directory, so neither method shown so far will work correctly for relative links on pages that aren't in the root directory. To provide a solution that handles that case, more information would be required about where the HTML came from.
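
As a rough sketch of what handling the current directory could look like, assuming you know the URL of the page the HTML came from: the names resolve_href, absolutize_links and $base are made up for this illustration, it assumes PHP 7 or later, and it ignores ports, query strings in the base and "../" segments.

```php
<?php
// Hypothetical helper: resolve one href against the URL of the page it
// was found on. $base is an assumption, e.g.
// 'http://mydomain.com/someothertest/index.html'.
function resolve_href(string $href, string $base): string
{
    // Anything with a scheme (http:, https:, mailto:, ...) or a bare
    // fragment (#top) is left untouched.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href) || strpos($href, '#') === 0) {
        return $href;
    }

    $parts = parse_url($base);
    $root  = $parts['scheme'] . '://' . $parts['host'];

    // Root-relative link: only the scheme and host are needed.
    if (strpos($href, '/') === 0) {
        return $root . $href;
    }

    // Directory-relative link: keep the base path up to its last slash.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);

    return $root . $dir . $href;
}

// Apply resolve_href() to every double-quoted href in $html.
function absolutize_links(string $html, string $base): string
{
    return preg_replace_callback(
        '#(?<=href=")[^"]+(?=")#i',
        function (array $m) use ($base): string {
            return resolve_href($m[0], $base);
        },
        $html
    );
}
```

With a base of 'http://mydomain.com/dir/page.html', href="test.html" would become href="http://mydomain.com/dir/test.html", while href="/test.html" would become href="http://mydomain.com/test.html".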

Cags
A: 

Found a good way:

$html = preg_replace("#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http)([^\"'>]+)([\"'>]+)#", '$1http://mydomain.com/$2$3', $html);

You can use (?!http|mailto) if your $html also contains mailto links.
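
A quick check of the mailto variant, as a sketch only: the sample $html and the mail address are made up for this test. Note that, as far as I can tell, a root-relative link such as href="/test.html" would end up with a double slash with this pattern.

```php
<?php
// Made-up sample input: one relative link, one absolute link, one mailto link.
$html = '<a href="test.html">test</a> '
      . '<a href="http://otherdomain.com/test.html">test</a> '
      . '<a href="mailto:me@mydomain.com">mail</a>';

// Same pattern as above, with mailto added to the negative lookahead.
$pattern = "#(<\s*a\s+[^>]*href\s*=\s*[\"'])(?!http|mailto)([^\"'>]+)([\"'>]+)#";

// Only the first link is rewritten; the other two are left as they are.
$result = preg_replace($pattern, '$1http://mydomain.com/$2$3', $html);

echo $result, "\n";
```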

choise