Hi! I want to parse all links in an HTML document string in PHP and replace href='LINK' with href='MY_DOMAIN?URL=LINK'. Because LINK becomes a URL parameter, it must be urlencoded. I'm trying this:

preg_replace('/href="(.+)"/', 'href="http://'.$host.'/?url='.urlencode('${1}').'"', $html);

But '${1}' is just a string literal here, not the link matched by the regex. What do I need to do to make this code work?
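A minimal reproduction of the problem, assuming a hypothetical host of example.com and a one-link sample string. PHP evaluates the arguments before it calls preg_replace(), so urlencode() receives the literal text ${1} and encodes it; the backreference never reaches the regex engine:

```php
<?php
// Reproduces the question's bug. $host and $html are hypothetical sample values.
$host = 'example.com';
$html = '<a href="http://a.com/x">A</a>';

// urlencode() runs first, on the literal string '${1}', turning it into
// '%24%7B1%7D'. preg_replace() only recognises ${1} inside the replacement
// string itself, and by now it is gone.
$replacement = 'href="http://' . $host . '/?url=' . urlencode('${1}') . '"';
echo $replacement, "\n";

// Every link therefore gets the same, useless target.
echo preg_replace('/href="(.+)"/', $replacement, $html), "\n";
```

The first line printed is href="http://example.com/?url=%24%7B1%7D", and the link becomes <a href="http://example.com/?url=%24%7B1%7D">A</a>.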

A: 

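Use the 'e' modifier. (Note: /e was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so this only works on older PHP versions; on newer ones use preg_replace_callback instead.)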

preg_replace('/href="([^"]+)"/e',"'href=\"http://'.$host.'?url='.urlencode('\\1').'\"'",$html);

http://uk.php.net/preg-replace - example #4

Pete
A:

Well, to answer your question, you have two choices with regex.

You can add the e modifier to the regex, which tells preg_replace to evaluate the replacement as PHP code. This is generally discouraged, since it is really no better than eval() (and /e was removed entirely in PHP 7.0):

$regex = '/href="([^"]+)"/e';
$html = preg_replace($regex, "'href=\"http://{$host}?url='.urlencode('\\1').'\"'", $html);

The other option (which is better IMHO) is to use preg_replace_callback:

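$regex = '/href="([^"]+)"/'; // no e modifier: the callback builds the replacement
$callback = function ($match) use ($host) {
    // $match[1] is the captured link
    return 'href="http://'.$host.'?url='.urlencode($match[1]).'"';
};
$html = preg_replace_callback($regex, $callback, $html);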
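A self-contained sketch of the preg_replace_callback approach, wrapped in a hypothetical rewriteLinks() helper (assumes PHP 7+ for the type declarations; the host and sample HTML are made up). It uses [^"]+ rather than the question's greedy (.+), so two links on the same line are matched separately instead of as one long match:

```php
<?php
// Sketch only: rewriteLinks(), the host and the sample HTML are hypothetical.
function rewriteLinks(string $html, string $host): string
{
    // [^"]+ stops at the first closing quote, so each href is matched on its own.
    $regex = '/href="([^"]+)"/';
    return preg_replace_callback($regex, function (array $match) use ($host): string {
        // urlencode the captured link so it is safe as a query-string value.
        return 'href="http://' . $host . '/?url=' . urlencode($match[1]) . '"';
    }, $html);
}

$html = '<a href="http://a.com/x?y=1">A</a> <a href="/b">B</a>';
echo rewriteLinks($html, 'example.com'), "\n";
```

For this input it prints <a href="http://example.com/?url=http%3A%2F%2Fa.com%2Fx%3Fy%3D1">A</a> <a href="http://example.com/?url=%2Fb">B</a>.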

But never forget: don't parse HTML with regex...

So in practice, the better (more robust) way to do it is with a DOM parser:

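libxml_use_internal_errors(true); // real-world HTML often triggers parser warnings
$dom = new DOMDocument();
$dom->loadHTML($html);
$aTags = $dom->getElementsByTagName('a');
foreach ($aTags as $aElement) {
    if (!$aElement->hasAttribute('href')) {
        continue; // skip anchors such as <a name="top"> that have no link
    }
    $href = $aElement->getAttribute('href');
    $href = 'http://'.$host.'?url='.urlencode($href);
    $aElement->setAttribute('href', $href);
}
$html = $dom->saveHTML();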
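A self-contained version of the DOM approach as a hypothetical rewriteLinksDom() helper, assuming PHP's dom extension is available. Unlike the regex versions, it also handles single-quoted attributes such as the question's href='LINK', and it leaves <a> elements without an href alone:

```php
<?php
// Sketch only: the helper name, host and sample markup are hypothetical.
function rewriteLinksDom(string $html, string $host): string
{
    // Collect libxml parse warnings instead of printing them;
    // real-world HTML is rarely perfectly valid.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($dom->getElementsByTagName('a') as $a) {
        if (!$a->hasAttribute('href')) {
            continue; // e.g. <a name="top"> has nothing to rewrite
        }
        $target = urlencode($a->getAttribute('href'));
        $a->setAttribute('href', 'http://' . $host . '/?url=' . $target);
    }
    return $dom->saveHTML();
}

// The first link uses single quotes, which the regex versions would miss.
$html = "<p><a href='/b'>B</a> <a name=\"top\">T</a></p>";
echo rewriteLinksDom($html, 'example.com');
```

For a fragment like this, saveHTML() typically returns a whole document: libxml adds a doctype and <html><body> wrappers around the input.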
ircmaxell
Just one thing: $aElement->setAttribute($href); must be replaced with $aElement->setAttribute('href', $href);
hippout
Whoops, thanks for noticing that...
ircmaxell