ansaurus

Question

PHP & Regex : Adding website url to images

Answer 1

+4 A:

For a start you could stop using regular expressions to process HTML, particularly when what you're doing is so easily done with an HTML parser (of which PHP has at least 3). For example:

$dom = new DOMDocument;
$dom->loadHTML($html);
$images = $dom->getElementsByTagName('img');
foreach ($images as $image) {
  $src = $image->getAttribute('src');
  $url = parse_url($src);
  // http_build_url() is provided by the pecl_http extension, not core PHP
  $image->setAttribute('src', http_build_url('http://www.mydomain.com', $url));
}
$html = $dom->saveHTML();

Problem solved. Well, almost. The case where you add the hostname to relative URLs but not to those beginning with / is a little puzzling and isn't handled in this snippet, but it's a relatively minor change (it involves checking $url['path']).
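
The $url['path'] check mentioned above might look something like this. It's only a sketch: the helper name absolutize_img_src() and the base URL are made up for illustration, and it uses plain string concatenation instead of http_build_url() so that it doesn't depend on the pecl_http extension.

```php
<?php
// Hypothetical helper: prefix $base to relative <img> src values only.
// Sources that already carry a scheme or host, or whose path starts
// with "/", are left unchanged.
function absolutize_img_src(string $html, string $base): string
{
    $dom = new DOMDocument;
    // Suppress parser warnings caused by imperfect real-world markup.
    @$dom->loadHTML($html);

    foreach ($dom->getElementsByTagName('img') as $image) {
        $src = $image->getAttribute('src');
        $url = parse_url($src);

        if ($url === false || isset($url['scheme']) || isset($url['host'])) {
            continue; // unparsable, or already absolute
        }

        $path = $url['path'] ?? '';
        if ($path === '' || $path[0] === '/') {
            continue; // empty or root-relative: leave as-is
        }

        $image->setAttribute('src', rtrim($base, '/') . '/' . $src);
    }

    return $dom->saveHTML();
}
```

Note that loadHTML() wraps a fragment in a doctype plus html and body elements, so the returned string is a full document rather than the original fragment.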

See Parse HTML With PHP And DOM, the Document Object Model, parse_url() and http_build_url(). PHP has much better tools for this than regular expressions.

Oh and for good measure read Parsing Html The Cthulhu Way.

cletus 2009-11-30 08:06:24

Answer 2

A:

Trying to match HTML with regular expressions is very difficult.

Even though your code may seem to work, there is a good chance that some IMG tags will slip through as they are not in the exact format you have described.

Jon Winstanley 2009-11-30 08:08:23

Answer 3

A:

This isn't tested, but I'm thinking something like this...

preg_match_all('/<img\b[^>]*\bsrc\s*=\s*[\'"]?([^\'">]*)/i', $content_text, $matches);
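
To see what that call collects, here is a small sketch run against a made-up sample string (the markup is an assumption, not taken from the question). The pattern only finds the src values; it doesn't rewrite them.

```php
<?php
// Sample markup, invented for illustration.
$content_text = '<p><img src="images/a.png" alt=""> <IMG class="x" src=\'/b.png\'></p>';

preg_match_all('/<img\b[^>]*\bsrc\s*=\s*[\'"]?([^\'">]*)/i', $content_text, $matches);

// $matches[0] holds the full matches, $matches[1] the captured src values.
$sources = $matches[1];
```

For this sample, $sources ends up holding images/a.png and /b.png, so you would still need a second step to decide which ones get the hostname.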

Matt Huggins 2009-11-30 08:11:37

Answer 4

+4 A:

A completely different approach may work, too:

<base href="http://domain.com/" />

Martin 2009-11-30 08:11:49

Oh man. I never knew about this tag. Thanks for posting a reference to it.

Platinum Azure 2009-11-30 08:22:29

Answer 5

A:

Now, all the cool kids are going to tell you not to use regex to parse html. This is mostly because of HTML's tree context. While I usually agree with the cool kids, a simple replace like what you're doing is perfectly fine for regex. In fact I would consider it a waste of resources to bother throwing DomDocument (or any other parser) at this problem.

Here's an easy one-liner for what you want:

$output = preg_replace('/(<img[^>]*)src="([^\/])([^"]*")/', '$1src="http://domain.com/$2$3', $input);
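
One caveat with the pattern above: [^\/] only rules out a leading slash, so an src that is already absolute (one starting with http://, say) would get the prefix as well. A possible variant, sketched here with a made-up function name and not tested against real-world markup, uses a negative lookahead to skip those too.

```php
<?php
// Hypothetical helper; the name and the base URL are illustrative only.
function prefix_relative_img_src(string $input, string $base): string
{
    // The lookahead (?!/|[a-z][a-z0-9+.-]*:) rejects src values that start
    // with "/" or with a scheme such as "http:".
    $pattern = '~(<img\b[^>]*?\bsrc=")(?!/|[a-z][a-z0-9+.-]*:)([^"]*")~i';

    // A callback keeps any "$" or "\" in $base from being read as a
    // backreference in the replacement.
    $result = preg_replace_callback(
        $pattern,
        function ($m) use ($base) {
            return $m[1] . $base . $m[2];
        },
        $input
    );

    // preg_replace_callback() returns null on error; fall back to the input.
    return $result ?? $input;
}
```

Like the original one-liner, this only handles double-quoted src attributes.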

Matt 2009-11-30 08:20:44
