I have some HTML and want to replace the "src" attributes of all the img tags so that they point to copies of the identical images (although with different file names) on another host.

So for instance, given these three tags

<IMG SRC="../graphics/pumpkin.gif" ALT="pumpkin">
<IMG BORDER="5" SRC="redball.gif" ALT="*"> 
<img alt="cool image" src="http://www.crunch.com/pic.jpg"/>

I would like them replaced with

<IMG SRC="http://myhost.com/cache/img001.gif" ALT="pumpkin">
<IMG BORDER="5" SRC="http://myhost.com/cache/img002.gif" ALT="*"> 
<img alt="cool image" src="http://myhost.com/cache/img003.jpg"/>

I know there is some regexp magic to this, just not sure what it should look like (or if this is in fact the best way).

+4  A: 

This being asked on SO, you will most likely get a lot of answers telling you to use a parser instead. Guess what, I think it's the right answer. In PHP, you can use DOMDocument's loadHTML method to create a DOM tree from a given HTML document, which you can walk over, modifying the tags as you go along.

Jim Brissom
@Jim, he's *far* more likely to get a reference to the now-obligatory [he comes](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)
David Thomas
@David Nah, this isn't HTML parsing, it's just plain text parsing.
Roger Willcocks
@Roger, there is, perhaps, that saving grace.
David Thomas
How is this not HTML parsing, considering he has an HTML document he wishes to parse?
webbiedave
Because he is not using RegEx to convert text into an HTML DOM. He doesn't care about closed or unclosed tags, or nesting, or indeed anything other than an <IMG followed (somewhere) by an SRC=, unless he has images without an SRC attribute.
Roger Willcocks
A: 

You will need case insensitive RegEx matching, and you'll also need to consider " vs ' quotes.

Hmm. I think I'd use System.Text.RegularExpressions.Regex.Replace with a MatchEvaluator delegate.

You'd need to make sure the quote matched, so you'd need an ORed check. Roughly:

<img\b[^>]*?\ssrc\s*=\s*'[^']*'|<img\b[^>]*?\ssrc\s*=\s*"[^"]*"
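To make the "replace with a delegate" idea concrete, here is a minimal sketch in JavaScript, where String.prototype.replace accepts a callback in the same spirit. The function name rewriteImgSrc, the constant names, and the sequential img001-style naming are assumptions for illustration, taken from the example in the question; the pattern uses a backreference instead of an ORed pair so the closing quote has to match the opening one.

```javascript
// Assumed cache location, taken from the example output in the question.
const CACHE_BASE = "http://myhost.com/cache/";

// Group 1: the tag text up to the src value, group 2: the quote character,
// group 3: the old URL. \2 forces the closing quote to match the opening one.
// The "i" flag handles <IMG SRC=...> as well as <img src=...>.
const IMG_SRC = /(<img\b[^>]*?\ssrc\s*=\s*)(["'])(.*?)\2/gi;

// Hypothetical helper: rewrites every img src to a numbered file on the cache host.
function rewriteImgSrc(html) {
  let counter = 0;
  return html.replace(IMG_SRC, (match, prefix, quote, url) => {
    counter += 1;
    // Keep the original file extension; fall back to "gif" if none is found.
    const found = url.match(/\.([a-z0-9]+)(?:[?#].*)?$/i);
    const ext = found ? found[1].toLowerCase() : "gif";
    const name = "img" + String(counter).padStart(3, "0") + "." + ext;
    return prefix + quote + CACHE_BASE + name + quote;
  });
}
```

This only looks at quoted src values inside img tags and leaves everything else untouched. It does not handle unquoted attribute values or img-like text inside comments or scripts, which is where a real parser is the safer choice.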
Roger Willcocks
A: 
mezzie
+2  A: 

I tried doing this with SimpleHTMLDOM, and it seems to work:

$html = str_get_html( ... ); // what you have done

$map = array(
  "../graphics/pumpkin.gif"       => "http://myhost.com/cache/img001.gif",
  "redball.gif"                   => "http://myhost.com/cache/img002.gif",
  "http://www.crunch.com/pic.jpg" => "http://myhost.com/cache/img003.jpg",
);

foreach ($html->find("img") as $element) {
  if (isset($map[$element->src])) {
    $element->src = $map[$element->src];
  }
}

echo $html;

PS: If you need to clarify your question, you should edit your original question instead of opening a new, identical question.

Bill Karwin
Bill, it works perfectly. Thanks very much. And yes, you are quite right. I actually just wanted to close this question and start a new one without using the very dangerous word combination of "HTML" and "regex", but then I found that I couldn't close this one. Oh well. Thanks again.
njt
A: 

Just run over all images in the document and get/set the src attribute.

var images = document.getElementsByTagName('img');
for (var i = 0; i < images.length; i++)
{
   var oldSrc = images[i].getAttribute("src"); // read the current src and do something with it
   images[i].setAttribute("src", some_new_value); // set new src (some_new_value is a placeholder)
}

As many have already said, you don't need RegExp for this.
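As a rough sketch of what "do something with it" could look like for the naming scheme in the question, the helper below assigns sequential cache URLs and remembers them, so a repeated image gets the same new name. The function name rewriteImageSources and the cacheBase parameter are hypothetical, and the img001-style numbering is assumed from the question's example.

```javascript
// Hypothetical helper. `images` is any array-like of objects exposing
// getAttribute/setAttribute, e.g. document.getElementsByTagName("img").
// Returns a Map of old src -> new src, so the caller knows which files to copy.
function rewriteImageSources(images, cacheBase) {
  const mapping = new Map();
  for (let i = 0; i < images.length; i++) {
    const oldSrc = images[i].getAttribute("src");
    if (!oldSrc) continue; // skip images without a src attribute
    if (!mapping.has(oldSrc)) {
      // Take the file name without query string or fragment, then its extension.
      const file = oldSrc.split(/[?#]/)[0].split("/").pop();
      const dot = file.lastIndexOf(".");
      const ext = dot > 0 ? file.slice(dot + 1).toLowerCase() : "gif";
      const number = String(mapping.size + 1).padStart(3, "0");
      mapping.set(oldSrc, cacheBase + "img" + number + "." + ext);
    }
    images[i].setAttribute("src", mapping.get(oldSrc));
  }
  return mapping;
}
```

In a browser this could be called as rewriteImageSources(document.getElementsByTagName("img"), "http://myhost.com/cache/"). Copying the actual image files to the other host is a separate step that the returned map can drive.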

Francisc
A: 

You can use phpQuery to do this.

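foreach (pq("img") as $img) {
  // $img is a plain DOM node here, so wrap it with pq() before calling attr()
  // insert regexp magic here to compute $newurl from the current src
  pq($img)->attr('src', $newurl);
}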

Quite possibly overkill, but it works, especially for people used to working with jQuery.

Andrioid