ansaurus

Question

RegEx for replacing and adding attributes to an HTML tag

Answer 1

+1 A:

With appropriate escaping (that I can never remember without trial and error), and something to increment the img_number, you want to replace something like this:

(<img .*?)(?:id=".*")?(.*?/>)

with something like this:

\1 id="img_$i"\2
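
For anyone wanting to try this in PHP, here is a minimal sketch of the same idea, not the exact pattern above. As far as I can tell, the lazy .*? in the first group lets the optional id group match nothing, so an existing id would be left in place; the sketch therefore matches the whole tag and strips any old id inside a callback. The function name renumber_img_ids is hypothetical, and it assumes double-quoted attribute values with no ">" inside them.

```php
<?php
// Hypothetical helper (not from the answer): gives every <img> tag an id of
// the form img_N, replacing an existing id or adding one where it is missing.
function renumber_img_ids(string $html): string
{
    $i = 0; // counter shared with the callback by reference
    return preg_replace_callback(
        '/<img\b([^>]*?)\s*(\/?)>/i',
        function (array $m) use (&$i): string {
            // Drop any existing id attribute (double-quoted values only).
            $attrs = preg_replace('/\s+id\s*=\s*"[^"]*"/i', '', $m[1]);
            // Re-emit the tag, keeping the self-closing slash if it had one.
            $close = ($m[2] === '/') ? ' />' : '>';
            return '<img' . $attrs . ' id="img_' . $i++ . '"' . $close;
        },
        $html
    );
}

echo renumber_img_ids('<img src="a.jpg" /><img src="b.jpg" id ="old" alt="x"/>');
// <img src="a.jpg" id="img_0" /><img src="b.jpg" alt="x" id="img_1" />
```

The new id is appended after the remaining attributes rather than put back where the old one was, which keeps the callback short.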

Sparr 2009-03-27 01:25:57

(<img .*?)(id=".*")?(.*?/>) would work better I think...

David Zaslavsky 2009-03-27 01:29:52

Not sure if you wrote that before I fixed the syntax... the ?: makes the middle group non-capturing, which speeds regex execution on some platforms.

Sparr 2009-03-27 01:35:23

Answer 2

+1 A:

I think the best approach is to use preg_replace_callback.

Also I would recommend a slightly more stringent regexp than those suggested so far - what if your page contains an <img /> tag that does not contain an id attribute?

$page = '
<body>
  <img src="source.jpg" />
  <p>
 <img src="source.jpg" id ="hello" alt="nothing" />
 <img src="source.jpg" id ="world"/>
  </p>
</body>';

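function my_callback($matches)
{
    // The static counter persists across calls, so each match gets the next number.
    static $i = 0;
    // Keep everything up to the opening quote; replace only the old id value.
    return $matches[1] . "img_" . $i++;
}

// Group 1: from <img up to the opening quote of id. Group 2: the old id value.
print preg_replace_callback('/(<img[^>]*id\s*=\s*")([^"]*)/', "my_callback", $page);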

Which produces the following for me:

<body>
  <img src="source.jpg" />
  <p>
 <img src="source.jpg" id ="img_0" alt="nothing" />
 <img src="source.jpg" id ="img_1"/>
  </p>
</body>

The regexp has two capturing groups: the first we preserve, the second we replace. I've used negated character classes (e.g. [^>]* = anything up to the closing >) so that a match can never run past the end of a tag, which means <img /> tags without an id attribute are simply left alone.
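
To see why the negated class matters, here is a small sketch comparing the pattern with a looser variant that uses a lazy dot instead. The sample page and the variable names ($strict, $loose) are made up for illustration.

```php
<?php
// Sample page: the <img> has no id, but the <p> that follows it does.
$page = '<img src="a.jpg" /><p id="intro">text</p>';

// Negated class: [^>]* cannot cross the ">" that closes the <img> tag.
$strict = '/(<img[^>]*id\s*=\s*")([^"]*)/';
// Lazy dot: .*? is free to run past the tag and find the id on the <p>.
$loose = '/(<img.*?id\s*=\s*")([^"]*)/';

var_dump(preg_match($strict, $page)); // int(0)
var_dump(preg_match($loose, $page));  // int(1)

// The loose pattern ends up renaming the paragraph's id by mistake.
echo preg_replace($loose, '${1}img_0', $page), "\n";
// <img src="a.jpg" /><p id="img_0">text</p>
```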

RobM 2009-03-27 12:17:31

Answer 3

+3 A:

<?php
$data = <<<DATA
<body>
  <img src="source.jpg" />
  <p>
    <img src="source.jpg" id ="hello" alt="nothing" />
    <img src="source.jpg" id ="world"/>
  </p>
</body>
DATA;

$doc = new DOMDocument('1.0', 'UTF-8');
$doc->strictErrorChecking = true;
$doc->xmlStandalone = true;
$doc->formatOutput = true;
// loadXML() needs well-formed markup; the sample above qualifies.
$doc->loadXML($data, LIBXML_NOWARNING | LIBXML_NOERROR);

$sNode = $doc->getElementsByTagName("img");

// setAttribute() overwrites an existing id and adds one where it is missing.
$id = 0;
foreach($sNode as $searchNode)
{
  $searchNode->setAttribute('id', "img_$id");
  $id++;
}

$result = $doc->saveHTML();
echo $result;
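
If the page is ordinary HTML rather than well-formed XML (unclosed <img> tags, for example), loadXML() will likely refuse it. A variant sketch using loadHTML() instead, assuming the DOM extension is available; the sample markup is made up, and the exact wrapper loadHTML() adds around the output (doctype, html element) depends on the libxml version, so no output is shown:

```php
<?php
// Assumption: the input is everyday HTML, not necessarily well-formed XML.
$html = '<body><img src="a.jpg"><p><img src="b.jpg" id ="old" alt="x"></p></body>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Same loop as above: set or overwrite the id on every <img>.
$id = 0;
foreach ($doc->getElementsByTagName('img') as $img) {
    $img->setAttribute('id', 'img_' . $id++);
}

$result = $doc->saveHTML();
echo $result;
```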

raspi 2009-03-28 05:16:36

+1 for actually showing a non-regex solution

Daniel Vandersluis 2010-07-29 18:53:39
