Hello,

I have well-formed XML documents stored in string variables. I want to use preg_replace to add a fixed attribute to every XML tag.

For example replace:

<tag1>
<tag2> some text </tag2>
</tag1>

with:

<tag1 attr="myAttr">
<tag2 attr="myAttr"> some text </tag2>
</tag1>

So I basically need a regular expression that finds every start tag and adds my attribute, but I'm a complete regex noob.

Thanks, kats

A: 
$xml_data = preg_replace("/<([^\/]+\w+)/", "<\\1 attr=\"myAttr\">", $xml_data);
mschmidt42
Arrrg, it's almost doing the trick, except that it adds 'attr="myAttr">' inside the text content of each node instead of as an attribute... any idea?
katsuo11
Yes, this is *why* people recommend not mixing regexes and XML, because of the corner cases and equivalent syntaxes. But don't worry, you're only going to use it on absolutely 100% legal and consistent XML, right?
Rob
right! :)
katsuo11
+13  A: 

Don't use regular expressions to process XML. XML is not a regular language. Use PHP's XML extensions instead:

$xml = new SimpleXMLElement(file_get_contents($xmlFile));
function process_recursive($xmlNode) {
    $xmlNode->addAttribute('attr', 'myAttr');
    foreach ($xmlNode->children() as $childNode) {
        process_recursive($childNode);
    }
}
process_recursive($xml);
echo $xml->asXML();

Every answer that relies on regular expressions will break valid XML such as this:

<?xml version="1.0" encoding='UTF-8'?>
<html>
    <head>
        <!-- <meta> ... </meta> -->
        <script>//<![CDATA[
            function load() {document.write('<tt>Test</tt>');}
        //]]></script>
        <title><![CDATA[Fancy <<SiteName>> [with Breadcrumbs] > in > title]]></title>
    </head>
    <body onload="load()">
        <input
            type="submit"
            value="multiline
                   button
                   text"
        />
    </body>
</html>
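
For XML that is already held in a string, as in the question, the same idea can be sketched with the DOM extension. This is only a minimal sketch: the helper name add_attribute_to_all_elements is hypothetical, and it assumes the input is well-formed. Because only element nodes are touched, tag-like text inside comments and CDATA sections is left alone.

```php
<?php
// Hypothetical helper (not part of the original answer): adds one attribute
// to every element of an XML document held in a string.
function add_attribute_to_all_elements(string $xml, string $name, string $value): string
{
    $doc = new DOMDocument();
    // loadXML() returns false when the input is not well-formed.
    if (!$doc->loadXML($xml)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }
    // '*' matches every element; comments, CDATA sections and text are not elements.
    foreach ($doc->getElementsByTagName('*') as $element) {
        $element->setAttribute($name, $value);
    }
    // Serialize the whole document back to a string.
    return $doc->saveXML();
}

$xml_data = "<tag1>\n<tag2> some text </tag2>\n</tag1>";
echo add_attribute_to_all_elements($xml_data, 'attr', 'myAttr');
```

Note that saveXML() serializes the whole document, so the result may start with an XML declaration even if the input had none.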
soulmerge
I understand that using regex on XML is dirty, but in my case I'll only be adding those attributes to 'regex safe' XML docs. Thank you for pointing this out!!
katsuo11
Btw, I was surprised by how little code it takes with SimpleXML. I tried your code, but it adds an <attributes attr="myAttr"/> element just before the document's end tag, weird.
katsuo11
OK, I made some minor changes to get it working for me: using addAttribute($name, $value) instead of attributes[], and in the foreach statement $xmlNode->children() needs parentheses. Thx again!
katsuo11
Thx, fixed the code.
soulmerge
Agreed, this is a much cleaner answer, nice work.
mschmidt42
A: 

OK, for those reading this who are still interested in the regex way for some reason, here is how to do it:

$xml_data = preg_replace('/(<[A-Za-z0-9\-\_]+[^>]*)>/u', '\1 attr="myAttr">', $xml_data);

Tested and approved!

But, as discussed earlier, use that one with caution! Use it only on XML that you know it won't break (see soulmerge's post about that).
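
To make the caveat concrete, here is a small sketch of what the pattern does to input that is not 'regex safe'. The sample string is made up for illustration: it contains a tag inside a comment and a self-closing tag.

```php
<?php
// Same pattern and replacement as in the answer above.
$pattern = '/(<[A-Za-z0-9\-\_]+[^>]*)>/u';

// Illustrative input (an assumption, not taken from a real document).
$input = '<a><!-- <b> --><c/></a>';

$result = preg_replace($pattern, '\1 attr="myAttr">', $input);
echo $result, "\n";
// prints: <a attr="myAttr"><!-- <b attr="myAttr"> --><c/ attr="myAttr"></a>
```

The tag inside the comment gets rewritten, and the self-closing tag ends up as <c/ attr="myAttr">, which is no longer well-formed.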

Cheers, kats

katsuo11