
My XML file looks something like this:

<doc>
    <RU1>
       <conf> 
              <prop name="a" val="http://a.org/a.html>
       </conf>
    </RU1>
    <RAU1>
     <conf>
              <prop name="a" val="http://a.org/a.html>
       </conf>
    </RAU1>
    <RU2>
     <conf>
              <prop name="a" val="http://a.org/a.html>
       </conf>
    </RU2>
</doc>

In Perl, I want to replace "a.org" with "b.com" in the val attribute of every prop element that sits under a parent element whose name starts with RU. How do I get the changed document back out as an XML file?

+4  A: 

Grab an XML parser off the CPAN and use it. They're there for a reason.

Once you've done that, a fairly simple XPath expression gets you the nodes you want, followed by some quick text replacement on the attributes themselves.
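A minimal sketch of that approach follows. It uses XML::LibXML as the CPAN parser (my choice; the answer doesn't name a module) and assumes the question's XML has first been fixed up to be well-formed. The sub name `replace_under_ru` is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

use XML::LibXML;

# Hypothetical helper: replace "a.org" with "b.com" in prop/@val, but only
# below elements whose name starts with "RU". Takes and returns an XML string.
sub replace_under_ru {
    my ($xml) = @_;
    my $dom = XML::LibXML->load_xml(string => $xml);

    # Any val attribute of a prop nested under an RU* element
    # (RAU1 does not start with "RU", so it is left alone).
    my $xpath = q{//*[starts-with(local-name(), 'RU')]//prop/@val};
    for my $attr ($dom->findnodes($xpath)) {
        # Escape the dot so it only matches a literal "."
        (my $new = $attr->value) =~ s/a\.org/b.com/g;
        $attr->setValue($new);
    }

    # Serialize the whole document; $dom->toFile('new.xml') would write
    # it to disk instead of returning a string.
    return $dom->toString;
}

# The question's example, with the broken prop elements fixed up (assumption).
my $xml = <<'XML';
<doc>
  <RU1><conf><prop name="a" val="http://a.org/a.html"/></conf></RU1>
  <RAU1><conf><prop name="a" val="http://a.org/a.html"/></conf></RAU1>
  <RU2><conf><prop name="a" val="http://a.org/a.html"/></conf></RU2>
</doc>
XML

print replace_under_ru($xml);
```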

Anon.
+7  A: 

Assuming that your XML is well formed (it isn't), you can use a number of CPAN modules for the job. Most of them will involve parsing the document, finding your bit with an XPath query, and printing the document out again.

Here's an example with XML::Twig. I had to fix up the XML to get it to parse.

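#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig->new(
    twig_handlers => {
        'conf/prop' => sub {
            my ($t, $prop) = @_;

            # Only change props below an element whose name starts with "RU"
            return 1 unless grep { $_->tag =~ /^RU/ } $prop->ancestors;

            my $val = $prop->att('val');
            return 1 unless defined $val;

            # Escape the dot so it matches a literal "."
            $val =~ s/a\.org/b.com/g;
            $prop->set_att(val => $val);
            return 1;
        },
    },
    pretty_print => "indented",
);
$twig->parse(join "", <DATA>);

# Prints to STDOUT; redirect STDOUT to a file to save the changed XML
$twig->print;

__DATA__
<foo>
<RU1>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU1>
<RAU1>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RAU1>
<RU2>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU2>
</foo>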
Schwern
I agree with your answer, but note that any round trip of parsing, replacing and re-serializing loses information: entities are expanded, whitespace can be rearranged, the encoding may change, etc. If you also edit your XML by hand, that can be a big issue.
bortzmeyer
+2  A: 

Using the following stylesheet

<?xml version="1.0"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="//*[starts-with(local-name(), 'RU')]//prop/@val">
    <!-- Rebuild the val attribute; emitting bare text here would turn it into element content -->
    <xsl:attribute name="val">
      <xsl:call-template name="replace-aorg" />
    </xsl:attribute>
  </xsl:template>

  <xsl:template name="replace-aorg">
    <xsl:param name="text" select="." />
    <xsl:choose>
      <xsl:when test="contains($text, 'a.org')">
        <xsl:value-of select="substring-before($text, 'a.org')"/>
        <xsl:text>b.com</xsl:text>
        <xsl:call-template name="replace-aorg">
          <xsl:with-param name="text" select="substring-after($text, 'a.org')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

and adjusting your XML document to

<doc>
<RU1>
   <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU1>
<RAU1>
 <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RAU1>
<RU2>
 <conf>
          <prop name="a" val="http://a.org/a.html" />
   </conf>
</RU2>
</doc>

Output:

$ xsltproc style.xsl doc.xml
<?xml version="1.0"?>
<doc>
<RU1>
   <conf>
          <prop name="a" val="http://b.com/a.html"/>
   </conf>
</RU1>
<RAU1>
 <conf>
          <prop name="a" val="http://a.org/a.html"/>
   </conf>
</RAU1>
<RU2>
 <conf>
          <prop name="a" val="http://b.com/a.html"/>
   </conf>
</RU2>
</doc>

So from Perl, writing the result to new.xml, that would be something such as

system("xsltproc", "-o", "new.xml", "style.xsl", "doc.xml") == 0
  or warn "$0: xsltproc exited " . ($? >> 8);
Greg Bacon
ITS SO SIMPLE AND SUCCINCT! :P
Schwern
Don't hate the playa... :-)
Greg Bacon
At least the Perl part is simple and succinct.
mirod
XSL is for the people who think XML isn't complicated enough. :)
brian d foy