Background:
I have an old web CMS that stored content in XML files, one XML file per page. I am in the process of importing content from that CMS into a new one, and I know I'm going to need to massage the existing XML in order for the import process to work properly.

Existing XML:

<page>
    <audience1>true</audience1>
    <audience2>false</audience2>
    <audience3>true</audience3>
    <audience4>false</audience4>
    <audience5>true</audience5>
</page>

Desired XML:

<page>
    <audience1>true</audience1>
    <audience2>false</audience2>
    <audience3>true</audience3>
    <audience4>false</audience4>
    <audience5>true</audience5>
    <audiences>1,3,5</audiences>
</page>

Question:
The desired XML adds the <audiences> node, which holds a comma-delimited list of the numbers of the audience nodes that have a "true" value. I need to produce the desired XML for several files, so what is the best way to accomplish this? Some of my ideas:

  • Use a text editor with a regex find/replace. But what expression? I wouldn't even know where to begin.
  • Use a programming language like ASP.NET to parse the files and append the desired node. Again, not sure where to begin here as my .NET skills are only average.

Suggestions?

+1  A: 

I would probably use the XmlDocument class in .NET, but that's just me; I've never been that fond of regexes.

You could then use XPath expressions to pull out the child nodes of each page node, evaluate them, and append a new node at the end of the page's children. Save the XmlDocument when you are done.
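As a rough sketch of the parse, evaluate, and append steps described above, here is one way it could look. It uses Python's standard xml.etree.ElementTree module as a stand-in for .NET's XmlDocument (a swapped-in library, not the class named in this answer). The function name add_audiences and the inline sample are made up for illustration, and the code assumes each file has a single <page> root whose children are named audience1, audience2, and so on.

```python
import xml.etree.ElementTree as ET


def add_audiences(xml_text: str) -> str:
    """Append an <audiences> element listing the numbers of the 'true' audienceN children.

    Hypothetical helper: assumes a <page> root with audienceN child elements.
    """
    page = ET.fromstring(xml_text)
    true_numbers = []
    for child in page:
        # Only consider children named audience1, audience2, ...
        suffix = child.tag[len("audience"):]
        if child.tag.startswith("audience") and suffix.isdigit():
            # Tolerate whitespace around the value.
            if (child.text or "").strip() == "true":
                true_numbers.append(suffix)
    # Append the new node after the existing children.
    audiences = ET.SubElement(page, "audiences")
    audiences.text = ",".join(true_numbers)
    return ET.tostring(page, encoding="unicode")


sample = """<page>
    <audience1>true</audience1>
    <audience2>false</audience2>
    <audience3>true</audience3>
    <audience4>false</audience4>
    <audience5>true</audience5>
</page>"""

result = add_audiences(sample)
print(result)
```

For real files, ET.parse(path) and tree.write(path) could replace the string round trip. Note that running this twice on the same file would append a second <audiences> node, so a guard may be worth adding.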

XSLT is an option too, but the initial learning curve is a bit painful.

There's probably a more elegant way with a regex, but if you are only running it once, it only matters that it works.
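For what the regex route might look like, here is a minimal sketch in Python. It is only an illustration, not a general XML solution: it assumes the flat, predictable layout shown in the question (one <page> per file, audienceN elements holding "true" or "false"), and the names TRUE_AUDIENCE, add_audiences_with_regex, and sample_page are hypothetical.

```python
import re

# Matches <audienceN>true</audienceN>, allowing whitespace around the value,
# and captures the number N. The backreference \1 keeps the tags paired.
TRUE_AUDIENCE = re.compile(r"<audience(\d+)>\s*true\s*</audience\1>")


def add_audiences_with_regex(xml_text: str) -> str:
    """Insert an <audiences> line before the first closing </page> tag."""
    numbers = TRUE_AUDIENCE.findall(xml_text)
    new_node = "    <audiences>" + ",".join(numbers) + "</audiences>\n"
    return xml_text.replace("</page>", new_node + "</page>", 1)


sample_page = (
    "<page>\n"
    "    <audience1>true</audience1>\n"
    "    <audience2>false</audience2>\n"
    "    <audience3>true</audience3>\n"
    "    <audience4>false</audience4>\n"
    "    <audience5>true</audience5>\n"
    "</page>"
)

print(add_audiences_with_regex(sample_page))
```

This kind of pattern matching breaks easily if the files contain comments, CDATA sections, or nested elements with the same names, which is one reason a real XML parser is usually the safer choice.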

seanb
+1  A: 

I would likely use an XSLT stylesheet to solve this problem. I built the following stylesheet to be a little more generic than exactly what you asked for, but it could easily be modified to give you the exact output you specified if you truly need it.

<?xml version="1.0" encoding="utf-8"?>

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:apply-templates select="/*"/>
  </xsl:template>

  <!-- Copy the root element and its children, then append the summary node. -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="*"/>

      <xsl:element name="nodes">
        <xsl:apply-templates select="*[normalize-space(.) = 'true']"/>
      </xsl:element>
    </xsl:copy>
  </xsl:template>

  <!-- Emit each selected child's name, comma-separated. position() is relative
       to the selected "true" nodes, so no leading comma appears even when the
       first child is not "true". -->
  <xsl:template match="/*/*">
    <xsl:if test="position() != 1">,</xsl:if>
    <xsl:value-of select="local-name()"/>
  </xsl:template>

</xsl:stylesheet>

This XSLT's output would be:

<page>
  <audience1>
    true
  </audience1>
  <audience2>
    false
  </audience2>
  <audience3>
    true
  </audience3>
  <audience4>
    false
  </audience4>
  <audience5>
    true
  </audience5>
  <nodes>audience1,audience3,audience5</nodes>
</page>

XSLT would be a good fit for this because you can use it from almost any programming language, or you could use Visual Studio to apply the stylesheet. There are also many free tools out there that you could use to apply the transformation.

Phil Laliberte