I am attempting to fix some bilingual XML files by using regular expressions to match known patterns of erroneous content and substitute the correct values. Most of the problems in the files can be considered typos or redundant data.

I do have a text processing tool, but it has no regex support, and the whole situation would be so much easier if I could just use sed or something similar to script up a batch job and leave it running overnight. An example sed script that would solve the problem might look like the following:

#!/bin/sed -f
s/<prop type="Att::Status">New/<prop type="Att::Status">Not Validated/g
s/<prop type="Att::Status">Approved/<prop type="Att::Status">Validated/g
....

I have discovered that sed doesn't like UTF-16 files much, and since we are dealing with bilingual XML in 34 different language combinations, it could be very dangerous to wrap the sed script in a tool like iconv. Most charset conversion tools cause corruption of some kind, and I'd rather not spend the rest of the week working out which languages the script handles correctly.

It is also worth mentioning that the XML is full of a client's accumulated translations from the last few years, so there is going to be plenty of malformed syntax in there that may trip up some tools.

So, in summary: sed + iconv is too risky, I have a basic global text replace tool, I have Notepad++, and I even have a list of replacement expressions in sed syntax. But is there an easier/better way?

A: 

I'd have thought that XSLT is your best bet for this sort of thing.

Tom
+1  A: 

See XMLStarlet. It's a command-line toolset for reading/manipulating XML.

In particular, the xml ed command is probably what you want. You can specify XPaths of what you want to change, and how to change it. It'll respect the specified XML character encoding etc., which your standard command-line tools will not.
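
As a rough, untested sketch (assuming the prop elements look exactly like the ones in the question's sed script), the two sed rules from the question might translate into something like:

xmlstarlet ed \
  -u "//prop[@type='Att::Status'][. = 'New']" -v 'Not Validated' \
  -u "//prop[@type='Att::Status'][. = 'Approved']" -v 'Validated' \
  input.xml > output.xml

(Depending on the packaging, the binary may be installed as xml rather than xmlstarlet.)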

Brian Agnew
Thanks. This looks like a good way forward without having to deal with the complexities of XSLT.
IanGilham
+1  A: 

I don't know if the complexities of XMLStarlet are any less than the complexities of XSLT; most of the complexity is actually in the XPath that you're going to use to find the nodes you want to change.

If you were to use XSLT, you'd simply create an identity transform and then add a template to change the text nodes you're interested in:

<xsl:template match="prop[@type='Att::Status']/text()">
   <xsl:choose>
      <xsl:when test=". = 'New'">Not Validated</xsl:when>
      <xsl:when test=". = 'Approved'">Validated</xsl:when>
      <xsl:otherwise>
         <xsl:copy/>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>
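
The identity transform itself is just the standard boilerplate that copies everything else through unchanged:

<xsl:template match="@*|node()">
   <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
   </xsl:copy>
</xsl:template>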

Or you could go nuts and specify the mapping in an external XML file, e.g.:

<map>
   <text value="New">Not Validated</text>
   <text value="Approved">Validated</text>
</map>

Then, in your XSLT:

<xsl:variable name="map" select="document('map.xml')/map/text"/>

<xsl:template match="prop[@type='Att::Status']/text()">
   <xsl:choose>
      <xsl:when test="$map[@value=current()]">
         <xsl:copy-of select="$map[@value=current()]/text()"/>
      </xsl:when>
      <xsl:otherwise>
         <xsl:copy/>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>
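
Either way, running it is a one-liner with any XSLT processor; for example, with xsltproc (assuming you've saved the stylesheet as fix-status.xsl, a name I'm making up here):

xsltproc -o fixed.xml fix-status.xsl input.xml

If the output needs to stay UTF-16, adding <xsl:output encoding="UTF-16"/> to the stylesheet should take care of that, with the processor handling the conversion rather than an external tool like iconv.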
Robert Rossney
Seems fairly straightforward, but the language is just so ugly. At least XPath is relatively succinct and legible. +1 for a nice example.
IanGilham
I think the language is quite elegant, myself, but I probably have Stockholm syndrome.
Robert Rossney