I have an XML file that looks like

<?xml version='1.0' encoding='UTF-8'?>
   <root>
      <node name="foo1" value="bar1" />
      <node name="foo2" value="bar2" />
   </root>

I have a method

String processBar(String bar)

and I want to end up with

<?xml version='1.0' encoding='UTF-8'?>
   <root>
      <node name="foo1" value="processBar("bar1")" />
      <node name="foo2" value="processBar("bar2")" />
   </root>

Is there an easy way to do this, preferably in Java? Note that the file is too large to safely load completely into memory. The data in the XML is roughly arbitrary and processBar may be complex, so I don't want to use regular expressions.

A: 

You can either parse the whole thing with a Java XML parser or read the file content into a string and then do a regex replace on it (using, e.g., http://java.sun.com/j2se/1.5.0/docs/api/java/lang/String.html#replaceAll%28java.lang.String,%20java.lang.String%29)

Niko
Go ahead, come up with a regex to parse any well-formed XML correctly, taking into account unlimited tag nesting, CDATA blocks, character and external references, PIs, comments etc...
Pavel Minaev
+4  A: 

Assuming you mean replacing the attribute values with the result of calling processBar on said attribute values...

Use the JDK's XSLT API to run the following:

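<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:java="http://xml.apache.org/xalan/java"
                extension-element-prefixes="java">
  <!-- Identity template: copy everything else through unchanged. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Replace each value attribute with the result of the static method processBar. -->
  <xsl:template match="/root/node/@value">
    <xsl:attribute name="value">
      <xsl:value-of select="java:com.example.yourclass.processBar(string(.))"/>
    </xsl:attribute>
  </xsl:template>
</xsl:stylesheet>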

This uses the Xalan-Java extensions and assumes processBar is a static method. For an instance method, you can create an object, store it in an xsl:variable, and pass it as the first argument, like this:

<xsl:variable name="frobber" select="java:com.example.Frobber.new()"/>
<xsl:value-of select="java:processBar($frobber, string(.))"/>

Or somesuch.

This only works with Xalan, but since that's the XSLT processor distributed with the JDK, I doubt it will be onerous to use Xalan.
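
For what it's worth, running a stylesheet through the JDK's javax.xml.transform API could look roughly like the sketch below. The class name RunStylesheet and the method name transform are made up for illustration. Whether the java: extension calls are allowed depends on the JDK version and its security settings (they are typically switched off when secure processing is enabled), so treat this as a starting point rather than a guaranteed recipe.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper class; the name is not from the original answer.
public class RunStylesheet {

    // Compiles the stylesheet and applies it to the input, writing the result to output.
    static void transform(Reader stylesheet, Reader input, Writer output)
            throws TransformerException {
        // Uses whichever XSLT processor the JDK (or the classpath) provides.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(new StreamSource(stylesheet));
        transformer.transform(new StreamSource(input), new StreamResult(output));
    }
}
```

You would call it with a FileReader for the stylesheet, a FileReader for the input and a FileWriter for the output. Note that this does not by itself avoid building the whole input tree in memory.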

Steven Huwig
FYI I had to fix some bugs in this after posting. The idea is sound, though. ;)
Steven Huwig
This looks great. Do you have any pointers on documentation for the <xsl:value-of select="java:"... /> syntax? In particular I'm wondering what constructor is used to instantiate yourclass, etc.
Jacob
I added some information to the answer.
Steven Huwig
And anyone who breathes "regex" gets a downvote from me.
Steven Huwig
"Note that the file is too large to safely load completely into memory" - won't Xalan load the input file completely into memory before processing? Streaming XSLT extensions are only coming in XSLT 2.1...
Pavel Minaev
To be fair, Steven, the original question did not state the complexity of the document, nor correct XML quotes, which is why regex could have been a viable solution. (That is, XSL transformers probably cannot write values without quotation marks without a good deal of trickery.)
Dave Jarvis
@Thangalin: that's why I disregarded the question as written and answered it as it was intended. ;) Regex isn't really ever a viable solution if you are calling what you are inputting "XML" and what you are outputting "XML."
Steven Huwig
@Pavel: yes, that's a good point, and he did add that after I had made my initial post. If it's really too big to put in memory, then Jacob might need to use SAX or something similar. I also think Saxon-SA can do streaming XSLT transformations, but the Java extension mechanism is different and I am not familiar with it.
Steven Huwig
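
To make the "SAX or something similar" suggestion concrete, here is a minimal sketch of a streaming rewrite. It uses StAX (javax.xml.stream) rather than SAX, since StAX makes it easy to copy events from a reader to a writer. The class name StreamingRewrite and the body of processBar are placeholders, assuming the real processBar takes and returns a String as in the question. Only one event is held at a time, so memory use should not grow with the file size.

```java
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import java.io.Reader;
import java.io.Writer;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical class; names are for illustration only.
public class StreamingRewrite {

    // Placeholder for the asker's method; the real logic is unknown.
    static String processBar(String bar) {
        return bar.toUpperCase();
    }

    // Copies the XML event by event, rewriting the value attribute of every node element.
    static void rewrite(Reader in, Writer out) throws XMLStreamException {
        XMLEventFactory events = XMLEventFactory.newInstance();
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        QName valueAttr = new QName("value");

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "node".equals(event.asStartElement().getName().getLocalPart())) {
                StartElement start = event.asStartElement();
                List<Attribute> attrs = new ArrayList<>();
                Iterator<?> it = start.getAttributes();
                while (it.hasNext()) {
                    Attribute a = (Attribute) it.next();
                    // Swap in the processed value; keep every other attribute as-is.
                    attrs.add(valueAttr.equals(a.getName())
                            ? events.createAttribute(a.getName(), processBar(a.getValue()))
                            : a);
                }
                event = events.createStartElement(
                        start.getName(), attrs.iterator(), start.getNamespaces());
            }
            writer.add(event);
        }
        writer.close();
        reader.close();
    }
}
```

For a real file you would pass buffered file readers and writers. Unlike the XSLT version, this matches any element named node regardless of where it sits in the tree, so add a depth or parent check if that matters.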