I want to take an XML file as input and output the same XML except for some search/replace actions for attributes and text, based on matching certain node characteristics.
What's the best general approach for this, and are there tutorials somewhere?
DOM is out since I can't guarantee being able to keep the whole thing in memory.
I don't mind using SAX or StAX, as long as the default behavior is a pass-through no-op filter. I did something similar with StAX once and it was a pain: it didn't work with namespaces, and I was never sure whether I had handled every case I needed to.
I think XSLT won't work (but am not sure), because it's declarative and I need to do some procedural calculations when figuring out what text/attributes to emit on the output.
(contrived example:
Suppose I were looking for all nodes with an XPath of /group/item/@number and wanted to evaluate the number attribute as an integer, factor it using a method public List<Integer> factorize(int i), convert the list of factors to a space-delimited string, and add an attribute factors to the corresponding /group/item node.
input:
<group name="beatles"><item name="paul" number="64"/></group>
<group name="rolling stones"><item name="mick" number="19"/></group>
<group name="who"><item name="roger" number="515"/></group>
expected output:
<group name="beatles"><item name="paul" number="64" factors="2 2 2 2 2 2"/></group>
<group name="rolling stones"><item name="mick" number="19" factors="19"/></group>
<group name="who"><item name="roger" number="515" factors="103 5"/></group>
)
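To make the contrived example concrete, here is a rough sketch of how I imagine it with the StAX event API: copy every event through unchanged, and only rebuild the start tag when the current path is /group/item and a number attribute is present. The class name FactorFilter, the transform and withFactors helpers, and this particular factorize body are placeholders of mine, not an established recipe. It matches on local names only (namespaces are ignored), and this factorize returns factors in ascending order, so 515 gives "5 103" rather than the order shown above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Collectors;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

class FactorFilter {

    // Placeholder for the factorize method from the question:
    // trial division, prime factors in ascending order.
    static List<Integer> factorize(int i) {
        List<Integer> factors = new ArrayList<>();
        for (int p = 2; (long) p * p <= i; p++) {
            while (i % p == 0) {
                factors.add(p);
                i /= p;
            }
        }
        if (i > 1) {
            factors.add(i);
        }
        return factors;
    }

    // Streams the input event by event; everything that is not a matching
    // /group/item start tag is written back out untouched.
    static String transform(String xml) throws XMLStreamException {
        XMLEventReader reader =
                XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();
        Deque<String> path = new ArrayDeque<>(); // open elements, innermost first

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                path.push(start.getName().getLocalPart());
                Attribute number = start.getAttributeByName(new QName("number"));
                boolean isGroupItem = path.size() == 2
                        && "item".equals(path.peekFirst())
                        && "group".equals(path.peekLast());
                if (isGroupItem && number != null) {
                    event = withFactors(events, start, number.getValue());
                }
            } else if (event.isEndElement()) {
                path.pop();
            }
            writer.add(event); // default case: pass the event through as-is
        }
        writer.close();
        return out.toString();
    }

    // Rebuilds the start tag with the original attributes plus "factors".
    // Integer.parseInt throws NumberFormatException on a non-numeric value.
    private static StartElement withFactors(XMLEventFactory events, StartElement start,
                                            String value) {
        String factors = factorize(Integer.parseInt(value.trim())).stream()
                .map(String::valueOf)
                .collect(Collectors.joining(" "));
        List<Attribute> attrs = new ArrayList<>();
        for (Iterator<?> it = start.getAttributes(); it.hasNext(); ) {
            attrs.add((Attribute) it.next());
        }
        attrs.add(events.createAttribute("factors", factors));
        return events.createStartElement(start.getName(), attrs.iterator(),
                start.getNamespaces());
    }
}
```

Calling FactorFilter.transform on a single well-formed group element should return the same element with the factors attribute added. A round trip like this is not byte-for-byte faithful, though: as far as I can tell, attribute order is not guaranteed and an empty element such as <item/> typically comes back as a start/end tag pair.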
Update: I got the StAX XMLEventReader/XMLEventWriter method working easily, but it doesn't preserve certain formatting quirks that are important in my application. (I guess the program that saves/loads the XML doesn't handle every valid XML file. >:( argh.) Is there a way to process XML that minimizes textual differences between input and output, at least when it comes to character data?