How do I use Groovy to search+replace in XML?

I need something as short/easy as possible, since I'll be giving this code to the testers for their SoapUI scripting.

More specifically, how do I turn

"<root><data></data></root>"

into

"<root><data>value</data></root>"
A: 

Check this: http://today.java.net/pub/a/today/2004/08/12/groovyxml.html?page=2

Bob Dizzle
I'm afraid it's a bit verbose, and it only covers reading as far as I can see. I need to search for a specific tag and insert/replace its value.
Sebastian
+1  A: 
Sebastian
A: 

Three "official" Groovy ways of updating XML are described on the page http://groovy.codehaus.org/Processing+XML, in the section "Updating XML".

Of those three, it seems only the DOMCategory way preserves XML comments etc.

David Skyba
A: 

I did some testing with DOMCategory and it's almost working. I can make the replacement happen, but some InfoPath-related processing instructions disappear. I'm using a method like this:

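import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory
import groovy.xml.dom.DOMUtil

def rtv = { xml, tag, value ->
    def doc     = DOMBuilder.parse(new StringReader(xml))
    def root    = doc.documentElement
    // set the text value of every descendant element named `tag`
    use(DOMCategory) { root.'**'."$tag".each{it.value=value} }
    return DOMUtil.serialize(root)
}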

on a source like this:

<?xml version="1.0" encoding="utf-8"?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:FA_Ansoegning:http---ementor-dk-application-2007-06-22-" href="manifest.xsf" solutionVersion="1.0.0.14" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<application:FA_Ansoegning xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:application="http://corp.dk/application/2007/06/22/"
xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/200    8-04-14T14:31:48">
    <Mobiltlf></Mobiltlf>
  <E-mail-adresse></E-mail-adresse>
</application:FA_Ansoegning>

The only things missing from the result are the <?mso- lines. Does anyone have an idea how to keep them?
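
A possible explanation, offered as a sketch rather than a confirmed diagnosis: the <?mso- lines are processing instructions that sit next to the root element as children of the Document node, so serializing only documentElement leaves them out. The version below uses only JDK classes (no Groovy XML helpers) and serializes the whole Document. The closure name replaceTagValue is hypothetical and not from the original posts, and the parser is assumed to be the default, non-namespace-aware one.

```groovy
import javax.xml.parsers.DocumentBuilderFactory
import javax.xml.transform.TransformerFactory
import javax.xml.transform.dom.DOMSource
import javax.xml.transform.stream.StreamResult
import org.xml.sax.InputSource

// Hypothetical helper: sets the text of every element named `tag`
// and returns the whole document (including processing instructions) as a string.
def replaceTagValue = { String xml, String tag, String value ->
    def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
    def doc = builder.parse(new InputSource(new StringReader(xml)))

    // getElementsByTagName returns all matching descendants in document order
    def nodes = doc.getElementsByTagName(tag)
    for (int i = 0; i < nodes.getLength(); i++) {
        nodes.item(i).setTextContent(value)
    }

    // An identity transform of the Document node (not just the root element)
    def transformer = TransformerFactory.newInstance().newTransformer()
    def out = new StringWriter()
    transformer.transform(new DOMSource(doc), new StreamResult(out))
    return out.toString()
}

def sample = '<?xml version="1.0"?><?mso-application progid="InfoPath.Document"?>' +
             '<root><Mobiltlf></Mobiltlf></root>'
println replaceTagValue(sample, 'Mobiltlf', '1234')
```

Passing the Document rather than the root element to the serializer is the only design point here; the search and replace itself is plain DOM.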

Sebastian
A: 

To me, the actual copy & search & replace seems like the perfect job for an XSLT stylesheet. In XSLT you can simply copy everything (including the items you're having problems with) and insert your data where it is required. You can pass the specific value of your data in via an XSL parameter, or you can dynamically modify the stylesheet itself (if you include it as a string in your Groovy program). Calling this XSLT to transform your document(s) from within Groovy is very simple.

I quickly cobbled the following Groovy script together (but I have no doubt it can be written more simply/compactly):

import javax.xml.transform.TransformerFactory
import javax.xml.transform.stream.StreamResult
import javax.xml.transform.stream.StreamSource

def xml = """
<?xml version="1.0" encoding="utf-8"?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:FA_Ansoegning:http---ementor-dk-application-2007-06-22-" href="manifest.xsf" solutionVersion="1.0.0.14" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<application:FA_Ansoegning xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:application="http://ementor.dk/application/2007/06/22/"
xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/200    8-04-14T14:31:48">
    <Mobiltlf></Mobiltlf>
  <E-mail-adresse></E-mail-adresse>
</application:FA_Ansoegning>
""".trim()

def xslt = """
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:param name="mobil" select="'***dummy***'"/>
 <xsl:param name="email" select="'***dummy***'"/>

 <xsl:template match="@*|node()">
  <xsl:copy>
   <xsl:apply-templates select="@*|node()"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="Mobiltlf">
  <xsl:copy>
   <xsl:value-of select="\$mobil"/>
  </xsl:copy>
 </xsl:template>

 <xsl:template match="E-mail-adresse">
  <xsl:copy>
   <xsl:value-of select="\$email"/>
  </xsl:copy>
 </xsl:template>
</xsl:stylesheet>
""".trim()

def factory = TransformerFactory.newInstance()
def transformer = factory.newTransformer(new StreamSource(new StringReader(xslt)))

transformer.setParameter('mobil', '1234567890')
transformer.setParameter('email', '[email protected]')

transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(System.out))

Running this script produces:

<?xml version="1.0" encoding="UTF-8"?><?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:FA_Ansoegning:http---ementor-dk-application-2007-06-22-" href="manifest.xsf" solutionVersion="1.0.0.14" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<application:FA_Ansoegning xmlns:application="http://ementor.dk/application/2007/06/22/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xd="http://schemas.microsoft.com/office/infopath/2003" xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/200    8-04-14T14:31:48">
    <Mobiltlf>1234567890</Mobiltlf>
  <E-mail-adresse>[email protected]</E-mail-adresse>
</application:FA_Ansoegning>
GerG
A: 

That's the best answer so far and it gives the right result, so I'm going to accept the answer :) However, it's a little too large for me. I think I had better explain that the alternative is:

xml.replace("<Mobiltlf></Mobiltlf>", "<Mobiltlf>32165487</Mobiltlf>")

But that's not very XML-like, so I thought I'd look for an alternative. Also, I can't be sure that the first tag is empty all the time.

Sebastian
A: 

Some of the things you can do with XSLT you can also do with some form of 'search & replace'. It all depends on how complex your problem is and how generic you want the solution to be. To make your own example slightly more generic:

xml.replaceFirst("<Mobiltlf>[^<]*</Mobiltlf>", '<Mobiltlf>32165487</Mobiltlf>')

The solution you choose is up to you. In my own experience (for very simple problems), simple string lookups are faster than regular expressions, which in turn are faster than a full-blown XSLT transformation (which makes sense, actually).
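
To make the difference between the literal and the regex variant concrete, here is a small sketch. The closure names literal and viaRegex and the sample strings are made up for illustration; only the two replace calls come from the posts above.

```groovy
def empty  = '<root><Mobiltlf></Mobiltlf></root>'
def filled = '<root><Mobiltlf>11111111</Mobiltlf></root>'
def target = '<Mobiltlf>32165487</Mobiltlf>'

// Literal replace: only matches the exact empty tag pair.
def literal = { String s -> s.replace('<Mobiltlf></Mobiltlf>', target) }

// Regex replace: matches the tag with any text content (but no nested elements).
def viaRegex = { String s -> s.replaceFirst('<Mobiltlf>[^<]*</Mobiltlf>', target) }

println literal(empty)    // value inserted
println literal(filled)   // unchanged, because the tag was not empty
println viaRegex(filled)  // old value replaced
```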

GerG
A: 

Brilliant! Thank you very much for your assistance :)

That solves my problem in a much cleaner and easier way. It's ended up looking like this:

def rtv = { xmlSource, tagName, newValue ->
    // match the tag with any text content (no attributes, no nested elements)
    def regex = "<$tagName>[^<]*</$tagName>"
    def replacement = "<$tagName>${newValue}</$tagName>"
    return xmlSource.replaceAll(regex, replacement)
}

input = rtv( input, "Mobiltlf", "32165487" )
input = rtv( input, "E-mail-adresse", "[email protected]" )
println input

Since I'm giving this to our testers for use in their testing tool SoapUI, I've tried to "wrap" it, to make it easier for them to copy and paste.

This is good enough for my purpose, but it would be perfect if we could add one more "twist".

Let's say the input had this in it...

<Mobiltlf type="national" anotherattribute="value"></Mobiltlf>

...and we wanted to retain those two attributes even though we replaced the value. Is there a way to use a regexp for that too?

Sebastian
A: 

To retain the attributes just modify your little program like this (I've included a sample source to test it):

def input = """
<?xml version="1.0" encoding="utf-8"?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:FA_Ansoegning:http---ementor-dk-application-2007-06-22-" href="manifest.xsf" solutionVersion="1.0.0.14" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<application:FA_Ansoegning xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:application="http://ementor.dk/application/2007/06/22/"
xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/200    8-04-14T14:31:48">
    <Mobiltlf  type="national" anotherattribute="value"></Mobiltlf>
  <E-mail-adresse attr="whatever"></E-mail-adresse>
</application:FA_Ansoegning>
""".trim()

def rtv = { xmlSource, tagName, newValue ->
    // group 1: start tag incl. attributes, group 2: old content, group 3: end tag
    def regex = "(<$tagName[^>]*>)([^<]*)(</$tagName>)"
    def replacement = "\$1${newValue}\$3"
    return xmlSource.replaceAll(regex, replacement)
}

input = rtv( input, "Mobiltlf", "32165487" )
input = rtv( input, "E-mail-adresse", "[email protected]" )
println input

Running this script produces:

<?xml version="1.0" encoding="utf-8"?>
<?mso-infoPathSolution name="urn:schemas-microsoft-com:office:infopath:FA_Ansoegning:http---ementor-dk-application-2007-06-22-" href="manifest.xsf" solutionVersion="1.0.0.14" productVersion="12.0.0" PIVersion="1.0.0.0" ?>
<?mso-application progid="InfoPath.Document" versionProgid="InfoPath.Document.2"?>
<application:FA_Ansoegning xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:application="http://ementor.dk/application/2007/06/22/"
xmlns:xd="http://schemas.microsoft.com/office/infopath/2003"
xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/200    8-04-14T14:31:48">
    <Mobiltlf  type="national" anotherattribute="value">32165487</Mobiltlf>
  <E-mail-adresse attr="whatever">[email protected]</E-mail-adresse>
</application:FA_Ansoegning>

Note that the matching regexp now contains 3 capturing groups: (1) the start tag (including attributes), (2) whatever the 'old' content of your tag is, and (3) the end tag. The replacement string refers to these captured groups via the $i syntax (with backslashes to escape them in the GString). Just a tip: regular expressions are very powerful animals, and it's really worthwhile to become familiar with them ;-) .
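
One caveat with the $i syntax: a new value that itself contains $ or a backslash is also interpreted by the replacement string. The sketch below is one way to guard against that, using Pattern.quote for the tag name and Matcher.quoteReplacement for the value. The name rtvSafe is hypothetical, and the stricter start-tag pattern (whitespace required before attributes) is an added assumption, not part of the original answer. The new value is still not XML-escaped.

```groovy
import java.util.regex.Matcher
import java.util.regex.Pattern

// Hypothetical hardened variant of rtv
def rtvSafe = { String xmlSource, String tagName, String newValue ->
    // quote the tag name so characters such as '.' are matched literally
    def quotedTag = Pattern.quote(tagName)
    // group 1: start tag with optional attributes, group 2: old text, group 3: end tag
    def regex = "(<" + quotedTag + "(?:\\s[^>]*)?>)([^<]*)(</" + quotedTag + ">)"
    // quoteReplacement escapes '$' and '\' in the value so they are inserted literally
    def replacement = '$1' + Matcher.quoteReplacement(newValue) + '$3'
    return xmlSource.replaceAll(regex, replacement)
}

println rtvSafe('<a><Mobiltlf type="x">old</Mobiltlf></a>', 'Mobiltlf', 'US$100')
```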

GerG