views:

353

answers:

1

I am using Groovy's XmlSlurper to parse xhtml document (or sudo xhthml one), and I'm trying to get to the text nodes of the document but can't figure how, here is the code:

import groovy.util.*

xmlText = '''
<TEXTFORMAT INDENT="10" LEADING="-5">
  <P ALIGN="LEFT">
    <FONT FACE="Garamond Premr Pro" SIZE="20" COLOR="#001200" LETTERSPACING="0" KERNING="0">
      Less is more! this 
      <FONT COLOR="#FFFF00">should be all</FONT>
      the 
      <FONT COLOR="#00FF00"> words OR should some </FONT>
      OTHER WORDS will be there?
    </FONT>
  </P>
</TEXTFORMAT>
'''
records = new XmlSlurper().parseText(xmlText)
records.P.FONT.children().eachWithIndex {it, index -> println "${index} - ${it}"}

Which print the following output:

0 - should be all 
1 -  words OR should some

But I want it to print the text nodes content as well so the desired output is:

0 - Less is more! this
1 - should be all
2 - the 
3 - words OR should some
4 - OTHER WORDS will be there?

Any ideas?

+2  A: 

Looks like XmlSlurper does NOT have separate method to retrieve "Mixed Content"

There is an open item to add method supporting Mixed Content here -> Groovy JIRA

peacefulfire
yep it seems that only the XmlParser supports that option
talg