I want to use XmlSlurper to parse an HTML document that I fetch with HTTPBuilder. Initially I tried it this way:
def response = http.get(path: "index.php", contentType: TEXT)
def slurper = new XmlSlurper()
def xml = slurper.parse(response)
But it produces an exception:
java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
I found a workaround: provide locally cached DTD files. Here is a simple implementation of a class that should help:
class CachedDTD {
    /**
     * Return DTD 'systemId' as InputSource.
     * @param publicId
     * @param systemId
     * @return InputSource for locally cached DTD.
     */
    def static entityResolver = [
        resolveEntity: { publicId, systemId ->
            try {
                String dtd = "dtd/" + systemId.split("/").last()
                Logger.getRootLogger().debug "DTD path: ${dtd}"
                new org.xml.sax.InputSource(CachedDTD.class.getResourceAsStream(dtd))
            } catch (e) {
                //e.printStackTrace()
                Logger.getRootLogger().fatal "Fatal error", e
                null
            }
        }
    ] as org.xml.sax.EntityResolver
}
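To make the mechanism concrete, here is a minimal sketch of the entity-resolver idea on its own. It uses the plain JAXP SAX parser rather than XmlSlurper and HTTPBuilder, so it is not the code from my project. The sample document, the `requested` list, and the empty stand-in DTD are assumptions for illustration only; a real setup would return the cached DTD file instead.

```groovy
import javax.xml.parsers.SAXParserFactory
import org.xml.sax.EntityResolver
import org.xml.sax.InputSource
import org.xml.sax.helpers.DefaultHandler

// Hypothetical sample document with the same DOCTYPE as in the error above.
def xhtml = '''<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html><head><title>t</title></head><body><p>hello</p></body></html>'''

// Records every system ID the parser asks for (illustration only).
def requested = []

// Answer every DTD request with an empty in-memory source so that no
// network access happens; a real resolver would return the cached file.
def resolver = { String publicId, String systemId ->
    requested << systemId
    new InputSource(new StringReader(''))
} as EntityResolver

def reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader()
reader.setEntityResolver(resolver)
reader.setContentHandler(new DefaultHandler())
reader.parse(new InputSource(new StringReader(xhtml)))

println "System IDs requested: ${requested}"
```

Because the stand-in DTD is empty, named entities such as `&nbsp;` may not resolve in this sketch; that is the part the cached DTD files are supposed to cover.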
My package tree looks as shown below:
I also slightly modified the code that parses the response, so it now looks like this:
def response = http.get(path: "index.php", contentType: TEXT)
def slurper = new XmlSlurper()
slurper.setEntityResolver(org.yuri.CachedDTD.entityResolver)
def xml = slurper.parse(response)
But now I'm getting a java.net.MalformedURLException. The DTD path logged by the CachedDTD entityResolver is org/yuri/dtd/xhtml1-transitional.dtd, and I can't get it working...