I'm applying a xslt to a HTML file (already filtered and tidied to make it parseable as XML).
My code looks like this:
TransformerFactory transformerFactory = TransformerFactory.newInstance();
this.xslt = transformerFactory.newTransformer(xsltSource);
xslt.transform(sanitizedXHTML, result);
However, I receive error for every doctype found like this:
ERROR: 'Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/html4/loose.dtd'
I have no issue accessing the dtds from my browser.
I have little control over the HTML being parsed, and can't rip the DOCTYPE since I need them for entities.
Any help is welcome.
EDIT:
I tried to disable DTD validation like this:
private Source getSource(StreamSource sanitizedXHTML) throws ParsingException {
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(false);
spf.setValidating(false); // Turn off validation
XMLReader rdr;
try {
rdr = spf.newSAXParser().getXMLReader();
} catch (SAXException e) {
throw new ParsingException(e);
} catch (ParserConfigurationException e) {
throw new ParsingException(e);
}
InputSource inputSrc = new InputSource(sanitizedXHTML.getInputStream());
return new SAXSource(rdr, inputSrc);
}
and then just calling it...
Source source = getSource(sanitizedXHTML);
xslt.transform(source, result);
The error persists.
EDIT 2:
Wrote a entity resolver, and got HTML 4.01 Transitional DTD on my local disk. However, I get this error now:
ERROR: 'The declaration for the entity "HTML.Version" must end with '>'.'
The DTD is as is, downloaded from w3.org