I'm writing a client that needs to read multiple consecutive small XML documents over a socket. I can assume that the encoding is always UTF-8 and that the documents may be separated by optional whitespace. Each document should ultimately end up in its own DOM object. What is the best way to accomplish this?
The essence of the problem is that parsers expect a single document in the stream and treat everything after it as junk. I thought I could artificially end each document by tracking the element depth and then creating a new reader on the same input stream, something like:
    // Broken
    public void parseInputStream(InputStream inputStream) throws Exception
    {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLOutputFactory xof = XMLOutputFactory.newInstance();
        XMLEventFactory eventFactory = XMLEventFactory.newInstance();
        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        Document doc = documentBuilder.newDocument();
        XMLEventWriter domWriter = xof.createXMLEventWriter(new DOMResult(doc));
        XMLStreamReader xmlStreamReader = factory.createXMLStreamReader(inputStream);
        XMLEventReader reader = factory.createXMLEventReader(xmlStreamReader);
        int depth = 0;
        while (reader.hasNext()) {
            XMLEvent evt = reader.nextEvent();
            domWriter.add(evt);
            switch (evt.getEventType()) {
                case XMLEvent.START_ELEMENT:
                    depth++;
                    break;
                case XMLEvent.END_ELEMENT:
                    depth--;
                    if (depth == 0)
                    {
                        // Root element closed: finish this document and
                        // try to start over on the same input stream.
                        domWriter.add(eventFactory.createEndDocument());
                        System.out.println(doc);
                        reader.close();
                        xmlStreamReader.close();
                        xmlStreamReader = factory.createXMLStreamReader(inputStream);
                        reader = factory.createXMLEventReader(xmlStreamReader);
                        doc = documentBuilder.newDocument();
                        domWriter = xof.createXMLEventWriter(new DOMResult(doc));
                        domWriter.add(eventFactory.createStartDocument());
                    }
                    break;
            }
        }
    }
However, running this on input such as <a></a><b></b><c></c> prints the first document and then throws an XMLStreamException. What's the right way to do this?
Clarification: Unfortunately the protocol is fixed by the server and cannot be changed, so prepending a length or wrapping the contents would not work.
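For reference, here is a minimal sketch of one client-side direction that leaves the wire protocol untouched. A likely reason the snippet above fails is that the first reader has already buffered bytes past the first document, so a second reader on the same stream has nothing sensible left to read. The sketch instead keeps a single XMLStreamReader and wraps the incoming stream locally in a synthetic root element using SequenceInputStream, then builds one DOM Document per child of that root. The names ConcatenatedXmlReader, readDocuments, and the wrapper element are hypothetical, chosen only for illustration. It assumes the documents carry no XML declaration or DOCTYPE (neither would be legal inside the wrapper), and it copies only elements, attributes, and text. Whether a given StAX implementation reports the closing tag of a document before blocking for more socket data is implementation-dependent, so treat this as a starting point rather than a definitive answer.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper class; the names here are illustrative, not from any library.
final class ConcatenatedXmlReader {

    // Calls sink once for every top-level document found in the stream.
    static void readDocuments(InputStream in, Consumer<Document> sink) throws Exception {
        // Client-side synthetic root, so the parser sees a single well-formed document.
        List<InputStream> parts = Arrays.asList(
                new ByteArrayInputStream("<wrapper>".getBytes(StandardCharsets.UTF_8)),
                in,
                new ByteArrayInputStream("</wrapper>".getBytes(StandardCharsets.UTF_8)));
        InputStream wrapped = new SequenceInputStream(Collections.enumeration(parts));

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        XMLStreamReader xsr =
                XMLInputFactory.newInstance().createXMLStreamReader(wrapped, "UTF-8");

        Document doc = null;
        Node current = null; // node that receives the next child
        int depth = 0;       // 1 = inside the wrapper, 2 = at a real document root
        while (xsr.hasNext()) {
            switch (xsr.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    depth++;
                    if (depth == 1) break; // the synthetic wrapper itself
                    if (depth == 2) {      // a new top-level document begins
                        doc = builder.newDocument();
                        current = doc;
                    }
                    Element e = doc.createElementNS(ns(xsr.getNamespaceURI()),
                            qName(xsr.getPrefix(), xsr.getLocalName()));
                    for (int i = 0; i < xsr.getAttributeCount(); i++) {
                        e.setAttributeNS(ns(xsr.getAttributeNamespace(i)),
                                qName(xsr.getAttributePrefix(i), xsr.getAttributeLocalName(i)),
                                xsr.getAttributeValue(i));
                    }
                    current.appendChild(e);
                    current = e;
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                case XMLStreamConstants.SPACE:
                    // Text at depth 1 is only the whitespace between documents.
                    if (depth >= 2) current.appendChild(doc.createTextNode(xsr.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (depth == 2) {
                        sink.accept(doc); // hand over before asking the parser for more input
                        doc = null;
                        current = null;
                    } else if (depth > 2) {
                        current = current.getParentNode();
                    }
                    depth--;
                    break;
                default:
                    break; // comments and processing instructions are dropped in this sketch
            }
        }
        xsr.close();
    }

    // Builds "prefix:local", or just "local" when there is no prefix.
    private static String qName(String prefix, String local) {
        return (prefix == null || prefix.isEmpty()) ? local : prefix + ":" + local;
    }

    // DOM expects null, not the empty string, for "no namespace".
    private static String ns(String uri) {
        return (uri == null || uri.isEmpty()) ? null : uri;
    }

    public static void main(String[] args) throws Exception {
        byte[] sample = "<a></a> <b x=\"1\">hi</b><c><d/></c>".getBytes(StandardCharsets.UTF_8);
        readDocuments(new ByteArrayInputStream(sample),
                d -> System.out.println(d.getDocumentElement().getNodeName()));
    }
}
```

With a real connection, the ByteArrayInputStream in main would be replaced by socket.getInputStream(), and the callback would receive each Document as soon as its root element closes. The closing wrapper tag is only reached when the server closes the connection.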