Update: There is no ready-made XML parser in the Java community that does both NIO and XML parsing. This is the closest I found, and it's incomplete: http://wiki.fasterxml.com/AaltoHome

I have the following code:

InputStream input = ...;
XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();

XMLStreamReader streamReader = xmlInputFactory.createXMLStreamReader(input, "UTF-8");

My question is: why does the method #createXMLStreamReader() expect to have an entire XML document in the input stream? Why is it called a "stream reader" if it can't seem to process a portion of XML data? For example, if I feed:

<root>
    <child>

to it, it tells me I'm missing the closing tags, even before I begin iterating over the stream reader itself. I suspect I just don't know how to use an XMLStreamReader properly. I should be able to supply it with data piece by piece, right? I need this because I'm processing an XML stream coming in from a network socket, and I don't want to load the whole source text into memory.

Thank you for your help, Yuri.

+1  A: 

The stream must contain the content for an entire XML document, just not all in memory at the same time (this is what streams do). You might be able to keep the stream and the reader open to continue feeding in content; however, it would have to be part of a well-formed XML document.
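To illustrate the "not all in memory at the same time" point, here is a small sketch that feeds a StAX parser from a hypothetical InputStream that never ends (`<root>` followed by endless `<item/>` elements). It assumes the JDK's default StAX implementation; `EndlessStreamDemo`, `EndlessFeed` and `bytesPulledFor` are made-up names. Because the document has no end, the loop can only finish if the parser pulls data incrementally instead of reading the whole stream up front.

```java
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class EndlessStreamDemo {

    /** Hypothetical source: "<root>" followed by an unbounded run of "<item/>" elements. */
    static final class EndlessFeed extends InputStream {
        private static final byte[] HEAD = "<root>".getBytes(StandardCharsets.US_ASCII);
        private static final byte[] ITEM = "<item/>".getBytes(StandardCharsets.US_ASCII);
        private long served = 0;

        @Override
        public int read() {
            byte b = served < HEAD.length
                    ? HEAD[(int) served]
                    : ITEM[(int) ((served - HEAD.length) % ITEM.length)];
            served++;
            return b; // pure ASCII, so already in 0..127; never returns -1 (no end of stream)
        }

        long bytesServed() {
            return served;
        }
    }

    /** Parses until `wanted` <item/> start tags were seen; returns how many bytes the parser pulled. */
    static long bytesPulledFor(int wanted) {
        EndlessFeed feed = new EndlessFeed();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(feed, "UTF-8");
            int seen = 0;
            while (seen < wanted) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "item".equals(reader.getLocalName())) {
                    seen++;
                }
            }
            reader.close(); // frees parser resources; does not close `feed`
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return feed.bytesServed();
    }

    public static void main(String[] args) {
        System.out.println("bytes pulled for 1000 items: " + bytesPulledFor(1000));
    }
}
```

The number printed should be a few kilobytes; the exact figure depends on the parser's internal buffer size, but it stays bounded even though the source is not.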

Suggestion: You might want to read a bit more about how sockets and streams work before going much farther.

Hope this helps.

cjstehno
Yes, potentially the stream must contain an entire document. But why should XMLStreamReader try to validate all of it up front? It's a stream. Why can't it just go along with the data and parse whatever is available? And *if* it encounters an error, I would deal with it myself. Correct me if I'm wrong: you're saying that if I'm reading a 1-gigabyte XML document over a network, I should download all of it, and only then would XMLStreamReader be able to iterate over it?
Yuri Ushakov
I would think that it would not validate until the whole stream has been processed (and closed). You should not have to download the whole thing; that's what streams are for. Are you writing to the stream, then closing it, and then trying to write more?
cjstehno
Yuri, no, Stax parsers will NOT read it completely first; you can definitely start reading right away, and the parser will only block if it does not yet have any data to parse. I don't know what the issue is, but your understanding is correct.
StaxMan
A: 

Look at this link to understand more about how streaming parsers work and how they keep your memory footprint smaller. For incoming XML, you would first need to serialize it into a well-formed XML document, and then give that to the streaming parser.
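One possible reading of "create a well-formed XML", for a connection that sends a series of sibling elements with no single root, is to splice a synthetic root element around the incoming stream with `SequenceInputStream`, so the parser still streams but sees one document. This is a sketch of that interpretation only, not necessarily what the linked article describes; `WrapFragments` and the `wrapper` element name are assumptions, and the trick only works if the payload contains no XML declaration or DOCTYPE of its own (those are not allowed mid-document).

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class WrapFragments {
    // Arbitrary synthetic root name (an assumption); it just must not clash with the payload.
    static final String WRAPPER = "wrapper";

    /** Returns a stream that reads "<wrapper>", then all of `body`, then "</wrapper>". */
    static InputStream wrap(InputStream body) {
        InputStream open = new ByteArrayInputStream(
                ("<" + WRAPPER + ">").getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream(
                ("</" + WRAPPER + ">").getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(Collections.enumeration(Arrays.asList(open, body, close)));
    }

    /** Streams the wrapped input through StAX and collects start-tag names in order. */
    static List<String> startTags(InputStream body) {
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(wrap(body), "UTF-8");
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        byte[] siblings = "<msg id=\"1\"/><msg id=\"2\"/>".getBytes(StandardCharsets.UTF_8);
        System.out.println(startTags(new ByteArrayInputStream(siblings))); // [wrapper, msg, msg]
    }
}
```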

http://www.devx.com/xml/Article/34037/1954

Fazal
A: 

Which Java version are you using? With JDK 1.6.0_19, I get the behaviour you seem to be expecting. Iterating over your example XML fragment gives me three events:

  • START_ELEMENT (root)
  • CHARACTERS (whitespace between <root> and <child>)
  • START_ELEMENT (child)

The fourth invocation of next() throws an XMLStreamException: ParseError at [row,col]:[2,12] Message: XML document structures must start and end within the same entity.
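A minimal way to reproduce this, assuming the JDK's built-in StAX parser (`FragmentDemo` and `events` are made-up names): feed the question's fragment and record each event until the parser gives up. The error only surfaces from next() once the parser actually runs out of input, after the earlier events have been delivered.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class FragmentDemo {

    /** Records every event the parser delivers, then "XMLStreamException" if it gives up. */
    static List<String> events(String xml) {
        List<String> seen = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), "UTF-8");
            while (reader.hasNext()) {
                int type = reader.next();
                switch (type) {
                    case XMLStreamConstants.START_ELEMENT:
                        seen.add("START_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        seen.add("END_ELEMENT " + reader.getLocalName());
                        break;
                    case XMLStreamConstants.CHARACTERS:
                        seen.add("CHARACTERS");
                        break;
                    case XMLStreamConstants.SPACE:
                        seen.add("SPACE");
                        break;
                    case XMLStreamConstants.END_DOCUMENT:
                        seen.add("END_DOCUMENT");
                        break;
                    default:
                        seen.add("EVENT " + type);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            // For a truncated fragment, this is where the parse error surfaces.
            seen.add("XMLStreamException");
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(events("<root>\n    <child>"));
        System.out.println(events("<root>\n    <child/>\n</root>"));
    }
}
```

How the whitespace between the tags is reported (CHARACTERS or SPACE) can vary between implementations; the point is that the start tags arrive before the error.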

jarnbjo
This is the same as what Woodstox does. The question is wrong to imply otherwise.
StaxMan
+1  A: 

You can get what you want (a partial parse), but you must not close the stream when you reach the end of the currently available data. Keep the stream open, and the parser will simply block when it runs out of data. When you have more data, write it to the stream, and the parser will continue.

This arrangement requires two threads: one thread running the parser, and another fetching data. To bridge the two threads, you use a pipe, a PipedInputStream and PipedOutputStream pair that pushes data from the fetching thread into the input stream used by the parser. (The parser reads data from the PipedInputStream.)
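A sketch of that two-thread arrangement using only the JDK: a hypothetical fetcher thread (standing in for the socket reader) writes chunks into a `PipedOutputStream`, while the calling thread parses from the connected `PipedInputStream`. The chunk boundaries deliberately split tag names to show the parser does not care where the pieces end. `PipedParse`, `feed` and `startTags` are made-up names.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PipedParse {

    /** Fetcher side: stands in for the thread reading the socket (hypothetical). */
    private static void feed(PipedOutputStream out, List<String> chunks) {
        try (PipedOutputStream pipe = out) { // closing signals end of data to the parser
            for (String chunk : chunks) {
                pipe.write(chunk.getBytes(StandardCharsets.UTF_8));
                pipe.flush();     // wake the parser thread if it is waiting for data
                Thread.sleep(20); // simulate a gap between network packets
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Parser side: runs on the calling thread and collects start-tag names. */
    static List<String> startTags(List<String> chunks) {
        try {
            PipedInputStream parserSide = new PipedInputStream();
            PipedOutputStream fetcherSide = new PipedOutputStream(parserSide);
            Thread fetcher = new Thread(() -> feed(fetcherSide, chunks), "fetcher");
            fetcher.start();

            List<String> names = new ArrayList<>();
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(parserSide, "UTF-8");
            while (reader.hasNext()) {
                // next() blocks inside PipedInputStream.read() until more bytes arrive
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
            parserSide.close();
            fetcher.join();
            return names;
        } catch (IOException | XMLStreamException e) {
            throw new IllegalStateException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Chunk boundaries deliberately fall in the middle of tag names.
        List<String> chunks = Arrays.asList("<root><chi", "ld>hello</child><chi", "ld/>", "</root>");
        System.out.println(startTags(chunks)); // [root, child, child]
    }
}
```

The fetcher must close its end of the pipe when the data really ends; until then the parser keeps waiting for more input. If the writing thread dies without closing the pipe, reads on the `PipedInputStream` fail with an IOException instead.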

mdma
A: 

If you absolutely need NIO with content "push", there are developers interested in completing the API for Aalto. The parser itself is complete, both as a Stax implementation and for the alternative "push input" mode (feeding input in rather than reading it from an InputStream). So you might want to check out the mailing lists if you are interested. Not everyone reads StackOverflow questions. :-)
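If tying up one blocking thread per connection is acceptable, the plain JDK can at least bridge NIO and StAX: `java.nio.channels.Channels.newInputStream` wraps a `ReadableByteChannel` (for example a `SocketChannel` kept in blocking mode) in an `InputStream` the parser can read. This is not the non-blocking push parsing described above; per its Javadoc, if the channel is a `SelectableChannel` in non-blocking mode, reads through the returned stream throw `IllegalBlockingModeException`. `ChannelBridge` and `startTags` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class ChannelBridge {

    /** Parses XML from a blocking channel; the calling thread is occupied for the whole read. */
    static List<String> startTags(ReadableByteChannel channel) {
        // Adapts the channel to an InputStream. A SelectableChannel in non-blocking mode
        // would make reads through this stream throw IllegalBlockingModeException.
        InputStream in = Channels.newInputStream(channel);
        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in, "UTF-8");
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    names.add(reader.getLocalName());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // In-memory stand-in for, e.g., a SocketChannel left in blocking mode.
        ReadableByteChannel channel = Channels.newChannel(
                new ByteArrayInputStream("<a><b/></a>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(startTags(channel)); // [a, b]
    }
}
```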

StaxMan