Say I want to output a huge set of search results, as XML, into a PrintWriter or an OutputStream, using XOM. The resulting XML would look like this:

<?xml version="1.0" encoding="UTF-8"?>
<resultset>
    <result>
       [child elements and data]
    </result>
    ...
    ...
    [1000s of result elements more]
</resultset>

Because the resulting XML document could be big (hundreds of megabytes, perhaps), I want to output it in a streaming fashion (instead of creating the whole Document in memory and then writing that).

The granularity of outputting one <result> at a time is fine, so I want to generate one <result> after another, and write it into the stream. In other words, I'd simply like to do something like this pseudocode (automatic flushing enabled, so don't worry about that):

open stream/writer
write declaration
write start tag for <resultset>
while more results:
    write next <result> element
write end tag for <resultset> 
close stream/writer

I've been looking at Serializer, but the necessary methods, writeStartTag(Element), writeEndTag(Element), write(DocType) are protected, not public! Is there no other way than to subclass Serializer to be able to use those methods, or to manually write the start and end tags directly into the stream as Strings, bypassing XOM altogether? (The latter wouldn't be too bad in this simple example, but in the general case it would get quite ugly.)
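
For reference, the manual fallback mentioned above might look roughly like the sketch below. The names `ManualResultSetWriter` and `writeResultSet` are made up for illustration, and each fragment is assumed to be an already serialized, well-formed `<result>` string (for instance, what XOM's `Element.toXML()` returns for one result). Only the root element's tags are written by hand.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper, not part of XOM: streams a <resultset> document
// one pre-serialized <result> fragment at a time.
class ManualResultSetWriter {

    // Assumption: every fragment is well-formed, correctly escaped XML.
    // The declaration claims UTF-8, so the Writer must encode as UTF-8.
    static void writeResultSet(Writer out, Iterator<String> resultFragments)
            throws IOException {
        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        out.write("<resultset>\n");
        while (resultFragments.hasNext()) {
            out.write(resultFragments.next());
            out.write("\n");
            out.flush(); // push each result out as soon as it is written
        }
        out.write("</resultset>\n");
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        Iterator<String> fragments = Arrays.asList(
                "<result><id>1</id></result>",
                "<result><id>2</id></result>").iterator();
        writeResultSet(out, fragments);
    }
}
```

This keeps only one result in memory at a time, but indentation, namespaces on the root element, and a DOCTYPE would all have to be handled by hand, which is where it gets ugly in the general case.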

Am I missing something or is XOM just not made for this?

With dom4j I could do this easily using XMLWriter - it has constructors that take a Writer or OutputStream, and methods writeOpen(Element), writeClose(Element), writeDocType(DocumentType) etc. Compare to XOM's Serializer where the only public write method is the one that takes a whole Document.

(This is related to my question about the best dom4j replacement where XOM is a strong contender.)

+4  A: 

As far as I know, XOM doesn't support streaming directly.

What I used when I wanted to stream my XML documents was NUX, which has a streaming XML serializer similar to the standard Serializer class in XOM. NUX is compatible with XOM. I downloaded the NUX sources, extracted a few NUX classes (the StreamingSerializer interface, StreamingXMLSerializer, which works with XOM documents, StreamingVerifier and NamespacesInScope), put them into my project, and it works like a charm. Too bad this isn't directly in XOM :-(

NUX is a very nice companion to XOM: http://acs.lbl.gov/software/nux/, working mirror download: nux-1.6.tar.gz

Link to API: http://acs.lbl.gov/software/nux/api/nux/xom/io/StreamingSerializer.html

Here is some sample code. The methods are called in this order: start(), then nextResult() once per result, then finish(). Here, serializer is a StreamingXMLSerializer from NUX:

// Writes the XML declaration and opens the root element.
void start() {
    serializer.writeXMLDeclaration();

    Element root = new Element("response");
    root.addAttribute(new Attribute("found", Integer.toString(123)));
    root.addAttribute(new Attribute("count", Integer.toString(542)));

    // Only the start tag is written; the element stays open.
    serializer.writeStartTag(root);

    serializer.flush();
}

// Writes one complete result element inside the open root.
void nextResult(Result result) {
    Element element = result.createXMLRepresentation();
    serializer.write(element);
    serializer.flush();
}

// Closes the root element opened in start().
void finish() {
    serializer.writeEndTag();
    serializer.flush();
}
Peter Štibraný
Updated link for nux: http://acs.lbl.gov/software/nux/
Ed Brannin
@Ed: thanks! I'll update the answer.
Peter Štibraný
Also, here's a download link since their site is broken and possibly unmaintained: http://openbsd.mirrors.tds.net/pub/FreeBSD/distfiles/nux-1.6.tar.gz
Ed Brannin
@Ed: great! Added.
Peter Štibraný
@Ed: btw, Stack Overflow works as a wiki, so you can modify posts ... but you probably need more 'reputation' points first :-(
Peter Štibraný
+5  A: 

I ran into the same issue, but found it's pretty simple to do what you mentioned as an option and subclass Serializer as follows:

import java.io.IOException;
import java.io.OutputStream;

import nu.xom.Element;
import nu.xom.Serializer;

// Re-exposes Serializer's protected write methods as public so that
// individual pieces of a document can be written to the stream.
public class StreamSerializer extends Serializer {

    public StreamSerializer(OutputStream out) {
        super(out);
    }

    @Override
    public void write(Element element) throws IOException {
        super.write(element);
    }

    @Override
    public void writeXMLDeclaration() throws IOException {
        super.writeXMLDeclaration();
    }

    @Override
    public void writeEndTag(Element element) throws IOException {
        super.writeEndTag(element);
    }

    @Override
    public void writeStartTag(Element element) throws IOException {
        super.writeStartTag(element);
    }

}

Then you can still take advantage of the various XOM configuration options such as setIndent, but use it like this:

Element rootElement = new Element("resultset");
StreamSerializer serializer = new StreamSerializer(out);
serializer.setIndent(4);
serializer.writeXMLDeclaration();
serializer.writeStartTag(rootElement);
while (hasNextElement()) {
    // write(Element) is the method made public in StreamSerializer
    serializer.write(nextElement());
}
serializer.writeEndTag(rootElement);
serializer.flush();
Dave L.
Yeah, I was thinking it should be doable this way, but never got around to trying it. Thanks for verifying this! In a way I think this is preferable to introducing yet another 3rd party library (cf. Peter's answer) just to do simple streaming. (Still, a shame that XOM doesn't come with this built in.)
Jonik
Agreed on both counts.
Dave L.
Nice. I like your solution, and will try it next time I need it.
Peter Štibraný