We serialize and deserialize XML using XStream, and just got an OutOfMemoryError.

First, I don't understand why we're getting the error, as we have 500MB allocated to the server.

The question is: what changes should we make to stay out of trouble? We want to ensure this implementation scales.

Currently we have ~60K objects, each ~50 bytes. We load the 60K POJOs in memory and serialize them to a String, which we send to a web service using HttpClient. When receiving, we get the entire String, then convert it to POJOs. The XML/object hierarchy looks like this:

<root>
    <meta>
       <date>10/10/2009</date>
       <type>abc</type>
    </meta>

    <data>
        <field>x</field>
    </data>

    [thousands of <data>]
</root>

I gather the best approach is to not store the POJOs in memory and not write the contents to a single String. Instead, we should write the individual <data> POJOs to a stream. XStream supports this, but it seems the <meta> element wouldn't be supported. The data would need to be in this form:

<root> 
    <data>
        <field>x</field>
    </data>

    [thousands of <data>]
</root>

So what approach is easiest to stream the entire tree?

A: 

I'd suggest using tools like VisualVM or Eclipse Memory Analyzer to make sure you don't have a memory leak or a similar problem.

Also, how do you know each object is 50 bytes? That doesn't sound likely.

matt b
+2  A: 

Not sure what the problem is here...you've found your answer on that webpage.

The example code on the link you provided suggests:

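Writer someWriter = new FileWriter("filename.xml");

// Everything written below is wrapped in a single <root> element.
ObjectOutputStream out = xstream.createObjectOutputStream(someWriter, "root");
// dataObjects is a placeholder for whatever you iterate over.
for (DataObject dataObject : dataObjects) {
    out.writeObject(dataObject);
}
out.close(); // writes the closing </root> tag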

Reading is nearly identical, with Reader in place of Writer and Input in place of Output:

Reader someReader = new FileReader("filename.xml");

ObjectInputStream in = xstream.createObjectInputStream(someReader);
try {
    while (true) {
        DataObject foo = (DataObject) in.readObject();
        // do some stuff with foo...
    }
} catch (EOFException e) {
    // thrown by readObject() when there are no more objects
}
in.close();
Mark E
+3  A: 

You definitely want to avoid serializing your POJOs into a humongous String and then writing that String out. Use the XStream APIs to serialize the POJOs directly to your OutputStream. I ran into the same situation earlier this year when I found that I was generating 200-300MB XML documents and getting OutOfMemoryErrors. It was very easy to make the switch.

And ditto of course for the reading side. Don't read the XML into a String and ask XStream to deserialize from that String: deserialize directly from the InputStream.

You mention a second issue: not being able to serialize the <meta> element together with the <data> elements. I don't think this is an XStream problem or limitation, as I routinely serialize much more complex structures along the lines of:

<myobject>
    <item>foo</item>
    <anotheritem>foo</anotheritem>
    <alist>
        <alistitem>
            <value1>v1</value1>
            <value2>v2</value2>
            <value3>v3</value3>
            ...
        </alistitem>
        ...
        <alistitem>
            <value1>v1</value1>
            <value2>v2</value2>
            <value3>v3</value3>
            ...
        </alistitem>
    </alist>
    <anotherlist>
        <anotherlistitem>
            <valA>A</valA>
            <valB>B</valB>
            <valC>C</valC>
            ...
        </anotherlistitem>
        ...
    </anotherlist>
</myobject>

I've successfully serialized and deserialized nested lists too.

Jim Ferrans
A: 

Use XMLStreamWriter (or XStream) to serialize it; you can write whatever you want with it. If you have the option of getting the input stream instead of the entire String, use a SAXParser. It is event-based and, although the implementation may be a little clumsy, you will be able to read any XML that is thrown at you, even if the XML is huge (I have parsed XML files of 2GB and more with SAXParser).
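
For what it's worth, here is a minimal sketch of the XMLStreamWriter route using only the JDK's javax.xml.stream API. The element names come from the question; the class name StreamingXmlWriter, the write method, and its parameters are made up for illustration, and the data is assumed to be one String per <data> element. The point is that <meta> and the <data> elements go to the same stream, one at a time, without building a big String.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper, not part of XStream or the JDK.
class StreamingXmlWriter {

    // Writes <root><meta>...</meta><data>...</data>...</root> straight to the
    // stream. Only one field value is held at a time.
    static void write(OutputStream out, String date, String type,
                      Iterator<String> fields) throws XMLStreamException {
        XMLStreamWriter w =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
        w.writeStartDocument("UTF-8", "1.0");
        w.writeStartElement("root");

        w.writeStartElement("meta");
        writeTextElement(w, "date", date);
        writeTextElement(w, "type", type);
        w.writeEndElement(); // </meta>

        while (fields.hasNext()) {
            w.writeStartElement("data");
            writeTextElement(w, "field", fields.next());
            w.writeEndElement(); // </data>
        }

        w.writeEndElement(); // </root>
        w.writeEndDocument();
        w.flush();
        w.close(); // does not close the underlying OutputStream
    }

    private static void writeTextElement(XMLStreamWriter w, String name, String text)
            throws XMLStreamException {
        w.writeStartElement(name);
        w.writeCharacters(text);
        w.writeEndElement();
    }

    public static void main(String[] args) throws Exception {
        // In real use, 'out' would be the HTTP request body or a file stream.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, "10/10/2009", "abc", Arrays.asList("x", "y").iterator());
        System.out.println(out.toString("UTF-8"));
    }
}
```

The ByteArrayOutputStream in main is only there to make the sketch runnable; pointing write at the real output stream is what keeps memory use flat.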

As a side note, you should pass the raw bytes, not a String, to an XML parser. The parser reads the encoding of the bytes that follow from the XML declaration at the start of the document:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>

A String has already been decoded using some encoding. It's better practice to let the XML parser read the original stream than to create a String from it first.
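
A small sketch of why this matters, assuming a document declared as ISO-8859-1 (the class and method names are invented for the example). Given the raw bytes, the SAX parser picks the charset from the declaration; decoding the same bytes to a String with a guessed charset damages the text before the parser ever sees it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class.
class EncodingDemo {

    // Parses raw bytes and returns all character data in the document.
    // The parser chooses the charset from the XML declaration.
    static String fieldText(InputStream in) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<field>caf\u00e9</field>";
        // The bytes as they would arrive over the wire.
        byte[] wire = xml.getBytes(StandardCharsets.ISO_8859_1);

        // Parser reads the bytes directly: the accented character survives.
        System.out.println(fieldText(new ByteArrayInputStream(wire)).equals("caf\u00e9")); // true

        // Decoding with the wrong charset first: the character is already lost.
        String guessed = new String(wire, StandardCharsets.UTF_8);
        System.out.println(guessed.contains("caf\u00e9")); // false
    }
}
```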

Ravi Wallau
Even better is to use a StAX parser (or XPP, its predecessor): it also parses incrementally (streaming), but is a bit less cumbersome to use. Better yet, XStream already allows you to use it.
StaxMan
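
To illustrate the StAX suggestion, here is a rough sketch of pull-parsing the document from the question with the JDK's javax.xml.stream API. The class name, the readLeaves method, and the callback are hypothetical, and it assumes <date>, <type>, and <field> contain text only. Each value is handed to the callback as soon as it is read, so nothing accumulates no matter how many <data> elements follow.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.BiConsumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of XStream or the JDK.
class StreamingXmlReader {

    // Pulls one event at a time and reports each text-only element
    // (<date>, <type>, <field>) to the callback. Returns how many were seen.
    static int readLeaves(InputStream in, BiConsumer<String, String> onLeaf)
            throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int count = 0;
        try {
            while (r.hasNext()) {
                if (r.next() != XMLStreamConstants.START_ELEMENT) {
                    continue; // skip whitespace, end tags, etc.
                }
                String name = r.getLocalName();
                if (name.equals("date") || name.equals("type") || name.equals("field")) {
                    // getElementText() consumes up to the matching end tag.
                    onLeaf.accept(name, r.getElementText());
                    count++;
                }
            }
        } finally {
            r.close();
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>\n"
                + "  <meta><date>10/10/2009</date><type>abc</type></meta>\n"
                + "  <data><field>x</field></data>\n"
                + "  <data><field>y</field></data>\n"
                + "</root>";
        // In real use, 'in' would be the HTTP response stream, not a byte array.
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        readLeaves(in, (name, text) -> System.out.println(name + "=" + text));
    }
}
```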