I have a very large object which I wish to serialize. During the process of serialization, it comes to occupy some 130MB of heap as a weblogic.utils.io.UnsyncByteArrayOutputStream. I am using a BufferedOutputStream to speed up writing the data to disk, which reduces the amount of time for which this object is held in memory.

Is it possible to use a buffer to reduce the size of the object in memory though? It would be good if there was a way to serialize it x bytes at a time and write those bytes to disk.

Sample code follows if it is of any use; there's not much to go on, I don't think. If it's the case that there needs to be a complete in-memory copy of the object to be serialised (and therefore no concept of a serialization buffer), then I suppose I am stuck.

    ObjectOutputStream tmpSerFileObjectStream = null;
    OutputStream tmpSerFileStream = null;
    BufferedOutputStream bufferedStream = null;
    try {

        tmpSerFileStream = new FileOutputStream(tmpSerFile);
        bufferedStream = new BufferedOutputStream(tmpSerFileStream);

        tmpSerFileObjectStream = new ObjectOutputStream(bufferedStream);
        tmpSerFileObjectStream.writeObject(siteGroup);
        tmpSerFileObjectStream.flush();

    } catch (InvalidClassException invalidClassEx) {
        throw new SiteGroupRepositoryException(
                "Problem encountered with class being serialised", invalidClassEx);
    } catch (NotSerializableException notSerializableEx) {
        throw new SiteGroupRepositoryException(
                "Object to be serialized does not implement " + Serializable.class,
                notSerializableEx);
    } catch (IOException ioEx) {
        throw new SiteGroupRepositoryException(
                "Problem encountered while writing ser file", ioEx);
    } catch (Exception ex) {
        throw new SiteGroupRepositoryException(
                "Unexpected exception encountered while writing ser file", ex);
    } finally {
        // Close the outermost stream; if ObjectOutputStream construction
        // failed, fall back to closing whichever underlying stream was opened.
        try {
            if (tmpSerFileObjectStream != null) {
                tmpSerFileObjectStream.close();
            } else if (bufferedStream != null) {
                bufferedStream.close();
            } else if (tmpSerFileStream != null) {
                tmpSerFileStream.close();
            }
        } catch (IOException ioEx) {
            logger.warn("Exception caught on trying to close ser file stream", ioEx);
        }
    }
A: 

Why does it occupy all those bytes as an unsync byte array output stream? That's not how default serialization works. You must have some special code in there to make it do that. Solution: don't.

EJP
The weblogic.utils.io.UnsyncByteArrayOutputStream is visible in the heap dump, that's where I got that information from. This routine is causing intermittent out of memory errors, hence the question. I don't think there's much I can do about how Weblogic chooses to handle serialization.
Mark Chorley
A: 

It sounds like whatever runtime you are using has a less-than-ideal implementation of object serialization that you likely don't have any control over.

A similar complaint is mentioned here, although it is quite old. http://objectmix.com/weblogic/523772-outofmemoryerror-adapter.html

Can you use a newer version of weblogic? Can you reproduce this in a unit test? If so, try running it under a different JVM and see what happens.

wolfcastle
Weblogic 9.2.3, and using a newer version isn't an option in the short term unfortunately. JDK 1.5.0.16 as well if that helps.
Mark Chorley
This Java bug explains a bit how object serialization works under the covers. It offers an explanation why the entire object must be kept in a buffer. For your case, you would have to split your large object into smaller ones and call reset() on the ObjectOutputStream to help break it up. http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4363937
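
For illustration, a rough sketch of that approach (getItems() is an invented accessor; the real object would need to expose its contents somehow): write the pieces individually and call reset() every so often so that the stream's handle table stops referencing everything already written.

    // Sketch only: assumes the large object's contents can be walked as a collection.
    ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new FileOutputStream(tmpSerFile)));
    try {
        Collection<?> items = siteGroup.getItems();   // hypothetical accessor
        out.writeInt(items.size());
        int count = 0;
        for (Object item : items) {
            out.writeObject(item);
            if (++count % 1000 == 0) {
                out.reset();   // clears the stream's handle table so it no longer
                               // holds references to objects written so far
            }
        }
        out.flush();
    } finally {
        out.close();
    }

Reading it back is the mirror image: readInt() for the count, then readObject() in a loop, rebuilding the large object afterwards.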
wolfcastle
A: 

I don't know about WebLogic (that is, JRockit, I suppose) serialization in particular; honestly, I see no reason for using ByteArrayOutputStreams...

You may want to implement java.io.Externalizable if you need more control over how your object is serialized, or switch to an entirely different serialization system (e.g. Terracotta) if you don't want to write the read/write methods yourself (say, if you have many big classes).
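
As a rough illustration (SiteGroup and ContentItem are invented stand-ins, not taken from the question), Externalizable lets you write the heavy state entry by entry instead of handing the whole graph to default serialization:

    import java.io.Externalizable;
    import java.io.IOException;
    import java.io.ObjectInput;
    import java.io.ObjectOutput;
    import java.util.HashMap;
    import java.util.Map;

    public class SiteGroup implements Externalizable {

        // ContentItem stands in for the cached content type; it must itself be Serializable
        private Map<String, ContentItem> items = new HashMap<String, ContentItem>();

        // Externalizable requires a public no-arg constructor
        public SiteGroup() {
        }

        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(items.size());
            for (Map.Entry<String, ContentItem> entry : items.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeObject(entry.getValue());   // each value written individually
            }
        }

        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            int size = in.readInt();
            items = new HashMap<String, ContentItem>(size);
            for (int i = 0; i < size; i++) {
                String key = in.readUTF();
                items.put(key, (ContentItem) in.readObject());
            }
        }
    }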

giorgiga
+1  A: 

This is wrong on so many levels. This is a massive abuse of serialization. Serialization is mostly intended for temporarily storing an object. For example,

  1. Session objects between Tomcat server restarts.
  2. Transferring objects between JVMs (load balancing at a website).

Java's serialization makes no effort to handle long-term storage of objects (no versioning support) and may not handle large objects well.

For something so big, I would suggest some investigation first:

  1. Ensure that you are not trying to persist the entire JVM Heap.
  2. Look for member variables that can be marked 'transient' to avoid including them in the serialization (perhaps you have references to service objects); see the sketch after this list.
  3. Consider possibility that there is a memory leak and the object is excessively large.
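
A minimal illustration of point 2 (SiteGroup, ContentItem and ContentService are made-up names, not from the question):

    import java.io.Serializable;
    import java.util.Map;

    public class SiteGroup implements Serializable {

        private static final long serialVersionUID = 1L;

        // part of the serialized form
        private Map<String, ContentItem> contentItems;

        // transient fields are skipped by serialization, so service or
        // infrastructure references don't drag extra object graphs into the file
        private transient ContentService contentService;
    }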

If everything is indeed correct, you will have to research alternatives to java.io.Serializable. Taking more control via java.io.Externalizable might work, but I would suggest something like a JSON or XML representation.

Update:

Investigate :

  1. Google's Protocol Buffers
  2. Facebook's Thrift
  3. Avro
  4. Cisco's Etch

Take a look at these benchmarks as well.

Pat
All three of those points have been considered and eliminated. Reducing the size of the object in memory is not straightforward - it contains something like 500,000 content items cached from a CMS with minimal data stored relating to each. It's persisted to disk, as you deduced, to speed up application restarts. I like the sound of an alternative representation as that would reduce the size on disk. But would it make any difference to the memory occupied during serialization?
Mark Chorley
The problem with cross-restart speedups is that presumably you are only restarting a server because there is a new version of the code. A new version means that the serialized old version cannot be reliably deserialized. So effectively nothing is gained.
Pat
A: 

What is the "siteGroup" object that you're trying to save? I ask, because it's unlikely that any one object is 130MB in size, unless it has a ginormous list/array/map/whatever in it -- and if that's the case, the answer would be to persist that data in a database.

But if there's no monster collection in the object, then the problem is likely that the object tree contains references to a bagillion objects, and the serialization of course does a deep copy (this fact has been used as a shortcut to implement clone() a lot of times), so everything gets cataloged all at once in a top-down fashion.

If that's the problem, then the solution would be to implement your own serialization scheme where each object gets serialized in a bottom-up fashion, possibly in multiple files, and only references are maintained to other objects, instead of the whole thing. This would allow you to write each object out individually, which would have the effect you're looking for: smaller memory footprint due to writing the data out in chunks.

However, implementing your own serialization, like implementing a clone() method, is not all that easy. So it's a cost/benefit thing.

Gabriel
It does contain an enormous map of objects which are indeed persisted in a database. The serialisation is just for speed of startup as building the map from the database takes hours as opposed to less than a minute from disk.
Mark Chorley
Ah. That might actually make things easier... can you just go through the map and serialize all those objects separately, then clear the map and serialize the main siteGroup object? On startup, you could then reverse the process: build the map, then put it in the object. That may keep the maximum heap size down quite a bit.
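
A rough sketch of that two-phase idea, with invented accessors (getContentItems/setContentItems) standing in for however the real siteGroup exposes its map:

    // Save: stream the map entries one at a time, then the siteGroup with its map emptied.
    ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new FileOutputStream(tmpSerFile)));
    try {
        Map<String, ContentItem> items = siteGroup.getContentItems();
        out.writeInt(items.size());
        for (Map.Entry<String, ContentItem> entry : items.entrySet()) {
            out.writeUTF(entry.getKey());
            out.writeObject(entry.getValue());
            out.reset();   // stop the stream holding references to entries already written
        }
        siteGroup.setContentItems(new HashMap<String, ContentItem>());
        out.writeObject(siteGroup);   // now a small object
        out.flush();
    } finally {
        out.close();
    }

    // Load: read the entries back, then the siteGroup, and reattach the map.
    ObjectInputStream in = new ObjectInputStream(
            new BufferedInputStream(new FileInputStream(tmpSerFile)));
    try {
        int size = in.readInt();
        Map<String, ContentItem> items = new HashMap<String, ContentItem>(size);
        for (int i = 0; i < size; i++) {
            items.put(in.readUTF(), (ContentItem) in.readObject());
        }
        SiteGroup restored = (SiteGroup) in.readObject();
        restored.setContentItems(items);
    } finally {
        in.close();
    }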
Gabriel