views: 155
answers: 6

Hi, I'm getting a large amount of data from a database query and making objects out of it. I end up with a list of these objects (about 1M of them) and I want to serialize that to disk for later use. The problem is that it barely fits in memory and won't fit at all in the future, so I need some way to serialize, say, the first 100k, the next 100k, etc., and also to read the data back in 100k increments.

I could write some obvious code that checks whether the list has grown too big and then writes it to file 'list1', then 'list2', etc., but maybe there's a better way to handle this?

+2  A: 

You could go through the list, create an object, and then feed it immediately to an ObjectOutputStream, which writes it to the file.
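Roughly something like this; just a sketch, where `MyObject` and the `results` iterator stand in for whatever your query produces:

    // Sketch only: MyObject and the results iterator are placeholders for your own data.
    ObjectOutputStream out = new ObjectOutputStream(
            new BufferedOutputStream(new FileOutputStream("objects.bin")));
    try {
        while (results.hasNext()) {
            out.writeObject(results.next());
            out.reset();   // otherwise the stream keeps a back-reference to every object written
        }
    } finally {
        out.close();
    }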

Zed
But does the OOS append to the file? And is there an equivalent way of reading the objects one by one back from the file?
kresjer
Yes, there is: ObjectInputStream does the reverse.
Boris Pavlović
Good answer. FYI, I did something similar recently and noticed big performance gains by wrapping my FileOutputStream in a BufferedOutputStream. In turn, you can throw a GZIPOutputStream in there too and save some disk space: `new ObjectOutputStream(new GZIPOutputStream(new BufferedOutputStream(new FileOutputStream(myFile))))`
Sam Barnum
The `ObjectOutputStream` will keep a reference to the objects you have written to it...
Tom Hawtin - tackline
@Tom: you can call reset() on the stream and it will drop all references, I believe. I'm not sure whether any object references are kept at all when everything is written with writeUnshared().
Zed
@Tom: thanks, I'll have a good look at that. All objects are independent so it shouldn't keep references.
kresjer
+1  A: 
  1. Read the objects one by one from the DB

  2. Don't put them into a list but write them into the file as you get them from the DB

Never keep more than a single object in RAM. When reading the data back, terminate the reading loop when readObject() throws EOFException; it does not return null at end of file (unless you explicitly wrote a null as an end marker).
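A rough sketch of the read side (`MyObject` and `process()` are placeholders, not anything from the original code):

    // Sketch of the read-back loop: readObject() throws EOFException at end of file.
    ObjectInputStream in = new ObjectInputStream(
            new BufferedInputStream(new FileInputStream("objects.bin")));
    try {
        while (true) {
            MyObject obj;
            try {
                obj = (MyObject) in.readObject();
            } catch (EOFException end) {
                break;                 // no more objects in the file
            }
            process(obj);              // placeholder: handle one object at a time
        }
    } finally {
        in.close();
    }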

Aaron Digulla
+1  A: 

I guess you have checked that it's really necessary to save the data to disk. It couldn't just stay in the database, could it?


To handle data that is too big, you need to make it smaller :-)

One idea is to get the data by chunks:

  • start with the request itself, so you never build this huge list (because that will become a point of failure sooner or later)
  • serialize the smaller list of objects
  • then loop (a rough sketch follows below)
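Something like this, very roughly (the chunk size, `loadChunk()` and the file names are all made up):

    // Rough sketch: loadChunk() stands for a paged query; one chunk per file,
    // as in the "list1", "list2" idea from the question.
    int chunkSize = 100000;
    int chunkIndex = 0;
    while (true) {
        List<MyObject> chunk = loadChunk(chunkIndex * chunkSize, chunkSize);
        if (chunk.isEmpty()) {
            break;
        }
        ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(
                new FileOutputStream("list" + chunkIndex + ".ser")));
        try {
            out.writeObject(chunk);
        } finally {
            out.close();
        }
        chunkIndex++;
    }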
KLE
+1  A: 

Also think about setting the fetch size on the JDBC driver; the JDBC driver for MySQL, for example, defaults to fetching the whole result set into memory.

read here for more information: fetch size
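For MySQL the streaming hint looks like this, if I remember correctly (other drivers just treat setFetchSize() as a batch-size hint):

    // MySQL Connector/J only streams rows (instead of buffering the whole result set)
    // for a forward-only, read-only statement with fetch size Integer.MIN_VALUE.
    Statement stmt = connection.createStatement(
            ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY);
    stmt.setFetchSize(Integer.MIN_VALUE);
    ResultSet rs = stmt.executeQuery("SELECT ...");   // your query here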

Alexander Kjäll
A: 

It seems that you are retrieving a large dataset from the DB, converting it into a list of objects, and serializing it in a single shot.

Don't do that; eventually it may crash the application.

Instead you have to:

  • minimize the amount of data retrieved from the database (say, 1,000 records instead of 1M)
  • convert them into business objects
  • serialize them
  • and repeat the same procedure until the last record (a rough sketch follows below)

This way you can avoid the memory and performance problems.
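A hypothetical sketch of one such batch (the SQL, the batch size and the `mapRow()`/`serializeBatch()` helpers are placeholders):

    // Hypothetical batching loop: page through the table and serialize each batch separately.
    int batchSize = 1000;
    int offset = 0;
    while (true) {
        PreparedStatement ps = connection.prepareStatement(
                "SELECT ... FROM my_table LIMIT ? OFFSET ?");   // MySQL-style paging
        ps.setInt(1, batchSize);
        ps.setInt(2, offset);
        ResultSet rs = ps.executeQuery();
        List<MyObject> batch = new ArrayList<MyObject>();
        while (rs.next()) {
            batch.add(mapRow(rs));           // convert one row into a business object
        }
        rs.close();
        ps.close();
        if (batch.isEmpty()) {
            break;                           // last record reached
        }
        serializeBatch(batch, offset);       // write this batch to its own file
        offset += batchSize;
    }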

Cheers

Ramesh Vel
A: 

ObjectOutputStream will work but it has more overhead. I think DataOutputStream/DataInputStream is a better choice.

Just read/write one value at a time and let the stream worry about buffering. For example, you can do something like this:

    DataOutputStream os = new DataOutputStream(
            new BufferedOutputStream(new FileOutputStream("myfile")));
    for (...)                 // loop over your values
        os.writeInt(num);
    os.close();

One gotcha with both object and data streams is that write(int) only writes one byte; use writeInt(int) instead.
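Reading it back works the same way; a sketch (readInt() signals end of file by throwing EOFException):

    // Sketch of the matching read loop.
    DataInputStream is = new DataInputStream(
            new BufferedInputStream(new FileInputStream("myfile")));
    try {
        while (true) {
            int num = is.readInt();
            // ... use num ...
        }
    } catch (EOFException end) {
        // reached the end of the file
    } finally {
        is.close();
    }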

ZZ Coder