I'm most interested in in-process (single user) solutions for large amounts of mutating object-oriented data, where any part of the data may change. Such systems generally suffer from these problems:

  • Writing large files out from scratch is inefficient
  • XML is too verbose
  • SQL blobs aren't a good match

So how do you do it?

+1  A: 

This depends on your requirements. Would you honestly use XML or SQL blobs for high-resolution pictures or audio?

Reading your question again: if you have a bunch of arbitrary objects you want to store in a file image, the way to get them in and out is copying and relocation. The out-copy can get help from the GC. The in-copy is straightforward and mainly depends on the relocation routine.

If there were a requirement to work with very big files, I'd add a way to mark objects 'dirty', and record where each one actually lies in the file image, so only the changed regions need to be rewritten.

You would also need to mark removed objects, unless you never remove anything.
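
Not from the answer itself, but here is a minimal sketch of what such a dirty-marked file image might look like in C++ (the questioner's language). The FileImage class, the fixed kSlotSize slots, and the stdio-based I/O are all illustrative assumptions; a real system would also need variable-size records and a free-slot map for removed objects.

    // Sketch only: fixed-size slots in one file, rewrite just the dirty ones.
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <string>
    #include <vector>

    constexpr size_t kSlotSize = 256;               // assumed fixed slot size

    struct Record {
        std::string payload;                        // stand-in for real object state
        bool dirty = false;
    };

    class FileImage {
    public:
        explicit FileImage(const char* path) : file_(std::fopen(path, "r+b")) {
            if (!file_) file_ = std::fopen(path, "w+b");     // create if missing
        }
        ~FileImage() { if (file_) std::fclose(file_); }

        size_t add(std::string payload) {                    // new object -> new slot
            records_.push_back({std::move(payload), true});
            return records_.size() - 1;
        }
        void update(size_t slot, std::string payload) {      // mutate and mark dirty
            records_[slot].payload = std::move(payload);
            records_[slot].dirty = true;
        }
        void flush() {                                       // rewrite only dirty slots
            char buf[kSlotSize];
            for (size_t i = 0; i < records_.size(); ++i) {
                if (!records_[i].dirty) continue;
                std::memset(buf, 0, kSlotSize);
                std::memcpy(buf, records_[i].payload.data(),
                            std::min(records_[i].payload.size(), kSlotSize));
                std::fseek(file_, static_cast<long>(i * kSlotSize), SEEK_SET);
                std::fwrite(buf, 1, kSlotSize, file_);
                records_[i].dirty = false;
            }
            std::fflush(file_);
        }
    private:
        std::FILE* file_ = nullptr;
        std::vector<Record> records_;
    };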

Cheery
A: 

We mostly use binary data, unless it has to be human readable (like settings and user preferences).

If you think XML is too verbose, have a look at JSON. I think it is a very good alternative.
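
As a quick, hedged illustration of the verbosity difference (using the third-party nlohmann/json library purely as an example; any JSON library would do):

    #include <iostream>
    #include <nlohmann/json.hpp>

    int main() {
        nlohmann::json obj = {
            {"id", 42},
            {"name", "widget"},
            {"tags", {"a", "b"}}
        };
        // JSON: {"id":42,"name":"widget","tags":["a","b"]}
        // XML:  <object><id>42</id><name>widget</name>
        //           <tags><tag>a</tag><tag>b</tag></tags></object>
        std::cout << obj.dump() << '\n';

        // Round trip back into an object.
        nlohmann::json parsed = nlohmann::json::parse(obj.dump());
        std::cout << parsed["name"] << '\n';
    }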

Gamecat
yeah, but how?! I think any ASCII format will be too verbose
Jesse Pepper
A: 

"Writing large files out from scratch is inefficient" What? Few things are as fast as file I/O. Please provide some example or data to back up your assertion that file I/O is inefficient.

Most OO systems can serialize or pickle an object to a file. This is about the fastest I/O possible.

Also, most OO systems can convert objects to standard representations like XML or JSON or YAML.

JSON/YAML is less verbose and much easier to parse than XML.
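
For what it's worth, the kind of raw object-to-file serialization described above might look like this in C++ for a trivially copyable struct (an assumption; anything with pointers or containers needs a real serializer):

    #include <cstdio>

    struct Point { double x, y, z; };   // plain-old-data, no pointers

    int main() {
        Point p{1.0, 2.0, 3.0};
        std::FILE* f = std::fopen("point.bin", "wb");
        if (!f) return 1;
        std::fwrite(&p, sizeof(p), 1, f);   // dump the memory image directly
        std::fclose(f);

        Point q{};
        f = std::fopen("point.bin", "rb");
        if (!f) return 1;
        std::fread(&q, sizeof(q), 1, f);    // read it straight back
        std::fclose(f);
    }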

S.Lott
I generally use C++, and file I/O is usually the bottleneck. Imagine a 1GB file made up of a hierarchy of objects that are changing all the time. You can't just write the whole file out every time one of the objects in the hierarchy changes.
Jesse Pepper
If you're rewriting the entire structure for every change, then perhaps you're doing it wrong. Perhaps you need a hierarchy of files so you can localize the changes?
S.Lott
hehe - hence the comment "Writing large files out from scratch is inefficient" - Jesse clearly knows that is a bad approach and is not doing it!
Daniel Paull
Here's the point: file I/O is the fastest thing there is. An RDBMS or OODBMS will be LESS efficient than raw file I/O. Writing only the objects that actually changed, with simple file I/O, will be fast. Writing everything will be slower than writing deltas, but still faster than any database.
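
One hedged way to realise this "write deltas" idea with plain file I/O is an append-only change log: each mutation is appended as a small record, and the full image is only rewritten occasionally (compaction). The record layout below is an assumption for illustration, not anything from the comment.

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Hypothetical record: [object id][payload length][payload bytes]
    void append_delta(std::FILE* log, std::uint64_t object_id,
                      const std::string& payload) {
        std::uint32_t len = static_cast<std::uint32_t>(payload.size());
        std::fwrite(&object_id, sizeof(object_id), 1, log);
        std::fwrite(&len, sizeof(len), 1, log);
        std::fwrite(payload.data(), 1, len, log);
        std::fflush(log);              // or batch flushes for throughput
    }

    int main() {
        std::FILE* log = std::fopen("changes.log", "ab");
        if (!log) return 1;
        append_delta(log, 7, "new state of object 7");
        std::fclose(log);
        // On load: replay the log over the last full snapshot; periodically
        // rewrite the snapshot and truncate the log (compaction).
    }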
S.Lott
+2  A: 

O/R mapping, using one of the several out-of-the-box solutions available.

Chris
A: 

I use YAML for small-to-medium files; it's very easy to parse and save. JSON is a worthy alternative.

Keltia
A: 

You could try serializing to XAML, rather than XML. This can create smaller files and is much faster to read and write (serialize/deserialize).

Obviously, this depends on XAML being an option for you.

Matt Lacey
A: 

You need O/R mapping or an object database like db4o.

If it's a matter of a collection of relatively standalone objects, it's also possible to store each one in its own file and only write when the object is dirty. But obviously, in more complex cases it can be a lot of work to keep the references straight and to avoid unintuitive directory structures, and this is really what the O/R mappers and object DBs bring to the table.
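
A rough sketch of that "one file per object, write only when dirty" idea; the ObjectStore name, the id-to-filename scheme, and serialize() are illustrative assumptions:

    #include <fstream>
    #include <string>
    #include <unordered_map>

    struct Item {
        std::string state;                                 // stand-in for real fields
        bool dirty = false;
        std::string serialize() const { return state; }   // e.g. JSON in practice
    };

    class ObjectStore {
    public:
        explicit ObjectStore(std::string dir) : dir_(std::move(dir)) {}

        void modify(int id, std::string new_state) {
            items_[id].state = std::move(new_state);
            items_[id].dirty = true;                       // remember it needs saving
        }

        void save_dirty() {                                // write only changed objects
            for (auto& [id, item] : items_) {
                if (!item.dirty) continue;
                std::ofstream out(dir_ + "/" + std::to_string(id) + ".obj",
                                  std::ios::binary | std::ios::trunc);
                out << item.serialize();
                item.dirty = false;
            }
        }
    private:
        std::string dir_;
        std::unordered_map<int, Item> items_;
    };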

As for XML being too verbose, that can often be solved with compression (e.g. XML in a zip).

frankodwyer
A: 

For large datasets I use structured binary files; nothing is more space- and time-efficient.

For structured text data I would use s-expressions (i.e. LAML) or, to cut down on the parentheses, LAML implemented as i-expressions.

Roger Nelson