I'm most interested in in-process (single user) solutions for large amounts of mutating object-oriented data, where any part of the data may change. Such systems generally suffer from these problems:

  • Writing large files out from scratch is inefficient
  • XML is too verbose
  • SQL blobs aren't a good match

So how do you do it?

+1  A: 

This depends on your requirements. Would you honestly use XML or SQL blobs for high-resolution pictures or audio?

Reading your question again: if you have a bunch of arbitrary objects you want to store in a file image, the way to get them in and out is copying and relocation. The out-copy can get help from the GC. The in-copy is straightforward and mainly depends on the relocation routine.

If there were a requirement to work with very big files, I'd add a way to mark objects 'dirty', and record where each one actually lies in the file image, so only the changed regions need to be rewritten.

You would also need to mark removed objects, unless you never remove anything.
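
Not from the answer itself, but here is a minimal sketch of what such a dirty-marked file image might look like in C++ (the questioner's language). The FileImage class, the fixed kSlotSize slots, and the stdio-based I/O are all illustrative assumptions; a real system would also need variable-size records and a free-slot map for removed objects.

    // Sketch only: fixed-size slots in one file, rewrite just the dirty ones.
    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <cstring>
    #include <string>
    #include <vector>

    constexpr size_t kSlotSize = 256;               // assumed fixed slot size

    struct Record {
        std::string payload;                        // stand-in for real object state
        bool dirty = false;
    };

    class FileImage {
    public:
        explicit FileImage(const char* path) : file_(std::fopen(path, "r+b")) {
            if (!file_) file_ = std::fopen(path, "w+b");     // create if missing
        }
        ~FileImage() { if (file_) std::fclose(file_); }

        size_t add(std::string payload) {                    // new object -> new slot
            records_.push_back({std::move(payload), true});
            return records_.size() - 1;
        }
        void update(size_t slot, std::string payload) {      // mutate and mark dirty
            records_[slot].payload = std::move(payload);
            records_[slot].dirty = true;
        }
        void flush() {                                       // rewrite only dirty slots
            char buf[kSlotSize];
            for (size_t i = 0; i < records_.size(); ++i) {
                if (!records_[i].dirty) continue;
                std::memset(buf, 0, kSlotSize);
                std::memcpy(buf, records_[i].payload.data(),
                            std::min(records_[i].payload.size(), kSlotSize));
                std::fseek(file_, static_cast<long>(i * kSlotSize), SEEK_SET);
                std::fwrite(buf, 1, kSlotSize, file_);
                records_[i].dirty = false;
            }
            std::fflush(file_);
        }
    private:
        std::FILE* file_ = nullptr;
        std::vector<Record> records_;
    };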

Cheery
A: 

We mostly use binary data, unless it has to be human readable (like settings and user preferences).

If you think XML is too verbose, have a look at JSON. I think it is a very good alternative.
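
As a quick, hedged illustration of the verbosity difference (using the third-party nlohmann/json library purely as an example; any JSON library would do):

    #include <iostream>
    #include <nlohmann/json.hpp>

    int main() {
        nlohmann::json obj = {
            {"id", 42},
            {"name", "widget"},
            {"tags", {"a", "b"}}
        };
        // JSON: {"id":42,"name":"widget","tags":["a","b"]}
        // XML:  <object><id>42</id><name>widget</name>
        //           <tags><tag>a</tag><tag>b</tag></tags></object>
        std::cout << obj.dump() << '\n';

        // Round trip back into an object.
        nlohmann::json parsed = nlohmann::json::parse(obj.dump());
        std::cout << parsed["name"] << '\n';
    }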

Gamecat
yeah, but how?! I think any ASCII format will be too verbose
Jesse Pepper
A: 

"Writing large files out from scratch is inefficient" What? Few things are as fast as file I/O. Please provide some example or data to back up your assertion that file I/O is inefficient.

Most OO systems can serialize or pickle an object to a file. This is about the fastest I/O possible.

Also, most OO systems can convert objects to standard representations like XML or JSON or YAML.

JSON/YAML is less verbose and much easier to parse than XML.
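
For what it's worth, the kind of raw object-to-file serialization described above might look like this in C++ for a trivially copyable struct (an assumption; anything with pointers or containers needs a real serializer):

    #include <cstdio>

    struct Point { double x, y, z; };   // plain-old-data, no pointers

    int main() {
        Point p{1.0, 2.0, 3.0};
        std::FILE* f = std::fopen("point.bin", "wb");
        if (!f) return 1;
        std::fwrite(&p, sizeof(p), 1, f);   // dump the memory image directly
        std::fclose(f);

        Point q{};
        f = std::fopen("point.bin", "rb");
        if (!f) return 1;
        std::fread(&q, sizeof(q), 1, f);    // read it straight back
        std::fclose(f);
    }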

S.Lott
I generally use C++, and file I/O is usually the bottleneck. Imagine a 1GB file made up of a hierarchy of objects that are changing all the time. You can't just write the whole file out every time one of the objects in the hierarchy changes.
Jesse Pepper
If you're rewriting the entire structure for every change, then perhaps you're doing it wrong. Perhaps you need a hierarchy of files so you can localize the changes?
S.Lott
hehe - hence the comment "Writing large files out from scratch is inefficient" - Jesse clearly knows that is a bad approach and is not doing it!
Daniel Paull
Here's the point: file I/O is the fastest thing there is. An RDBMS or OODBMS will be LESS efficient than raw file I/O. Writing only the objects that actually changed, with simple file I/O, will be fast. Writing everything will be slower than writing deltas, but still faster than any database.
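
One hedged way to realise this "write deltas" idea with plain file I/O is an append-only change log: each mutation is appended as a small record, and the full image is only rewritten occasionally (compaction). The record layout below is an assumption for illustration, not anything from the comment.

    #include <cstdint>
    #include <cstdio>
    #include <string>

    // Hypothetical record: [object id][payload length][payload bytes]
    void append_delta(std::FILE* log, std::uint64_t object_id,
                      const std::string& payload) {
        std::uint32_t len = static_cast<std::uint32_t>(payload.size());
        std::fwrite(&object_id, sizeof(object_id), 1, log);
        std::fwrite(&len, sizeof(len), 1, log);
        std::fwrite(payload.data(), 1, len, log);
        std::fflush(log);              // or batch flushes for throughput
    }

    int main() {
        std::FILE* log = std::fopen("changes.log", "ab");
        if (!log) return 1;
        append_delta(log, 7, "new state of object 7");
        std::fclose(log);
        // On load: replay the log over the last full snapshot; periodically
        // rewrite the snapshot and truncate the log (compaction).
    }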
S.Lott
+2  A: 

O/R mapping, using one of the several out-of-the-box solutions available.

Chris
A: 

I use YAML for small-to-medium files; it's very easy to parse and save. JSON is a worthy alternative.

Keltia
A: 

You could try serializing to XAML, rather than XML. This can create smaller files and is much faster to read and write (serialize/deserialize).

Obviously, this depends on XAML being an option for you.

Matt Lacey
A: 

You need O/R mapping or an object database like db4o.

If it's a matter of a collection of relatively standalone objects, it's also possible to store each one in its own file and only write when the object is dirty. But obviously, in more complex cases it can be a lot of work to keep the references straight and to avoid unintuitive directory structures, and this is really what the O/R mappers and object DBs bring to the table.
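
A rough sketch of that "one file per object, write only when dirty" idea; the ObjectStore name, the id-to-filename scheme, and serialize() are illustrative assumptions:

    #include <fstream>
    #include <string>
    #include <unordered_map>

    struct Item {
        std::string state;                                 // stand-in for real fields
        bool dirty = false;
        std::string serialize() const { return state; }   // e.g. JSON in practice
    };

    class ObjectStore {
    public:
        explicit ObjectStore(std::string dir) : dir_(std::move(dir)) {}

        void modify(int id, std::string new_state) {
            items_[id].state = std::move(new_state);
            items_[id].dirty = true;                       // remember it needs saving
        }

        void save_dirty() {                                // write only changed objects
            for (auto& [id, item] : items_) {
                if (!item.dirty) continue;
                std::ofstream out(dir_ + "/" + std::to_string(id) + ".obj",
                                  std::ios::binary | std::ios::trunc);
                out << item.serialize();
                item.dirty = false;
            }
        }
    private:
        std::string dir_;
        std::unordered_map<int, Item> items_;
    };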

As for XML being too verbose, that can often be solved with compression (e.g. XML in a zip).

frankodwyer
A: 

For large datasets I use structured binary files; nothing is more space- and time-efficient.

For structured text data I would use s-expressions (i.e. LAML) or, to cut down on the parentheses, LAML implemented as i-expressions.

Roger Nelson