My application needs to store large amounts of XML-like hierarchical information with the following requirements:

  1. Fast to read
  2. Minimal memory consumption
  3. Typed data instead of merely text

Any suggestions for a binary format that fulfills these goals?

A: 

Do other applications need to read the stored data, or just yours? Does it need to be a "standard" format?

Fast Infoset meets requirements (1) and (2), although because it's just a binary representation of the XML information model, it's just as untyped as XML. Might be good enough for your purposes, though, in the absence of anything else.
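To illustrate what "untyped" means here, this is a minimal sketch using Python's standard `xml.etree.ElementTree` on plain XML. It does not use Fast Infoset itself; the element names `item`, `count`, and `price` are made up for the example. The point is only that the XML information model hands back text, so the application has to convert on every read.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are illustrative only.
doc = ET.fromstring("<item><count>42</count><price>9.5</price></item>")

# Every value comes back as a string; the parser carries no type information.
raw_count = doc.find("count").text
raw_price = doc.find("price").text

# The application has to convert explicitly each time it reads the data.
count = int(raw_count)
price = float(raw_price)
```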

skaffman
A: 

There's too little detail in your requirements to give good suggestions. For example, are you free to pick your storage medium? Will it be a file system, a database, or something else?

What does "minimal memory consumption" mean? Are you running on a constrained platform? Must you share resources with other applications? Is a 1GB footprint small enough if your computer has 4GB of memory? Will all of your data sit in memory, or only the parts you are working on?

If the platform was Java, I'd start with its standard serialization and then investigate custom serialization if I wasn't happy with the performance.

hbunny
A: 

If the format is negotiable, I'd suggest JSON, not XML. JSON is actually faster to load and write than standard XML.

More about JSON:

http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=060ca7c3-b03f-41aa-937b-c8cba5b7f986
http://www.25hoursaday.com/weblog/PermaLink.aspx?guid=39842a17-781a-45c8-ade5-58286909226b

yoda
Firstly, JSON is not a substitute for XML; it can't represent structures of the same complexity. Secondly, that's quite a performance claim, one which I'd like to see backed up with evidence.
skaffman
I'd like to know more about the "structures of the same complexity" that JSON can't handle as well.
yoda
XML attributes, for one, XML namespaces for another. JSON is just a simple nested-key-value map.
skaffman
JSON is built for data structures (better than XML). Calling it "just a simple nested-key-value map" is like calling XML a "popular" way of writing semantic code: it doesn't make sense and isn't honest about its benefits. JSON doesn't have namespaces for now, that's true, and that is because the community is divided over how to implement them. As for the XML attributes, I'd appreciate it if you could be more specific.
yoda
+1 for suggesting something other than XML. Namespaces are overrated for data storage, and attributes don't add any real benefit in data complexity. Anything you can do with attributes you can do with a tag; attributes just make the markup more compact.
Jeremy Wall
+1  A: 

You could also read the XML into an object graph and store as Google Protocol Buffers. These are designed to be very efficient.
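As a rough illustration of what a typed, hierarchical binary encoding looks like, here is a minimal Python sketch using only the standard `struct` module. It is not the Protocol Buffers wire format and does not use the Protocol Buffers library; the type tags (`T_INT`, `T_FLOAT`, `T_STR`, `T_DICT`) and the `encode`/`decode` helpers are hypothetical names invented for this example. It assumes integers fit in 64 bits and that the tree contains only ints, floats, strings, and nested dicts.

```python
import struct

# Hypothetical one-byte type tags; not taken from any real format.
T_INT, T_FLOAT, T_STR, T_DICT = 0, 1, 2, 3


def _encode_key(key):
    """Encode a dict key as a 2-byte length followed by UTF-8 bytes."""
    data = key.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def encode(value):
    """Encode an int, float, str, or dict of those as tagged binary."""
    if isinstance(value, int):
        return struct.pack("<Bq", T_INT, value)
    if isinstance(value, float):
        return struct.pack("<Bd", T_FLOAT, value)
    if isinstance(value, str):
        data = value.encode("utf-8")
        return struct.pack("<BI", T_STR, len(data)) + data
    if isinstance(value, dict):
        body = b"".join(_encode_key(k) + encode(v) for k, v in value.items())
        return struct.pack("<BI", T_DICT, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


def decode(buf, pos=0):
    """Return (value, next_position) for the value starting at pos."""
    tag = buf[pos]
    pos += 1
    if tag == T_INT:
        return struct.unpack_from("<q", buf, pos)[0], pos + 8
    if tag == T_FLOAT:
        return struct.unpack_from("<d", buf, pos)[0], pos + 8
    if tag == T_STR:
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        return buf[pos:pos + length].decode("utf-8"), pos + length
    if tag == T_DICT:
        (count,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        result = {}
        for _ in range(count):
            (key_len,) = struct.unpack_from("<H", buf, pos)
            pos += 2
            key = buf[pos:pos + key_len].decode("utf-8")
            pos += key_len
            result[key], pos = decode(buf, pos)
        return result, pos
    raise ValueError(f"unknown tag: {tag}")


tree = {"name": "root", "size": 3, "child": {"ratio": 0.5}}
blob = encode(tree)
decoded, end = decode(blob)
```

Because each value carries its type tag, `decoded["size"]` comes back as an integer and `decoded["child"]["ratio"]` as a float, with no text parsing on read. A real schema-based format such as Protocol Buffers goes further by moving field names and types into a separate schema.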

Fortyrunner
+1  A: 

You don't specify whether XML is a format requirement; you only say the data needs to be hierarchical like XML.

Without more detail on the kind of data it's hard to give you very much advice. So here's a small list.

  • B-trees: there are a number of libraries supporting B-tree storage formats in multiple languages. They have fast lookups and are hierarchical in nature.
  • Protocol Buffers from Google: compact storage optimized for sending over the wire. Not necessarily optimized as a storage format, but they are typed and will probably do pretty well as one.
  • Zipped text formats: compact and, depending on the format chosen, typed and hierarchical in nature.
    • YAML (support for some complex typing, hierarchical, human readable)
    • JSON (less typing support, fast parsing, hierarchical, human readable)
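The "zipped text format" option from the list above can be sketched with Python's standard `json` and `gzip` modules. The `record` structure is made-up sample data, and how much the size shrinks depends entirely on how repetitive the real data is, so treat this as an illustration rather than a measurement.

```python
import gzip
import json

# Made-up hierarchical sample data with a lot of repeated structure.
record = {
    "name": "root",
    "children": [{"id": i, "weight": i * 0.5} for i in range(1000)],
}

# Serialize to JSON text, then compress the UTF-8 bytes.
text = json.dumps(record).encode("utf-8")
packed = gzip.compress(text)

# Reading reverses the two steps: decompress, then parse.
restored = json.loads(gzip.decompress(packed).decode("utf-8"))
```

JSON keeps the distinction between numbers, strings, booleans, and null, so `restored` has the same values and basic types as `record`. The trade-off is that the whole document has to be decompressed and parsed before any part of it can be read.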
Jeremy Wall
A: 

Wikipedia's explanation of the issue: http://en.wikipedia.org/wiki/Binary%5FXML

Supposedly, the recommended organisation and its Java and .NET SDKs can be downloaded from: http://www.agiledelta.com/product%5Fefx.html

XML is pure text but can be used to represent serialized objects. Let's presume your serializer is serializing your objects into XML.

You should not try to convert your objects into binary streams yourself, because you would have to tackle endianness (http://en.wikipedia.org/wiki/Endian) and data-representation issues. However, if you insist, you would need to use XDR (http://en.wikipedia.org/wiki/External%5FData%5FRepresentation) for its data architecture neutrality.
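The byte-order problem can be shown in a few lines with Python's standard `struct` module. This is a sketch, not an XDR implementation: it only borrows XDR's convention of always writing big-endian, and `write_record`/`read_record` are hypothetical helpers for a record holding one 32-bit integer and one 64-bit float.

```python
import struct

value = 1

# The same 32-bit integer has different byte layouts per byte order.
little = struct.pack("<i", value)  # b'\x01\x00\x00\x00'
big = struct.pack(">i", value)     # b'\x00\x00\x00\x01'

# Reading bytes with the wrong byte order silently gives a different number.
(misread,) = struct.unpack(">i", little)  # 16777216


def write_record(count, ratio):
    """Pack a 32-bit int and a 64-bit float in a fixed big-endian layout."""
    return struct.pack(">id", count, ratio)


def read_record(buf):
    """Unpack a record written by write_record, on any platform."""
    return struct.unpack(">id", buf)


packed = write_record(7, 0.25)
count, ratio = read_record(packed)
```

Fixing the byte order in the format itself, rather than using the machine's native order, is what makes the file readable across architectures.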

Otherwise, you should serialize your objects to XML using standard serializers and then convert the XML to binary/compact XML, since libraries and SDKs for that are available. To read the data back, decompact the binary XML and then deserialize it.

Blessed Geek