For a while now, our company has been looking for a file format to hold a large amount of lab sensor data. Each time the lab runs the instrumentation, it generates a file, which we consume and store in a database for trending, etc. A hierarchical format is preferred because it allows us to "group" data. This is an intermediate file format used before we place the data into the database. Due to our development environment, this is our priority list:
1) .NET compliant. The API will be used in web services and a client application. We do not have any control over the customer's environment, so a pure .NET solution is best.
2) Speed of reads. Our reads are random, not sequential. The faster the better. If we were not a C# development shop, I would say speed is #1.
3) File Size. If the file itself is large, a good compression ratio (86% and higher) is desired.
4) Memory footprint of the reads. Due to the volume of data, we cannot simply read it all into memory. Each sensor produces time/value pairs, and a single run can generate well over 4 million pairs. This has eliminated XML for us.
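To make the read pattern in points 2 and 4 concrete, here is a rough sketch of what we mean by a random read with a small memory footprint. It is written in Python purely for brevity (our real code would have to be .NET), and the record layout (two little-endian 64-bit floats per time/value pair) and the names `write_pairs`, `read_pair`, and `demo` are assumptions made up for illustration, not an existing format or API.

```python
import os
import struct
import tempfile

# Assumed layout (hypothetical): float64 timestamp + float64 value = 16 bytes.
RECORD = struct.Struct("<dd")


def write_pairs(path, pairs):
    """Write (time, value) pairs as fixed-width binary records."""
    with open(path, "wb") as f:
        for t, v in pairs:
            f.write(RECORD.pack(t, v))


def read_pair(path, index):
    """Return only record `index` by seeking to it; nothing else is loaded."""
    if index < 0:
        raise IndexError(index)
    with open(path, "rb") as f:
        f.seek(index * RECORD.size)
        data = f.read(RECORD.size)
    if len(data) != RECORD.size:
        # Seeking past the end of the file yields a short (empty) read.
        raise IndexError(index)
    return RECORD.unpack(data)


def demo():
    """Write 100,000 pairs to a temporary file, then fetch one at random."""
    fd, path = tempfile.mkstemp(suffix=".bin")
    os.close(fd)
    try:
        write_pairs(path, ((float(i), i * 0.5) for i in range(100_000)))
        return read_pair(path, 54_321)
    finally:
        os.remove(path)
```

Whatever format we end up with should let us do the equivalent of `read_pair` (jump to one piece of one sensor's data without loading the rest), ideally with grouping and compression on top, which a flat layout like this does not provide.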
So far we have looked at HDF5. It has the size and speed we are looking for, but we found its API horribly lacking in the .NET arena, and it cannot be used from web services. I have also looked into JSON, which looked promising, but I haven't yet tried reading a piece of the data back. I have searched the web and have not found many file formats that do what we need. Any help is appreciated.