
Hi all, I'm currently exploring HDF5. I've read the interesting comments in the thread "Evaluating HDF5", and I understand that HDF5 is a solution of choice for storing data, but how do you query it? For example, say I have a big file containing some identifiers: is there a way to quickly find out whether a given identifier is present in the file?

A: 

What do you mean by "identifier"? If you mean an attribute, check this tutorial. In C:

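  /* read the attribute's value into buf, converted to mem_type_id */
  status = H5Aread(attr_id, mem_type_id, buf);
  /* write the contents of buf to the attribute */
  status = H5Awrite(attr_id, mem_type_id, buf);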
Brian Agnew
By identifier, I mean something like a unique name or a primary key. The example describes how to create an attribute, but how can it be used for searching?
Pierre
A: 

I think the answer is "not directly".

Here are some of the ways I think you could achieve the functionality.

Use groups:

A hierarchy of groups could be used to store the data in the form of a radix tree, with each level of groups holding part of the identifier. This probably doesn't scale too well, though.

Use index datasets:

HDF5 has a reference type which could be used to link from separate index tables to a main table. After writing the main data, you could write further datasets, each sorted on a different key and holding references back into the main dataset. For example:

  MainDataset (sorted on identifier)
  0: { A, "C", 2 }
  1: { B, "B", 1 }
  2: { C, "A", 3 }

  StringIndex
  0: { "A", Reference ("MainDataset", 2) }
  1: { "B", Reference ("MainDataset", 1) }
  2: { "C", Reference ("MainDataset", 0) }

  IntIndex
  0: { 1, Reference ("MainDataset", 1) }
  1: { 2, Reference ("MainDataset", 0) }
  2: { 3, Reference ("MainDataset", 2) }

To use the above, you would have to write a binary search over the relevant index table to look up the field.

In-memory index:

Depending on the size of the dataset, it may be just as easy to use an in-memory index that is read from and written to its own dataset using something like Boost.Serialization.

HDF5-FastQuery:

This paper (and also this page) describes the use of bitmap indices to perform complex queries over an HDF5 dataset. I have not tried this.

Richard Corden