We are evaluating the performance of HDF5 with chunked datasets. In particular, we are trying to figure out whether it is possible to read a contiguous range of data that spans several chunks, and how much performance suffers when doing so. For example, given a dataset of 100 values with a chunk size of 10, what happens if we read values 23 to 48? Will there be a great loss of performance?
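For illustration, a minimal sketch of this scenario using the h5py Python bindings (the file and dataset names are made up):

    import numpy as np
    import h5py

    # Dataset of 100 values stored in chunks of 10, as in the example above.
    with h5py.File("example.h5", "w") as f:
        f.create_dataset("data", data=np.arange(100), chunks=(10,))

    # Values 23..48 (inclusive) span three chunks: 20-29, 30-39, and 40-49.
    # HDF5 resolves the chunk boundaries internally in a single read call.
    with h5py.File("example.h5", "r") as f:
        values = f["data"][23:49]
        print(values)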

Many thanks!

+2  A: 

I don't know how to answer your question specifically, but I suggest you use a chunk size of 1024 (or any larger power of two). I don't know the internals of HDF5, but from my knowledge of filesystems, and from a rough benchmark we did, 1024 was just right.
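For what it's worth, a sketch of what that recommendation looks like with h5py (shape and dtype are arbitrary choices):

    import h5py

    with h5py.File("chunked.h5", "w") as f:
        # Power-of-two chunk of 1024 elements, as suggested above.
        dset = f.create_dataset("data", shape=(1_000_000,), dtype="f8",
                                chunks=(1024,))
        print(dset.chunks)  # -> (1024,)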

Stefano Borini
Thanks a lot for your answer! I have already observed that the chunk size itself is an important performance factor. As you mentioned, a chunk size of > 1000 is a good starting point. What I am trying to figure out is how performance is affected when reading contiguous data that spans several chunks. But I'm afraid this is a question that cannot be answered concretely, so we will have to run several benchmarks, along the lines of the sketch below.
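A rough sketch of such a benchmark with h5py, comparing a chunk-aligned read against a read of the same length that crosses chunk boundaries (file name, sizes, and repeat count are arbitrary):

    import time
    import numpy as np
    import h5py

    # Test file: 10 million doubles in 1024-element chunks.
    with h5py.File("bench.h5", "w") as f:
        f.create_dataset("data", data=np.random.rand(10_000_000),
                         chunks=(1024,))

    def timed_read(start, stop, repeats=20):
        """Average wall-clock time to read data[start:stop].
        The file is reopened each time so the HDF5 chunk cache starts cold."""
        total = 0.0
        for _ in range(repeats):
            with h5py.File("bench.h5", "r") as f:
                t0 = time.perf_counter()
                _ = f["data"][start:stop]
                total += time.perf_counter() - t0
        return total / repeats

    # Same number of elements; the second read straddles chunk boundaries.
    print("chunk-aligned:", timed_read(0, 4096))
    print("cross-chunk:  ", timed_read(500, 4596))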
usac