
I would like to programmatically change the data associated with a dataset in an HDF5 file. I can't find a way to either delete a dataset by name (so that I could add it again with the modified data) or update a dataset by name. I'm using the C API for HDF5 1.6.x, but pointers to any HDF5 API would be useful.

A:

According to the HDF5 user guide (section 5.2; you'll need to scroll down a bit):

The size of the dataset cannot be reduced after it is created. The dataset can be expanded by extending one or more dimensions, with H5Dextend. It is not possible to contract a dataspace, or to reclaim allocated space.

HDF5 does not at this time provide a mechanism to remove a dataset from a file, or to reclaim the storage from deleted objects. Through the H5Gunlink function one can remove links to a dataset from the file structure. Once all links to a dataset have been removed, that dataset becomes inaccessible to any application and is effectively removed from the file. But this does not recover the space the dataset occupies.

The only way to recover the space is to write all the objects of the file into a new file. Any unlinked object is inaccessible to the application and will not be included in the new file.

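So truly deleting a dataset, in the sense of reclaiming its space, appears to be out of the question: H5Gunlink only removes the name (after which a new dataset can be created under that name), while the old storage stays in the file. On the other hand, modifying the dataset in place is supported.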

Max Lybbert
Thanks. Any idea how PyTables (a Python package built on top of HDF5) handles this?
Barry Wark
The documentation for "altering" a table in PyTables is at http://www.pytables.org/moin/HintsForSQLUsers#Alteringatable , but note "(adding a column) is currently not supported in PyTables."
Max Lybbert