I would like to calculate a hash of a Python class containing a dataset for Machine Learning. The hash is meant to be used for caching, so I was thinking of md5
or sha1
.
The problem is that most of the data is stored in NumPy arrays; these do not provide a __hash__()
member. Currently I do a pickle.dumps()
for each member and calculate a hash based on these strings. However, I found the following links indicating that the same object could lead to different serialization strings:
What would be the best method to calculate a hash for a Python class containing Numpy arrays?