I'm working with a not so small tree structure (it's a Burkhard-Keller-Tree, > 100 MB in memory) implemented in C++. The pointers to the children of each node are stored in a QHash.
Each node x has n children y[1] ... y[n], the edges to the children are labeled with the edit distance d(x, y[i]), so using a hash to store the nodes is an obvious solution.
class Node {
int value;
QHash<int, Node*> children;
/* ... */
};
I also want to serialize and deserialize it into a file (I currently use a QDataStream). The tree is just built once and doesn't change then.
Building the tree and deserializing it is rather slow. I'm loading the tree in the obvious way: Recursively building each node. I think this is suboptimal due to the many nodes that are created seperately with the new
operator. I read somewhere that new
is pretty slow. The initial build is not a big problem because the tree's rather stable and doesn't have to be rebuilt very often. But loading the tree from a file should be as fast as possible.
What's the best way to accomplish this?
It must be much better to save the whole tree in a single memory block with adjacent nodes. Serializing and deserializing would then be reduced to save and load the whole block, which I have to allocate just once.
But to implement this I would have to re-implement the QHash, AFAIK.
What would you do to speed up the deserialization?
Addendum
Thank you for your suggestion to do some profiling. Here are the results:
While rebuilding the tree from a file
1 % of the time is consumed by my own new calls
65 % is consumed by loading the QHash objects (this is implemented by the
Qt Library) of each node
12 % is consumed by inserting the nodes into the existing tree
20 % is everything else
So it's definitly not my new calls which cause the delay but the rebuild of the QHash objects at every node. This is basically done with:
QDataStream in(&infile);
in >> node.hash;
Do I have to dig into QHash and look what's going on under the hood there? I think the best solution would be a hash object that can be serialized with a single read and write operation without the need to rebuild the internal data structure.