views: 511
answers: 9
I'm working with a fairly large tree structure (a Burkhard-Keller tree, > 100 MB in memory) implemented in C++. The pointers to each node's children are stored in a QHash.

Each node x has n children y[1] ... y[n]; the edges to the children are labeled with the edit distance d(x, y[i]), so using a hash to store the nodes is an obvious choice.

#include <QHash>

class Node {
    int value;
    QHash<int, Node*> children;  // key: edit distance d(x, y[i])
    /* ... */
};

I also want to serialize and deserialize it to a file (I currently use a QDataStream). The tree is built just once and doesn't change afterwards.

Building the tree and deserializing it are both rather slow. I'm loading the tree in the obvious way: recursively building each node. I suspect this is suboptimal because of the many nodes that are created separately with the new operator; I've read that new is pretty slow. The initial build is not a big problem, because the tree is rather stable and doesn't have to be rebuilt often. But loading the tree from a file should be as fast as possible.

What's the best way to accomplish this?

It would surely be much better to save the whole tree in a single memory block with adjacent nodes. Serializing and deserializing would then reduce to saving and loading the whole block, which I would have to allocate only once.

But to implement this I would have to re-implement QHash, AFAIK.
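Just to illustrate what I mean (the names and exact layout here are invented, not working code):

// Invented sketch of a pointer-free layout: all nodes in one array,
// children referenced by index, so the block can be written and read
// as a whole.
struct FlatChild {
    qint32 distance;    // edge label d(x, y[i])
    qint32 childIndex;  // index into the node array instead of a Node*
};

struct FlatNode {
    qint32 value;
    qint32 firstChild;  // index into one global FlatChild array
    qint32 childCount;  // this node's children are stored contiguously
};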

What would you do to speed up the deserialization?

Addendum

Thank you for your suggestion to do some profiling. Here are the results:

While rebuilding the tree from a file:

 1 % of the time is consumed by my own new calls
65 % is consumed by loading the QHash objects of each node (this part is
     implemented by the Qt library)
12 % is consumed by inserting the nodes into the existing tree
20 % is everything else

So it's definitely not my new calls that cause the delay, but rather rebuilding the QHash objects at every node. This is basically done with:

 QDataStream in(&infile);
 in >> node.hash;

Do I have to dig into QHash and look at what's going on under the hood there? I think the best solution would be a hash object that can be serialized with a single read and a single write operation, without having to rebuild the internal data structure.
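For reference, what `in >> node.hash` has to do internally is roughly this (my paraphrase, not the literal Qt implementation):

 quint32 n;
 in >> n;                          // number of key/value pairs
 for (quint32 i = 0; i < n; ++i) {
     int key;
     Node *value;
     in >> key >> value;           // uses my own operator>> for Node*
     node.hash.insert(key, value); // hashing + bucket work for every entry
 }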

+1  A: 

I highly recommend the Boost serialization library. It should work with the structures you're using.
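A minimal sketch of what that could look like here, assuming the children map is switched from QHash to std::map (Boost has no built-in support for Qt containers):

#include <fstream>
#include <map>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/access.hpp>
#include <boost/serialization/map.hpp>

class Node {
    friend class boost::serialization::access;
    int value;
    std::map<int, Node*> children;  // edit distance -> child

    template <class Archive>
    void serialize(Archive &ar, const unsigned int /*version*/) {
        ar & value;
        ar & children;              // Boost follows the Node* pointers
    }
};

void saveTree(const Node &root, const char *path) {
    std::ofstream ofs(path, std::ios::binary);
    boost::archive::binary_oarchive oa(ofs);
    oa << root;
    // loading is symmetric, with boost::archive::binary_iarchive
}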

Maciek
I second this: Boost is a nice solution and automatically handles all the child/parent relationships. It's worth investigating, given that the benchmarks show that QHash (the current child/parent container) is what's eating up most of the time. It's also available on a wide range of platforms. On the other hand, I have no idea how well Boost plays with Qt.
DrYak
+1  A: 

The absolute fastest way of serialising/deserialising is writing a block of contiguous memory to disk, as you say. If you changed your tree structure to create this (probably using a custom allocation routine), it would be very easy.

Unfortunately I'm not that familiar with QHash, but from looking at it, it looks like a hash table rather than a tree. Have I misunderstood you? Are you using it to map duplicate nodes?

I'd use a profiler (I used to use Quantify, now called Rational PurifyPlus, but there are a lot listed here) to find where you're spending time, but I'd guess the cost is either multiple memory allocations rather than a single allocation, or multiple reads rather than a single read. To solve both problems, you know in advance (because you store it) how many nodes you need; then write/read an array of nodes of the correct length, where each pointer is an index into the array rather than a pointer in memory.
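A sketch of that idea (PackedNode is invented; alignment and endianness are ignored for brevity):

struct PackedNode {
    qint32 value;
    qint32 firstChild;  // index into the array, not a pointer
    qint32 childCount;
};

// Loading collapses into a single bulk read:
QVector<PackedNode> nodes;
quint32 count;
in >> count;
nodes.resize(count);
in.readRawData(reinterpret_cast<char*>(nodes.data()),
               int(count * sizeof(PackedNode)));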

Nick Fortescue
Each node of the tree has a key and a hash table to its leaves. Each leaf is referenced by an arbitrary number. To be precise: a node x has n leaves y_1 ... y_n, and each edge from x to y_i is labeled with the edit distance d(x, y_i) (see http://en.wikipedia.org/wiki/BK-tree).
Wolfgang Plaschg
+1 for profiling before optimising...
neuro
A: 

Another solution would be to use your own memory allocator that uses one contiguous memory region. Then you can dump the memory as-is and load it back. It's platform-sensitive (big endian/little endian, 32-bit/64-bit).
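A minimal arena sketch (illustrative only; alignment and the pointer-to-offset translation needed for dumping are omitted):

#include <cassert>
#include <cstddef>
#include <vector>

class Arena {
    std::vector<char> buf_;
    std::size_t used_;
public:
    explicit Arena(std::size_t bytes) : buf_(bytes), used_(0) {}
    void* alloc(std::size_t n) {
        assert(used_ + n <= buf_.size());  // fixed capacity, no growth
        void *p = &buf_[used_];
        used_ += n;
        return p;
    }
    const char* data() const { return &buf_[0]; }  // dump in one write
    std::size_t size() const { return used_; }
};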

Drakosha
-1 for this idea: you mention some problems, but in reality this is also sensitive to the compiler, the optimization level and debug/release builds, not to mention extending the tree in the future and handling migration nicely.
RnR
+1 to offset: with a suitable level of abstraction that's certainly possible, e.g. using iterators and storing offsets instead of pointers. Especially for a "build once, never modify" structure, an arena allocator is extremely efficient. Platform portability *is* a problem though, and it's probably not going to solve the OP's problem.
peterchen
A: 

As you said, allocating objects with new can be slow. That can be improved by allocating an object pool and handing out pre-allocated objects until the pool is exhausted. You might even be able to make this transparent by overloading the new/delete operators of the class in question.
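A hedged sketch of the overload (the pool here is just a static block; exhaustion checks, alignment and thread safety are glossed over):

#include <cstddef>

class Node {
    /* ... */
public:
    // Sketch: carve Nodes out of one pre-allocated block until it runs out.
    static void* operator new(std::size_t size) {
        static char pool[64 * 1024 * 1024];  // example size
        static std::size_t used = 0;
        void *p = pool + used;
        used += size;                        // no exhaustion check here
        return p;
    }
    static void operator delete(void*) {
        // no-op: the whole pool is reclaimed when the program ends
    }
};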

+4  A: 

First of all, profile your application so that you know what actually takes the time. Basing the suspicion on new because you've read somewhere that it can be slow, or on the iteration through the tree, is not enough.

It's possible it's the IO operations; maybe your file format is incorrect or inefficient.

Maybe you just have a defect somewhere?

Or maybe there's a quadratic loop somewhere that you've forgotten about that's causing the problems? :)

Measure what really takes the time in your case and only then tackle the issue; it'll save you a lot of time, and you'll avoid breaking your design/code to fix performance problems that don't exist before you've found the real cause.

RnR
+1. I totally agree. Always profile before optimising. Even if your guess is right, you will know exactly how much you gain from a given optimisation.
neuro
Each node is stored with an overloaded `<<` operator into a QDataStream. This is the recommended way to store Qt objects. No, there's no quadratic loop. I did some profiling, and the results disproved my assumption (see the edited question).
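For reference, the operators look roughly like this (a sketch, not my exact code; both are declared friends of Node):

QDataStream& operator<<(QDataStream &out, Node* const &n) {
    out << n->value << n->children;  // QHash has built-in stream support
    return out;
}

QDataStream& operator>>(QDataStream &in, Node* &n) {
    n = new Node;
    in >> n->value >> n->children;   // rebuilds the hash at every node
    return in;
}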
Wolfgang Plaschg
A: 

Your own memory allocation with overloaded operator new() and delete() is a low-cost option (in development time). However, this only affects the memory allocation time, not the constructor times. Your mileage may vary, but it may be worth a try.

Denes Tarjan
A: 

Some issues with serializing and deserializing a tree:

1. Pointers don't translate very well. An object may not end up at the
   same address when it is created again, and there may not be a way to
   allocate an object at a given location.

2. Serializing in the wrong order may defeat the purpose of the tree.
   For example, if you serialize the tree using pre-order traversal and
   rebuild it by re-inserting in that order, the resulting tree will not
   be as efficient; it can degenerate into something like a linked list.

You may want to serialize the objects instead of the tree structure. This will allow you to change the data structure in the future if necessary.
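For instance (a sketch; BKTree and its insert() are hypothetical stand-ins for your tree class):

QVector<int> values;
in >> values;                // one bulk read of the payload
BKTree tree;                 // hypothetical tree type
for (int i = 0; i < values.size(); ++i)
    tree.insert(values[i]);  // re-derives the edges via the edit distance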

Thomas Matthews
A: 

I'll expand my comment a bit:

Since your profiling suggests that the QHash serialization takes the most time, I believe that replacing QHash with a QList would yield a significant improvement when it comes to deserialization speed.

The QHash serialization just outputs the key/value pairs, but the deserialization constructs a hash data structure!

Even though you said you need the fast child lookup, I would recommend that you try replacing QHash with a QList<QPair<int, Node*> > as a test. If there aren't many children per node (say, fewer than 30), the lookup should still be fast enough even with a QList. If you find that QList is not fast enough, you could still use it just for (de)serialization and convert it to a hash once the tree has been loaded.
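A sketch of the list-based children with a linear-scan lookup:

// inside Node:
QList<QPair<int, Node*> > children;

Node* findChild(int distance) const {
    for (int i = 0; i < children.size(); ++i)  // fine for small fan-out
        if (children.at(i).first == distance)
            return children.at(i).second;
    return 0;  // no child with that edge label
}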

rpg
+1  A: 

Another approach would be to serialize your pointers and restore them when loading. I mean:

Serializing:

nodeList = collectAllNodes()

for n in nodeList:
    write( address(n) )   // the node's current address, used as an ID
    writeNode( n )        // with pointers as-they-are

Deserializing:

//read all nodes into a list
while not eof(f):
    read( prevNodeAddress )
    node = readNode()
    fixMap[prevNodeAddress] = node
    nodeList.append(node)

//fix pointers to their new values
for n in nodeList:
    for child in n.children:
        child.node = fixMap[child.node]

This way, if you don't insert or remove nodes, you can allocate a vector once and use that memory, reducing your allocations to the map (as rpg said, it might be faster with lists or even vectors). See the C++ sketch below.
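A rough C++/Qt version of the idea (a sketch; it assumes access to Node's internals and stores the old addresses as quint64 for a fixed on-disk size):

QHash<quint64, Node*> fixMap;  // old address -> new node

while (!in.atEnd()) {
    quint64 oldAddr;
    quint32 childCount;
    Node *n = new Node;
    in >> oldAddr >> n->value >> childCount;
    for (quint32 i = 0; i < childCount; ++i) {
        int dist;
        quint64 oldChild;
        in >> dist >> oldChild;
        // keep the stale address for now; the second pass fixes it
        n->children.insert(dist, reinterpret_cast<Node*>(quintptr(oldChild)));
    }
    fixMap.insert(oldAddr, n);
}

// second pass: replace the stale child pointers with the new addresses
foreach (Node *n, fixMap) {
    QHash<int, Node*>::iterator it = n->children.begin();
    for (; it != n->children.end(); ++it) {
        quint64 old = quint64(reinterpret_cast<quintptr>(it.value()));
        it.value() = fixMap.value(old);
    }
}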

davidnr
Nice answer! Thank you
Wolfgang Plaschg