views:

776

answers:

4

I'm deserializing a fair amount of data through Boost.Serialization (one archive for each frame). However, when I output how long the deserialization takes, it varies wildly. It is not unusably slow at the moment, but it would be nice to make it faster. The data represents the same classes, arrays, maps and vectors, merely with different values.

Looking at the memory spiking as each deserialization takes place, I have to believe there's a better way of doing this than continually allocating and deallocating all this memory.

Here are a few of the read times as an example:

Deserialization - 390 milliseconds
Deserialization - 422 milliseconds
Deserialization - 422 milliseconds
Deserialization - 422 milliseconds
Deserialization - 438 milliseconds
Deserialization - 2156 milliseconds
Deserialization - 1797 milliseconds
Deserialization - 1656 milliseconds
Deserialization - 1328 milliseconds
Deserialization - 1219 milliseconds
Deserialization - 1078 milliseconds
Deserialization - 1078 milliseconds

Is there a way of writing a custom deserialization function for the same data with Boost.Serialization, so I can allocate the memory once at the beginning and then just update the values each frame?

Update: I realised that a minor issue with the optimization flags I was using was causing the serialization data to be written incorrectly, which resulted in the inconsistent deserialization times. After fixing this, it is now consistently 750-780 milliseconds per frame.

However, my original question still stands: currently I am serializing and deserializing an entire STL container, when I really only want to serialize its contents (the size and indexing of the container will remain exactly the same). I'm not sure of the best way to go about doing this, though.

+1  A: 

Disclaimer: I do not use Boost.Serialization.

That being said: the standard problem with these issues is more or less what you described: for each serialization you request several (sometimes a lot of) chunks of memory from the OS and then free them, even though you know more or less up front how much memory you'll need. The only solution I know of is to implement something similar to the Apache "pool": you allocate a block of memory and have your allocator (which you give to the STL or to Boost) work inside this pool; then you have classes like Pool::string which allocate memory inside the pool (see the sketch after this list). Two warnings:

  • yes, this is not very C++ish, but you can put a nice envelope around it (the system architect where I work did something like this).
  • yes, you have to take care of the case where you need to allocate another pool.
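
To illustrate, here is a minimal C++11 sketch of such a pool plus an STL-compatible allocator that draws from it. The Pool and PoolAllocator names are invented for this example, and a real version would need per-type alignment handling, overflow into further pools (warning two above), and possibly thread safety:

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Bump-pointer pool: grab one big block up front, carve allocations out
// of it, and reclaim everything at once with reset() (e.g. per frame).
class Pool {
public:
    explicit Pool(std::size_t bytes)
        : buf_(static_cast<char*>(std::malloc(bytes))), size_(bytes), used_(0)
    {
        if (!buf_) throw std::bad_alloc();
    }
    ~Pool() { std::free(buf_); }

    void* allocate(std::size_t n) {
        // Round up so every carved-out block stays maximally aligned.
        const std::size_t a = alignof(std::max_align_t);
        n = (n + a - 1) & ~(a - 1);
        if (used_ + n > size_) throw std::bad_alloc(); // or chain a new pool
        void* p = buf_ + used_;
        used_ += n;
        return p;
    }
    void reset() { used_ = 0; } // release the whole frame's memory in one go

private:
    char* buf_;
    std::size_t size_, used_;
};

// Minimal C++11 allocator drawing from the pool; deallocate is a no-op
// because reset() reclaims everything at once.
template<class T>
struct PoolAllocator {
    using value_type = T;
    Pool* pool;
    explicit PoolAllocator(Pool& p) : pool(&p) {}
    template<class U> PoolAllocator(const PoolAllocator<U>& o) : pool(o.pool) {}
    T* allocate(std::size_t n) { return static_cast<T*>(pool->allocate(n * sizeof(T))); }
    void deallocate(T*, std::size_t) {}
};
template<class T, class U>
bool operator==(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return a.pool == b.pool; }
template<class T, class U>
bool operator!=(const PoolAllocator<T>& a, const PoolAllocator<U>& b) { return !(a == b); }

// Usage: element storage now comes from the pool.
// Pool framePool(1 << 20);
// std::vector<int, PoolAllocator<int>> values((PoolAllocator<int>(framePool)));

Containers built with this allocator take all their element storage from the pool, and reset() hands back the whole frame's memory in one step instead of many individual frees.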
David Lehavi
A: 

My first suggestion, as I cannot see your code, would be to use a profiling tool to determine the actual bottleneck(s).

The memory allocation/deallocation may well be the core problem, in which case pool allocation, as David mentions, would likely help (assuming you have fixed-size objects).

If you are serializing a large array of objects to a regular ASCII-format file, the conversion from ASCII to binary may well consume non-trivial time. The equivalent of atoi() is fast for a single call, but when you say "frame" I assume some sort of picture or network buffer, which is probably large; the overhead of atoi() may well become significant once it is done millions of times.
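
If the text conversion does turn out to matter, Boost's binary archives avoid it entirely. A small sketch, where Frame is a hypothetical stand-in for the asker's per-frame data:

#include <fstream>
#include <vector>
#include <boost/archive/binary_oarchive.hpp>
#include <boost/archive/binary_iarchive.hpp>
#include <boost/serialization/vector.hpp>

// Hypothetical stand-in for the per-frame data.
struct Frame {
    std::vector<float> samples;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */) {
        ar & samples;
    }
};

void save_frame(const Frame & frame, const char * path) {
    std::ofstream ofs(path, std::ios::binary);
    boost::archive::binary_oarchive oa(ofs); // raw bytes: no text conversion
    oa << frame;
}

void load_frame(Frame & frame, const char * path) {
    std::ifstream ifs(path, std::ios::binary);
    boost::archive::binary_iarchive ia(ifs);
    ia >> frame;
}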

I don't understand the distinction you are making between the contents and the objects, either. In general there is not much space overhead in collections; there may be some time overhead from hashing and tree rebalancing, but the profiler should show that as well.

In short, the part you believe is slow may well have limited bearing on the overall performance. Profile and see for sure.

sdg
+2  A: 

You might want to reconsider the design of the feature using this serialization.

From your description it seems as though you are serializing/deserializing an entire STL container very frequently. This shouldn't be required. Serialization shouldn't be used unless the data needs to be persisted so that it can be rebuilt later or by someone else.

If serialization is required for your application, you might consider serializing each item in the container separately and then only re-serializing when an item changes. This way you won't be redoing all of the work unnecessarily.
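
As a rough sketch of that idea (Item and its dirty flag are invented for illustration):

#include <cstddef>
#include <vector>

// Invented element type with a change flag; the real payload goes here.
struct Item {
    int value;
    bool dirty;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */) {
        ar & value;
    }
};

// Write only the items that changed since the last pass, tagged with
// their index so the reader knows which slots to update.
template<class Archive>
void save_changed(Archive & ar, std::vector<Item> & items) {
    for (std::size_t i = 0; i < items.size(); ++i) {
        if (!items[i].dirty) continue;
        const std::size_t index = i;
        const Item & item = items[i]; // Boost saves require a const reference
        ar << index << item;
        items[i].dirty = false;
    }
}

A real version would also record how many changed entries were written, so the reading side knows when to stop.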

Ben S
+1  A: 

Boost serialization provides templated save methods for STL collections, e.g. for a set:

template<class Archive, class Key, class Compare, class Allocator >
inline void save(
    Archive & ar,
    const std::set<Key, Compare, Allocator> &t,
    const unsigned int /* file_version */
){
    boost::serialization::stl::save_collection<
        Archive, std::set<Key, Compare, Allocator> 
    >(ar, t);
}

which just delegates to save_collection. Function templates can't be partially specialised, but you could define an overload for your particular collection type which does the serialization in a manner of your choosing, e.g.:

namespace boost {
namespace serialization {

template<class Archive>
inline void save(
    Archive & ar,
    const std::set<MyKey, MyCompare, MyAlloc> &t,
    const unsigned int /* file_version */
){
    // ...
}

}
}

You could take a copy of the save_collection implementation (from collections_save_imp.hpp) as a starting point and optimise it to fit your requirements, e.g. use a class which remembers the collection size from the previous invocation, and if it hasn't changed then reuse the same buffer.
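
Along those lines, since the asker says the container's size and indexing never change between frames, one option is to write and read only the element values, straight into the already-allocated slots. A minimal sketch (the *_values_in_place helpers are made up; the element type is assumed serializable):

#include <vector>

// The vectors were sized once at startup, so each frame reads new values
// into the existing elements; nothing is allocated or freed per frame.
template<class Archive, class T>
void save_values_in_place(Archive & ar, const std::vector<T> & v) {
    for (typename std::vector<T>::const_iterator it = v.begin(); it != v.end(); ++it)
        ar << *it;
}

template<class Archive, class T>
void load_values_in_place(Archive & ar, std::vector<T> & v) {
    for (typename std::vector<T>::iterator it = v.begin(); it != v.end(); ++it)
        ar >> *it;
}

Note the two sides must agree: the load assumes the save wrote exactly v.size() elements in order, with no size header.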

You may need a similar overload for your member type as well, at which point it's questionable whether you're getting any value from Boost serialization.

I know that's a bit vague but it's difficult to be more specific without knowing what collection type you're using, what the member type is etc.

jon hanson