views:

161

answers:

3

I'm trying to serialize data that's about 30KB, and I need a faster way to serialize/deserialize it. For me speed is as important as size, so I either need to compact the data more tightly or find a faster mechanism for building the objects. I've tried building some custom methods for it as well as using the built-in serialization methods, but I'm hoping that someone out there has some experience with this.

In my app milliseconds do count, so speed matters as much as size, especially since some objects may be quite large.

EDIT

The data is an object with numerous properties, including a dictionary and a number of int and string fields. Assume a complex mesh.

So I put together the following example, which gives you an idea of what the relationships in the object might look like.

Imports System.Collections.Generic

<Serializable()> Class A
    Inherits B
    Dim _C As New C     ' nested child object
    Dim E As Byte()     ' raw binary payload
End Class
<Serializable()> Class B
    Dim A As Int32
    Dim B As Dictionary(Of String, Object)  ' values may be of any type
End Class
<Serializable()> Class C
    Dim A As Int32
    Dim D As String
End Class

Of course there are also accessors for the fields, but that shouldn't affect this.

A: 

The answer depends radically on the kind and structure of the data to be serialized.

If the data is a 30K array of bytes, write the whole thing to a binary stream as a single block; it can't get much faster than that.
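
A minimal sketch of that single-block write and the matching read (the file target and all names here are just placeholders):

Imports System.IO

Module BlockIo
    ' Write the raw payload as one contiguous block, length-prefixed
    ' so it can be read back in a single call.
    Sub WriteBlock(path As String, data As Byte())
        Using writer As New BinaryWriter(File.Open(path, FileMode.Create))
            writer.Write(data.Length)
            writer.Write(data)
        End Using
    End Sub

    Function ReadBlock(path As String) As Byte()
        Using reader As New BinaryReader(File.OpenRead(path))
            Dim length = reader.ReadInt32()
            Return reader.ReadBytes(length)
        End Using
    End Function
End Module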

If the data is a 30K mesh of highly structured objects embedded in a web of interconnections... good luck!

Steven A. Lowe
A: 

Check out this post on Marc Gravell's blog; it includes an interesting benchmark comparing protobuf-net with NetDataContractSerializer:

http://marcgravell.blogspot.com/2009/09/protobuf-net-vs-netdatacontractserializ.html
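
If you go that route, usage is roughly like this (a sketch assuming the protobuf-net library; the Payload type and its members are made up for illustration):

Imports System.IO
Imports ProtoBuf

<ProtoContract()> Class Payload
    <ProtoMember(1)> Public Id As Int32
    <ProtoMember(2)> Public Name As String
End Class

Module Demo
    Sub Roundtrip()
        Dim data As Byte()
        Using ms As New MemoryStream()
            ' Serialize to a compact binary wire format.
            Serializer.Serialize(ms, New Payload With {.Id = 1, .Name = "test"})
            data = ms.ToArray()
        End Using
        Using ms As New MemoryStream(data)
            Dim copy = Serializer.Deserialize(Of Payload)(ms)
        End Using
    End Sub
End Module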

Anyway, you need to give more details about the structure of your data if you want a useful answer...

Thomas Levesque
+1  A: 

There are a few things that come to mind.

Is it possible to reduce the amount of data being serialized? This might be a dead end for you but it is obviously going to have a great impact on performance.

Can you reduce overall latency by streaming the serialized data? If the target of the serialized object graph is a network stream, a file, etc., then you may be able to overlap two or more operations and reduce the overall latency.
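
For example, serializing straight into the destination stream, rather than building the full byte array first, lets transmission overlap with formatting (a sketch using the stock BinaryFormatter; the buffer size is arbitrary):

Imports System.IO
Imports System.Runtime.Serialization.Formatters.Binary

Module StreamingSend
    Sub SendDirect(target As Stream, graph As Object)
        Dim formatter As New BinaryFormatter()
        ' Bytes flow toward the target as they are produced instead of
        ' accumulating in memory first.
        Dim buffered As New BufferedStream(target, 8192)
        formatter.Serialize(buffered, graph)
        buffered.Flush()
    End Sub
End Module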

Can you reduce the generality of the structure so that custom serialization covers more cases? I am looking at the field B on class B, which can pull in any type via the Dictionary's values. The actual types put into that Dictionary may be entirely within your control, but it is worth bringing up because simpler, more controlled data structures are generally easier and faster to serialize.

Is there any redundancy in the data that you can exploit? If you knew that some of the objects contained in the dictionary were functionally equivalent, you could serialize them once as a group and just reference them by index when the dictionary is serialized.
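
A sketch of that interning idea, assuming string values for simplicity (all names here are illustrative):

Imports System.IO
Imports System.Collections.Generic

Module Interning
    ' Each distinct value is written once; repeats become small indices.
    Sub WriteInterned(writer As BinaryWriter, values As List(Of String))
        Dim seen As New Dictionary(Of String, Integer)
        writer.Write(values.Count)
        For Each value In values
            Dim index As Integer
            If seen.TryGetValue(value, index) Then
                writer.Write(True)      ' repeat: emit only the index
                writer.Write(index)
            Else
                seen(value) = seen.Count
                writer.Write(False)     ' first occurrence: emit the payload
                writer.Write(value)
            End If
        Next
    End Sub
End Module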

Also, don't underestimate the effect size has on performance. Again, it depends on what the program does with the structure, but even producing a large byte stream itself may incur a time cost. Of course, sending more bytes over a network or to a file takes more time too.

I would suggest that writing minimal custom serialization code for the classes would produce a net improvement over the runtime's default serialization, if only because you would not need to write out so much metadata. Construction of the child members should be faster too.
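
For something shaped like class C above, a hand-rolled writer/reader might look like this (a sketch; versioning and error handling omitted):

Imports System.IO

Class C
    Dim A As Int32
    Dim D As String

    Sub WriteTo(writer As BinaryWriter)
        writer.Write(A)
        writer.Write(If(D, String.Empty))   ' BinaryWriter rejects Nothing
    End Sub

    Shared Function ReadFrom(reader As BinaryReader) As C
        Dim result As New C
        result.A = reader.ReadInt32()
        result.D = reader.ReadString()
        Return result
    End Function
End Class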

Another technique (which may or may not help here) is shaping your data structure so it serializes well. For example, if you have a tree-like structure, keeping sibling-to-sibling references in addition to parent-child references lets you enumerate all the nodes in order without the cost of recursively walking the tree. A heap also comes to mind: you can iterate over its items without regard to how they are individually related to one another.
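
As an illustration, a tree kept as a flat list with parent indices serializes in one linear pass instead of a recursive descent (illustrative names):

Imports System.IO
Imports System.Collections.Generic

Class FlatNode
    Public ParentIndex As Int32   ' -1 marks the root
    Public Value As Int32
End Class

Module FlatTree
    Sub WriteTree(writer As BinaryWriter, nodes As List(Of FlatNode))
        writer.Write(nodes.Count)
        For Each node In nodes    ' no recursion required
            writer.Write(node.ParentIndex)
            writer.Write(node.Value)
        Next
    End Sub
End Module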

Jacob O'Reilly