I have a solution where I need to read objects into memory very quickly; to save time on disk IO, the binary stream may be cached in memory in compressed form.
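To make the cache part concrete, here's a rough sketch of what I mean. I'm assuming GZip purely for illustration (the codec isn't decided): the compressed byte[] stays cached, and gets inflated into a MemoryStream right before deserialization.

```csharp
using System.IO;
using System.IO.Compression;

static class CompressedCache
{
    // Keep the compressed bytes cached; inflate into a MemoryStream on demand.
    public static MemoryStream Decompress(byte[] compressed)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(new MemoryStream(compressed),
                                         CompressionMode.Decompress))
        {
            gzip.CopyTo(output); // Stream.CopyTo needs .NET 4; on 3.5, loop Read/Write
        }
        output.Position = 0; // rewind so the deserializer starts at the beginning
        return output;
    }
}
```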
I've tinkered with different approaches: XmlTextWriter and XmlTextReader were predictably poor, and so was the built-in binary serialization. Protobuf-net is excellent but still a little too slow. Here are some stats:
File size, XML: 217 KB
File size, binary: 87 KB
Compressed binary: 26 KB
Compressed XML: 26 KB

Deserialize with XML (XmlTextReader): 8.4 sec
Deserialize with binary (Protobuf-net): 6.2 sec
Deserialize with binary, without string interning (Protobuf-net): 5.2 sec
Deserialize with binary from memory (Protobuf-net): 5.9 sec
Time to decompress binary file into memory: 1.8 sec

Serialize with XML (XmlTextWriter): 11 sec
Serialize with binary (Protobuf-net): 4 sec
Serialize with binary, length-prefixed (Protobuf-net): 3.8 sec
That got me thinking: it seems (correct me if I'm wrong) that the major culprit in deserialization is the actual byte-to-object conversion rather than the IO. If that's the case, it should be a candidate for the new Parallel extensions.
Since I'm a bit of a novice when it comes to binary IO, though, I'd appreciate some input before I commit time to a solution :)
For simplicity's sake, say we want to deserialize a list of objects with no optional fields. My first idea was simply to store each object with a length prefix, read each record's byte[] into a list of byte[], and then use PLINQ to do the byte[] -> object deserialization. Something like this:
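A rough sketch of that idea. MyItem is just a stand-in type, and I'm assuming each record is stored as a 4-byte little-endian length followed by its protobuf payload; phase 1 slices the stream single-threaded, phase 2 deserializes the slices in parallel with PLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

[ProtoContract]
public class MyItem   // stand-in for the real object type
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class ParallelDeserializer
{
    public static List<MyItem> ReadAll(Stream input)
    {
        // Phase 1 (single-threaded): slice the stream into raw byte[] chunks.
        var chunks = new List<byte[]>();
        var lenBuf = new byte[4];
        while (ReadExactly(input, lenBuf, 4))
        {
            var chunk = new byte[BitConverter.ToInt32(lenBuf, 0)];
            if (!ReadExactly(input, chunk, chunk.Length))
                throw new EndOfStreamException("truncated record");
            chunks.Add(chunk);
        }

        // Phase 2 (parallel): byte[] -> object with PLINQ, keeping input order.
        return chunks.AsParallel().AsOrdered()
                     .Select(b => Serializer.Deserialize<MyItem>(new MemoryStream(b)))
                     .ToList();
    }

    private static bool ReadExactly(Stream s, byte[] buf, int count)
    {
        int read = 0;
        while (read < count)
        {
            int n = s.Read(buf, read, count - read);
            if (n == 0)
            {
                if (read == 0) return false;       // clean end of stream
                throw new EndOfStreamException();  // stream ended mid-record
            }
            read += n;
        }
        return true;
    }
}
```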
However, with that method I still need to read the byte[]s single-threaded. So perhaps one could instead read the whole binary stream into memory (how large a binary file is feasible for that, by the way?), store at the beginning of the file how many objects there are plus each one's length and offset, and then just create ArraySegments or something similar and do the chunking in parallel too. Roughly:
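Again a sketch, reusing the stand-in MyItem type from above. I'm assuming a header of one int record count followed by an (offset, length) int pair per record; a read-only MemoryStream over a slice of the shared buffer stands in for the ArraySegment, so no per-record copy is needed.

```csharp
using System;
using System.IO;
using System.Linq;
using ProtoBuf;

public static class HeaderDeserializer
{
    // Assumed layout: [int count][count x (int offset, int length)][payloads].
    public static MyItem[] ReadAll(byte[] data)
    {
        int count = BitConverter.ToInt32(data, 0);
        return Enumerable.Range(0, count).AsParallel().AsOrdered()
            .Select(i =>
            {
                int offset = BitConverter.ToInt32(data, 4 + i * 8);
                int length = BitConverter.ToInt32(data, 8 + i * 8);
                // Each record is deserialized from a windowed view of the
                // shared buffer, so the chunking itself happens in parallel.
                var window = new MemoryStream(data, offset, length, false);
                return Serializer.Deserialize<MyItem>(window);
            })
            .ToArray();
    }
}
```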
So what do you guys think, is it feasible?