I have a solution where I need to read objects into memory very quickly; to save time on disk IO, the binary stream may be cached in memory in compressed form.
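To make the cache part concrete, here's a rough sketch of what I mean. I'm assuming GZip purely for illustration (the codec isn't decided): the compressed byte[] stays cached, and gets inflated into a MemoryStream right before deserialization.

```csharp
using System.IO;
using System.IO.Compression;

static class CompressedCache
{
    // Keep the compressed bytes cached; inflate into a MemoryStream on demand.
    public static MemoryStream Decompress(byte[] compressed)
    {
        var output = new MemoryStream();
        using (var gzip = new GZipStream(new MemoryStream(compressed),
                                         CompressionMode.Decompress))
        {
            gzip.CopyTo(output); // Stream.CopyTo needs .NET 4; on 3.5, loop Read/Write
        }
        output.Position = 0; // rewind so the deserializer starts at the beginning
        return output;
    }
}
```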
I've tinkered with different approaches: XmlTextWriter and XmlTextReader were predictably poor, and so was the built-in binary serialization. Protobuf-net is excellent but still a little too slow. Here are some stats:
File size, XML: 217 KB
File size, binary: 87 KB
Compressed binary: 26 KB
Compressed XML: 26 KB

Deserialize with XML (XmlTextReader): 8.4 sec
Deserialize with binary (Protobuf-net): 6.2 sec
Deserialize with binary, without string interning (Protobuf-net): 5.2 sec
Deserialize with binary from memory (Protobuf-net): 5.9 sec
Time to decompress binary file into memory: 1.8 sec

Serialize with XML (XmlTextWriter): 11 sec
Serialize with binary (Protobuf-net): 4 sec
Serialize with binary, length-prefixed (Protobuf-net): 3.8 sec
That got me thinking: it seems (correct me if I'm wrong) that the major culprit in deserialization is the actual byte-to-object conversion rather than the IO. If that's the case, it should be a candidate for the new Parallel extensions.
Since I'm a bit of a novice when it comes to binary IO, though, I'd appreciate some input before I commit time to a solution :)
For simplicity's sake, say we want to deserialize a list of objects with no optional fields. My first idea was simply to store each object with a length prefix, read each record's byte[] into a list of byte[], and then use PLINQ to do the byte[] -> object deserialization. Something like this:
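A rough sketch of that idea. MyItem is just a stand-in type, and I'm assuming each record is stored as a 4-byte little-endian length followed by its protobuf payload; phase 1 slices the stream single-threaded, phase 2 deserializes the slices in parallel with PLINQ.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using ProtoBuf;

[ProtoContract]
public class MyItem   // stand-in for the real object type
{
    [ProtoMember(1)] public int Id { get; set; }
    [ProtoMember(2)] public string Name { get; set; }
}

public static class ParallelDeserializer
{
    public static List<MyItem> ReadAll(Stream input)
    {
        // Phase 1 (single-threaded): slice the stream into raw byte[] chunks.
        var chunks = new List<byte[]>();
        var lenBuf = new byte[4];
        while (ReadExactly(input, lenBuf, 4))
        {
            var chunk = new byte[BitConverter.ToInt32(lenBuf, 0)];
            if (!ReadExactly(input, chunk, chunk.Length))
                throw new EndOfStreamException("truncated record");
            chunks.Add(chunk);
        }

        // Phase 2 (parallel): byte[] -> object with PLINQ, keeping input order.
        return chunks.AsParallel().AsOrdered()
                     .Select(b => Serializer.Deserialize<MyItem>(new MemoryStream(b)))
                     .ToList();
    }

    private static bool ReadExactly(Stream s, byte[] buf, int count)
    {
        int read = 0;
        while (read < count)
        {
            int n = s.Read(buf, read, count - read);
            if (n == 0)
            {
                if (read == 0) return false;       // clean end of stream
                throw new EndOfStreamException();  // stream ended mid-record
            }
            read += n;
        }
        return true;
    }
}
```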
However, with that method I still need to read the byte[]s single-threaded. So perhaps one could instead read the whole binary stream into memory (how large a binary file is feasible for that, by the way?), store at the beginning of the file how many objects there are plus each one's length and offset, and then just create ArraySegments or something similar and do the chunking in parallel too. Roughly:
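Again a sketch, reusing the stand-in MyItem type from above. I'm assuming a header of one int record count followed by an (offset, length) int pair per record; a read-only MemoryStream over a slice of the shared buffer stands in for the ArraySegment, so no per-record copy is needed.

```csharp
using System;
using System.IO;
using System.Linq;
using ProtoBuf;

public static class HeaderDeserializer
{
    // Assumed layout: [int count][count x (int offset, int length)][payloads].
    public static MyItem[] ReadAll(byte[] data)
    {
        int count = BitConverter.ToInt32(data, 0);
        return Enumerable.Range(0, count).AsParallel().AsOrdered()
            .Select(i =>
            {
                int offset = BitConverter.ToInt32(data, 4 + i * 8);
                int length = BitConverter.ToInt32(data, 8 + i * 8);
                // Each record is deserialized from a windowed view of the
                // shared buffer, so the chunking itself happens in parallel.
                var window = new MemoryStream(data, offset, length, false);
                return Serializer.Deserialize<MyItem>(window);
            })
            .ToArray();
    }
}
```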
So what do you guys think, is it feasible?