views:

167

answers:

4

Why is binary serialization considered faster than xml serialization?

+3  A: 

Binary serialization is more efficient because write raw data directly and the XML needs format, and parse the data to generate a valid XML structure, additionally depending of what sort of data have your objects the XML may have a lot of redundant data.

RRUZ
+3  A: 

Consider serializing double for example:

  • binary serialization: writing 8 bytes from memory address to the stream

  • binary deserialization: reading same 8 bytes

  • xml serialization: writing tag, converting to text, writing closing tag - nearly thrice the I/O and 1000x more CPU utilization

  • xml deserialization: tag reading/validation, reading string parsing it to number, reading/validation of closing tag. little more overhead for I/O and some more for CPU

Daniel Mošmondor
A: 

Actually, like all things - it depends on the data, and the serializer.

Commonly (although perhaps unwisely) people mean BinaryFormatter for "binary", but this has a number of foibles:

  • in adds lots of type metadata (which all takes space)
  • by default it includes field names (which can be verbose, especially for automatically implemented properties)

Conversely, xml generally has overheads such as:

  • tags adding space and IO
  • the need to parse tags (which is remarkably expensive)
  • lots of text encoding/decoding

Of course, xml is easily compressed, adding CPU but hugely reducing bandwidth.

But that doesn't mean one is faster; I would refer you to some sample stats from here (with full source included), to which I've annotated the serializer base (binary, xml, text, etc). Look in particular at the first two results; it looks like XmlSerializer trumped BinaryFormatter on every value, while retaining the cross-platform advantages. Of course, protobuf then trumps XmlSerializer ;p

These numbers tie in quite well to ServiceStack's benchmarks, here.

BinaryFormatter *** binary
Length: 1314
Serialize: 6746
Deserialize: 6268

XmlSerializer *** xml
Length: 1049
Serialize: 3282
Deserialize: 5132

DataContractSerializer *** xml
Length: 911
Serialize: 1411
Deserialize: 4380

NetDataContractSerializer *** binary
Length: 1139
Serialize: 2014
Deserialize: 5645

JavaScriptSerializer *** text (json)
Length: 528
Serialize: 12050
Deserialize: 30558

(protobuf-net v2) *** binary
Length: 112
Serialize: 217
Deserialize: 250
Marc Gravell
A: 

Well, first of all, XML is a bloated format. Every byte you send in binary form would be similar to at least 2 or 3 bytes in XML. For example, sending the number "44" in binary, you need just one byte. In XML you need an element tag, plus two bytes to put the numer: <N>44</N> which is a lot more data.
One difference is the encoding/decoding time required to handle the message. Since binary data is so compact, it won't eat up much clock cycles. If the binary data is a fixed structure, you could probably load it directly into memory and access every element from it without the need to parse/unparse the data.
XML is a text-based format which needs a few more steps to be processed. First, the format is bloated so it eats up more memory. Furthermore, all data is text and you might need them in binary form, thus the XML needs to be parsed. This parsing still needs time to process, no matter how fast your code is. ASN.1 is a "binary XML" format that provides a good alternative for XML, but which will need to be parsed just like XML. Plus, if most of the data you use is text, not numeric, then binary formats won't make a big difference.
Another speed factor is the total size of your data. When you just load and save a binary file of 1 KB or an XML file of 3 KB then you probably won't notice any speed difference. This is because disks use blocks of a specific size to store data. Up to 4 KB fits easily within most disk blocks. Thus, for the disk it doesn't matter if it needs to read 1 KB or 3 KB since it reads the whole 4KB block. But when the binary file is 1 megabyte and the XML is 3 megabytes, the disk will need to read a lot more blocks to just read the XML. (Or to write it.) And then it even matters if your XML is 3 MB or just 2.99 MB or 3.01 MB.
With transport over TCP/IP, most binary data will be UU-encoded. With UU-encoding, your binary data will grow with 1 byte for every 3 bytes in the data. XML data will not be encoded thus the size difference becomes smaller, thus the speed difference becomes less. Still, the binary data will still be faster since the encoding/decoding routines can be real fast.
Basically, size matters. :-)

But with XML you have an additional alternative. You can send and store the XML in a ZIP file format. Microsoft Office does this with it's newer versions. A Word document is created as an XML file, yet stored as part of a bigger ZIP file. This combines the best of both worlds, since Word documents are mostly text thus a binary format would not add much speed increase. Zipping the XML makes storage and sending the data a lot faster simply by making it binary. Even more interesting, a compressed XML file could end up being smaller than a non-compressed binary file, thus the zipped XML becomes the faster one. (But it's cheating since the XML is now binary...)

Workshop Alex