hi,

I am wondering what the differences are between binary and text-based protocols. I read that binary protocols are more compact/faster to process. How does that work out, since you have to send the same amount of data either way? No?

E.g. how would the string "hello" differ in size in binary format?

thanks

A: 

The string "hello" itself wouldn't differ in size. The size/performance difference is in the additional information that Serialization introduces (Serialization is how the program represents the data to be transferred so that it can be re-construted once it gets to the other end of the pipe).

For example, when serializing the following in .NET using XML (one of the text serialization methods):

string helloWorld = "Hello World!";

You might get something like (I know this isn't exact):

<helloWorld type="String">Hello World!</helloWorld>

Whereas Binary Serialization would be able to represent that data natively in binary without all the extra markup.
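To make that overhead concrete, here is a minimal C# sketch (the XML string is the illustrative one above, not what a .NET serializer would emit exactly) comparing the markup-wrapped text against the raw UTF-8 bytes of the value:

using System;
using System.Text;

class MarkupOverhead
{
    static void Main()
    {
        string helloWorld = "Hello World!";

        // Text representation: the value wrapped in XML-style markup.
        string xml = "<helloWorld type=\"String\">Hello World!</helloWorld>";

        // The "binary" representation here is simply the raw UTF-8 bytes of
        // the value, with no surrounding markup.
        byte[] raw = Encoding.UTF8.GetBytes(helloWorld);

        Console.WriteLine($"XML form:  {Encoding.UTF8.GetBytes(xml).Length} bytes");
        Console.WriteLine($"Raw bytes: {raw.Length} bytes");
    }
}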

Justin Niessner
+3  A: 

If all you are doing is transmitting text, then yes, the difference between the two isn't very significant. But consider trying to transmit things like:

  • Numbers - do you use a string representation of a number, or the binary form? Especially for large numbers, the binary form will be more compact (see the sketch after this list).
  • Data Structures - How do you denote the beginning and end of a field in a text protocol? Sometimes a binary protocol with fixed-length fields is more compact.
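
To illustrate the Numbers point, here is a small C# sketch (the value 1,000,000 is just an example) comparing a decimal-digit text encoding with a fixed-width 32-bit binary encoding:

using System;
using System.Text;

class NumberEncoding
{
    static void Main()
    {
        int value = 1_000_000;

        // Text protocol: the decimal digits as ASCII ("1000000" -> 7 bytes).
        byte[] asText = Encoding.ASCII.GetBytes(value.ToString());

        // Binary protocol: a fixed-width 32-bit integer (always 4 bytes,
        // in the machine's byte order when using BitConverter).
        byte[] asBinary = BitConverter.GetBytes(value);

        Console.WriteLine($"Text:   {asText.Length} bytes");   // 7
        Console.WriteLine($"Binary: {asBinary.Length} bytes"); // 4
    }
}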
Nick
A: 

If you use ASN.1 and BER to send "hello" in a protocol message like this:

ProtocolMessage ::= UTF8String

then it takes 1 byte to encode the identifier octet, 1 byte to encode the length, and the UTF-8 encoding of "hello" is another 5 bytes. So the resulting message is 7 bytes.
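
As a rough sketch of how those 7 bytes come about (hand-rolled in C# rather than produced by a real ASN.1 library, and assuming the standard universal tag 0x0C for UTF8String with the short length form):

using System;
using System.Text;

class BerSketch
{
    static void Main()
    {
        const byte utf8StringTag = 0x0C;                  // universal tag for UTF8String

        byte[] value = Encoding.UTF8.GetBytes("hello");   // 5 bytes of content
        byte[] message = new byte[2 + value.Length];

        message[0] = utf8StringTag;                       // 1 byte: identifier octet
        message[1] = (byte)value.Length;                  // 1 byte: length octet (short form, < 128)
        value.CopyTo(message, 2);                         // 5 bytes: the UTF-8 content

        Console.WriteLine($"Total: {message.Length} bytes"); // 7
    }
}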

skwllsp
A: 

Binary protocols are better if you are using control bits/bytes.

I.e. instead of sending msg:Hello, in binary it can be 0x01 followed by your message (assuming 0x01 is a control byte which stands for msg).

So, since in a text protocol you send msg:hello\0, it involves 10 bytes, whereas in a binary protocol it would be 0x01Hello\0, which involves 7 bytes.

And another example: suppose you want to send the number 255. In text it is 3 bytes, whereas in binary it is 1 byte, i.e. 0xFF.
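
A minimal C# sketch of the same idea (the 0x01 control byte and the trailing NUL are just the conventions assumed in this answer, not any standard):

using System;
using System.Text;

class Framing
{
    static void Main()
    {
        // Text framing: "msg:" prefix, payload, NUL terminator -> 10 bytes.
        byte[] text = Encoding.ASCII.GetBytes("msg:hello\0");

        // Binary framing: 0x01 control byte standing in for "msg:",
        // then the payload, then the NUL terminator -> 7 bytes.
        byte[] payload = Encoding.ASCII.GetBytes("Hello");
        byte[] binary = new byte[1 + payload.Length + 1];
        binary[0] = 0x01;
        payload.CopyTo(binary, 1);
        binary[binary.Length - 1] = 0x00;

        Console.WriteLine($"Text:   {text.Length} bytes");   // 10
        Console.WriteLine($"Binary: {binary.Length} bytes"); // 7
    }
}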

SysAdmin
It would more commonly be 4 raw bytes (0x0000_00FF) to support larger integers, and you usually must count the delimiter in text protocols, giving at least 4 bytes there too ("255" + 1).
Roger Pate
@Roger Pate: The point is that binary protocols *potentially* have a higher entropy compared to textual protocols. If I knew that the number is between 1 and 255, why would I use an integer to encode it? I can also turn your example around: if there is indeed a need for large numbers (e.g. integers from 1 to 4,294,967,295), then any number larger than 999 is more efficiently encoded with 32 fixed bits instead of 4 bytes.
Caffeine
@Caffeine: As shown, I'm using "byte" as "8 bits", so 32 bits is identical to 4 bytes.
Roger Pate
@Roger Pate: A typo on my part, what I meant was ASCII-encoding with *more* than 4 bytes (delimiter included)
Caffeine
A: 

I wouldn't say that binary formats are faster to process. If you have a look at CSV or a fixed-field-length textual format, it can still be processed fast.

I would say everything depends on who the consumer is. If a human being is at the end (like for HTTP or RSS), then there is no need to somehow compact the data, except maybe to compress it.

Binary protocols need parsers/converters and are difficult to extend while keeping backward compatibility. The higher you go in the protocol stack, the more human-oriented the protocols are (TCP is binary, as packets have to be processed by routers at high speed, but XML is more human-friendly).

I think size variation does not matter a lot today. For your example, "hello" will take the same amount of space in binary format as in text format, because the text format is also "binary" for the computer - only the way we interpret the data matters.

dma_k
-1 Binary formats can be much faster to process because they can match the machine representation much better. HTTP is used for computer-to-computer communication as well as for human consumption. Binary protocols (can) have less need for parsers/converters than text protocols. The higher you go in the protocol stack, the more **abstract** the protocols are, not human-oriented. And binary can be thought of as human-oriented provided you have a good reader (what about GIF or JPEG?). Size variation can matter tremendously - think of mobile devices and the mobile web.
MarkJ
this is wrong on so many levels it isn't even funny
fuzzy lollipop
+1  A: 

You need to be clear as to what is part of the protocol and what is part of the data. Text protocols can send binary data and binary protocols can send text data.

The protocol is the part of the message that states "Hi, can I connect? I've got some data, where should I put it? You've got a reply for me? Great! Thanks, bye!"

Each bit of the conversation is (probably) much smaller in a binary protocol. Take HTTP for example (which is text-based):

If you had an encoding standard, I bet you could come up with a sequence of characters smaller than the 4 bytes needed for the word 'POST'.
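
For illustration, a tiny C# sketch contrasting the 4 ASCII bytes of the method word with a hypothetical one-byte opcode (the opcode value is invented, not part of HTTP):

using System;
using System.Text;

class MethodOpcodes
{
    static void Main()
    {
        // Text protocol: the method name travels as ASCII, e.g. 4 bytes for "POST".
        byte[] textMethod = Encoding.ASCII.GetBytes("POST");

        // Hypothetical binary convention: each method gets a one-byte opcode.
        byte postOpcode = 0x02;

        Console.WriteLine($"Text method:   {textMethod.Length} bytes"); // 4
        Console.WriteLine($"Binary opcode: 1 byte (0x{postOpcode:X2})");
    }
}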

JamesB
On the other hand, 3 bytes smaller is not "much smaller." Yes, it can add up, but sometimes people get all excited by the potential 75% savings and look no further. (And for the record, I have been guilty of this quite a few times.)
Max E.
+4  A: 

Text protocols are better in terms of readability, ease of reimplementing, and ease of debugging. Binary protocols are more compact.

However, you can compress your text using a library like LZO or zlib, and this is almost as compact as binary (with very little performance hit for compression/decompression).
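
As a rough illustration, using .NET's GZipStream (which is deflate-based) as a stand-in for zlib or LZO, and a made-up repetitive payload:

using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

class CompressText
{
    static void Main()
    {
        // A repetitive text payload, standing in for a chatty text protocol.
        string text = string.Concat(Enumerable.Repeat("name=hello;value=42\n", 200));
        byte[] raw = Encoding.UTF8.GetBytes(text);

        var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionLevel.Optimal, leaveOpen: true))
        {
            gzip.Write(raw, 0, raw.Length);
        }

        Console.WriteLine($"Uncompressed: {raw.Length} bytes");
        Console.WriteLine($"Compressed:   {buffer.Length} bytes");
    }
}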

You can read more info on the subject here:
http://www.faqs.org/docs/artu/ch05s01.html

Max E.