does anyone have a good definition for what a binary protocol is? and what is a text protocol actually? how do these compare to each other in terms of bits sent on the wire?

here's what wikipedia says about binary protocols:

A binary protocol is a protocol which is intended or expected to be read by a machine rather than a human being (http://en.wikipedia.org/wiki/Binary_protocol)

oh come on!

to be more clear, if I have a jpg file, how would it be sent through a binary protocol and how through a text one? in terms of bits/bytes sent on the wire, of course.

at the end of the day if you look at a string it is itself an array of bytes so the distinction between the 2 protocols should rest on what actual data is being sent on the wire. in other words, on how the initial data (jpg file) is encoded before being sent.

any comments are appreciated, I am trying to get to the essence of things here.

salutations!

A: 

I think you got it wrong. It's not the protocol that determines how data looks on the "wire"; it's the data type that determines which protocol to use to transmit it. Take a TCP socket, for instance: a JPEG file will be sent and received with a binary protocol because it's binary data (not human readable; it contains bytes outside the printable 32-126 ASCII range), but you can send and receive a text file with either kind of protocol and you wouldn't notice the difference.
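The printable range mentioned here can be checked mechanically. The following is only a minimal sketch in C: the helper name `is_printable_ascii` is made up for this example, and treating tab, CR and LF as "text" is an assumption, not part of any standard definition.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if every byte is in the printable
 * ASCII range 32..126, or is a tab, carriage return, or line feed. */
bool is_printable_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = buf[i];
        bool printable = (c >= 32 && c <= 126);
        bool whitespace = (c == '\t' || c == '\r' || c == '\n');
        if (!printable && !whitespace)
            return false;
    }
    return true;
}
```

A request line such as "GET /foo.jpg" passes this check, while the first bytes of a JPEG file (0xFF 0xD8 0xFF) do not. Note that this classifies the data, not the protocol carrying it.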

Simone Margaritelli
no, I don't think I got it wrong. I am still looking for a (good) definition of WHAT a binary protocol IS. the example with the jpeg was to clarify my question and nothing else, don't make it the center of the question. I should say that the protocol determines how the data looks when transmitted on the wire, otherwise why is it a protocol??
der_grosse
I gave you a precise definition, you just have to read carefully. "A binary protocol manages bytes that fall outside the 32-126 ASCII range, also called non-printable characters."
Simone Margaritelli
text protocols handle those too, by splitting them into smaller units that fit the ASCII table, and so on. so at best your definition is vague. but thanks for the contribution.
der_grosse
A: 

The two use different character sets: the text one uses a reduced character set, while the binary one includes every value it can, not only "letters" and "numbers" (that's why Wikipedia says "human being").

to be more clear, if I have a jpg file, how would it be sent through a binary protocol and how through a text one? in terms of bits/bytes sent on the wire, of course.

You should read about Base64.

any comments are appreciated, I am trying to get to the essence of things here.

I think the essence of narrowing the charset is reducing complexity and gaining portability and compatibility. It's harder to get many parties to agree on and respect a wide charset (or a wide anything). The Latin/Roman alphabet and the Arabic numerals are known worldwide. (There are of course other reasons to reduce the code, but that's a main one.)

Let's say that in binary protocols the "contract" between the parties is about bits: the first bit means this, the second means that, and so on. Or it is about bytes, but with the freedom to use the whole charset without thinking about portability, for example in private closed systems or near-hardware standards. However, if you design an open system you have to take into account how your codes will be represented in a wide range of situations, for example on a machine on the other side of the world. This is where text protocols come in: the contract is as standard as possible. I have designed both, and those were the reasons: binary for very custom solutions, and text for open and/or portable systems.

Hernán Eche
I know about base64 and what it does, and this is exactly what I had in mind when I posted the question. base64 is good when I want to send anything in an ASCII representation (encoding), so that would be a text protocol. technically it splits the input bits into groups of 6, uses a lookup table and so on. can anyone provide a similar explanation of how a binary protocol works? supplemental question: at what OSI level can we talk about binary and text protocols, and what is the exact meaning of these words at those levels?
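The 6-bit grouping and lookup table described here can be sketched in C. This is only a minimal illustration of the standard Base64 alphabet with `=` padding, not a hardened implementation; the names `base64_encode` and `B64_TABLE` are made up for this example.

```c
#include <stddef.h>

static const char B64_TABLE[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encodes `len` bytes from `in` into `out` as a NUL-terminated Base64
 * string. `out` must have room for 4 * ((len + 2) / 3) + 1 chars.
 * Returns the number of characters written (excluding the NUL). */
size_t base64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < len; i += 3) {
        /* Pack up to three input bytes into one 24-bit group. */
        unsigned long group = (unsigned long)in[i] << 16;
        if (i + 1 < len) group |= (unsigned long)in[i + 1] << 8;
        if (i + 2 < len) group |= (unsigned long)in[i + 2];

        /* Emit four 6-bit indices; pad with '=' where input ran out. */
        out[o++] = B64_TABLE[(group >> 18) & 0x3F];
        out[o++] = B64_TABLE[(group >> 12) & 0x3F];
        out[o++] = (i + 1 < len) ? B64_TABLE[(group >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < len) ? B64_TABLE[group & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}
```

Every 3 input bytes become 4 output characters, so a jpg sent this way grows by roughly a third compared with sending the raw bytes.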
der_grosse
Examples of binary are low-level protocols like simple serial communication (http://en.wikipedia.org/wiki/Asynchronous_serial_communication) or how data is stored in memory (http://en.wikipedia.org/wiki/Data_structure_alignment). About OSI: because text and binary protocols are used to represent data (not only for communication), they don't need to be at any OSI level. That said, I can tell you that layers 1, 2, 3 and 4 have "binary protocols", and "text protocols" can be found on layers 5, 6 and 7.
Hernán Eche
+7  A: 

Binary protocol versus text protocol isn't really about how binary blobs are encoded. The difference is really whether the protocol is oriented around data structures or around text strings. Let me give an example: HTTP. HTTP is a text protocol, even though when it sends a jpeg image, it just sends the raw bytes, not a text encoding of them.

But what makes HTTP a text protocol is that the exchange to get the jpg looks like this:

Request:

GET /files/image.jpg HTTP/1.0
Connection: Keep-Alive
User-Agent: Mozilla/4.01 [en] (Win95; I)
Host: hal.etc.com.au
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en
Accept-Charset: iso-8859-1,*,utf-8

Response:

HTTP/1.1 200 OK
Date: Mon, 19 Jan 1998 03:52:51 GMT
Server: Apache/1.2.4
Last-Modified: Wed, 08 Oct 1997 04:15:24 GMT
ETag: "61a85-17c3-343b08dc"
Content-Length: 60830
Accept-Ranges: bytes
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/jpeg

<binary data goes here>

Note that this could very easily have been packed much more tightly into a structure that would look (in C) something like

Request:

struct request {
  int requestType;
  int protocolVersion;
  char path[1024];
  char user_agent[1024];
  char host[1024];
  long int accept_bitmask;
  long int language_bitmask;
  long int charset_bitmask;
};

Response:

struct response {
  int responseType;
  int protocolVersion;
  time_t date;
  char host[1024];
  time_t modification_date;
  char etag[1024];
  size_t content_length;
  int keepalive_timeout;
  int keepalive_max;
  int connection_type;
  char content_type[1024];
  char data[];
};

Where the field names would not have to be transmitted at all, and where, for example, the responseType in the response structure is an int with the value 200 instead of three characters '2' '0' '0'. That's what a text based protocol is: one that is designed to be communicated as a flat stream of (usually human-readable) lines of text, rather than as structured data of many different types.
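To make the contrast concrete, here is a minimal sketch in C that serializes the same two fields both ways. Everything in it is an assumption made for illustration: the cut-down `mini_response` struct, the function names, the 16-bit and 32-bit field widths, the big-endian byte order, and the hard-coded "OK" reason phrase. It is not how HTTP or any real binary protocol is specified.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, cut-down response header used only for this sketch. */
struct mini_response {
    uint16_t status;          /* e.g. 200 */
    uint32_t content_length;  /* e.g. 60830 */
};

/* Binary form: fixed field order, no field names, big-endian integers
 * (the byte order is an assumption of this sketch). Always 6 bytes. */
size_t encode_binary(const struct mini_response *r, unsigned char out[6])
{
    out[0] = (unsigned char)(r->status >> 8);
    out[1] = (unsigned char)(r->status & 0xFF);
    out[2] = (unsigned char)(r->content_length >> 24);
    out[3] = (unsigned char)((r->content_length >> 16) & 0xFF);
    out[4] = (unsigned char)((r->content_length >> 8) & 0xFF);
    out[5] = (unsigned char)(r->content_length & 0xFF);
    return 6;
}

/* Text form: named fields as lines of ASCII, HTTP style. The reason
 * phrase "OK" is hard-coded for brevity. Returns the number of
 * characters written, or 0 if the buffer is too small. */
size_t encode_text(const struct mini_response *r, char *out, size_t cap)
{
    int n = snprintf(out, cap,
                     "HTTP/1.1 %u OK\r\nContent-Length: %lu\r\n\r\n",
                     (unsigned)r->status, (unsigned long)r->content_length);
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}
```

For a status of 200 the binary form carries the two bytes 0x00 0xC8, while the text form carries the three characters '2' '0' '0' plus the surrounding field names and delimiters, so it is several times longer but readable in a terminal.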

Tyler McHenry
+1 for the 1-liner definition "The difference is really whether the protocol is oriented around data structures or around text strings."
Frank Shearar
Tyler, thanks for the answer, a rather deep one I should say. here's a geek scenario that rests on what we all agree upon: only 0's and 1's travel on the wire. tell me please whether this captures what you meant. say I want to send the number 15 (dec) over the network (you have 2 identical computers on the network, no big/little endian chaos etc). if I use a binary protocol (say I send it through a TCP socket) it will go on the wire as 00001111, but if I use a text protocol it'll go as 00110001 (ASCII for char 1) AND 00110101 (ASCII for char 5). true or crap? :)
der_grosse
That's correct. The advantage of doing it the text way is not only human readability but also not having to worry about endianness if your numbers are more than one byte long.
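The number 15 example, and the endianness point, can be sketched in C as follows. The helper names `put_u8`, `put_decimal` and `put_u16_be` are made up for this illustration, and big-endian is just one possible byte order that two sides could agree on.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Binary: the value itself as one raw byte. 15 -> 0x0F (00001111). */
size_t put_u8(uint8_t value, unsigned char *out)
{
    out[0] = value;
    return 1;
}

/* Text: the value as ASCII decimal digits. 15 -> '1' '5' (0x31 0x35).
 * Returns the number of digits written, or 0 if the buffer is too small. */
size_t put_decimal(unsigned value, char *out, size_t cap)
{
    int n = snprintf(out, cap, "%u", value);
    return (n < 0 || (size_t)n >= cap) ? 0 : (size_t)n;
}

/* Multi-byte binary values need an agreed byte order. Writing the bytes
 * explicitly (big-endian here) makes the result independent of the
 * host CPU. */
void put_u16_be(uint16_t value, unsigned char out[2])
{
    out[0] = (unsigned char)(value >> 8);
    out[1] = (unsigned char)(value & 0xFF);
}
```

With the text form the digits always travel most significant first, so the byte order question never comes up; with the binary form it has to be part of the protocol definition as soon as a value is wider than one byte.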
Tyler McHenry
I don't agree with the 1-line definition, nor with the example of sending the number 15. To see the differences, as I put in my answer, you have to know the whole charset and the delimiters/protocol. You can't say from a single data example whether the protocol is text based or binary based. You could be "looking" at the cable and see a 65 (char 'A') and you still couldn't say whether it's a text-based or a binary protocol. Both may or may not have the same representation for a single char, but that's not fundamental.
Hernán Eche
A: 

Examples of binary protocols: RTP, TCP, IP.

Examples of text protocols: SMTP, HTTP, SIP.

This should allow you to generalise to a reasonable definition of binary vs text protocols.

Hint: just skip to the example sections, or the diagrams. They serve to illustrate Tyler's rocking answer.

Frank Shearar
Frank, thanks for the links, but by the time I'm done with the RFCs it will be 2099 :) I wanted some answers from people who've already read those. I'm still pondering Tyler McHenry's answer though...
der_grosse
+2  A: 

Here's a kind-of cop-out definition:

You'll know it when you see it.

This is one of those cases where it is very hard to find a concise definition that covers all corner cases. But it is also one of those cases where the corner cases are completely irrelevant, because they simply do not occur in real life.

Pretty much all protocols that you will encounter in real life will either look like this:

> fg,m4wr76389b zhjsfg gsidf7t5e89wriuotu nbsdfgizs89567sfghlkf
>  b9er t8ß03q+459tw4t3490ß´5´3w459t srt üßodfasdfäasefsadfaüdfzjhzuk78987342
< mvclkdsfu93q45324äö53q4lötüpq34tasä#etr0 awe+s byf eart

[Imagine a ton of other non-printable crap there. One of the challenges in conveying the difference between text and binary is that you have to do the conveying in text :-)]

Or like this:

< HELLO server.example.com
> HELLO client.example.com
< GO
> GETFILE /foo.jpg
< Length: 3726
< Type: image/jpeg
< READY?
> GO
< ... server sends 3726 bytes of binary data ...
> ACK
> BYE

[I just made this up on the spot.]
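One practical consequence of a line-oriented exchange like the made-up one above is that the receiver can pick fields apart with ordinary string functions. A minimal sketch in C, assuming a hypothetical helper `parse_length_line` that handles only the `Length:` line of that invented protocol:

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parses a header line such as "Length: 3726" (optionally followed by
 * CR or LF). On success stores the value in *out and returns 0;
 * returns -1 if the line is not a well-formed Length header. */
int parse_length_line(const char *line, unsigned long *out)
{
    static const char prefix[] = "Length: ";
    const size_t n = sizeof prefix - 1;
    char *end;

    if (strncmp(line, prefix, n) != 0)
        return -1;                       /* not a Length header */
    if (!isdigit((unsigned char)line[n]))
        return -1;                       /* value must start with a digit */

    unsigned long v = strtoul(line + n, &end, 10);
    if (*end != '\0' && *end != '\r' && *end != '\n')
        return -1;                       /* trailing junk after the number */

    *out = v;
    return 0;
}
```

Once the length is known, the receiver would read exactly that many raw bytes for the file itself, which is the same pattern HTTP uses with Content-Length.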

There's simply not that much ambiguity there.

Another definition that I have sometimes heard is

a text protocol is one that you can debug using telnet

Maybe I am showing my nerdiness here, but I have actually written and read e-mails via SMTP and POP3, read usenet articles via NNTP and viewed web pages via HTTP using telnet, for no other reason than to see whether it would actually work.

Actually, while writing this, I kinda caught the fever again:

bash-4.0$ telnet smtp.googlemail.com 25
Trying 74.125.77.16...
Connected to googlemail-smtp.l.google.com.
Escape character is '^]'.
< 220 googlemail-smtp.l.google.com ESMTP Thu, 15 Apr 2010 19:19:39 +0200
> HELO
< 501 Syntactically invalid HELO argument(s)
> HELO client.example.com
< 250 googlemail-smtp.l.google.com Hello client.example.com [666.666.666.666]
> RCPT TO:Me <[email protected]>
< 503 sender not yet given
> SENDER:Me <[email protected]>
< 500 unrecognized command
> RCPT FROM:Me <[email protected]>
< 500 unrecognized command
> FROM:Me <[email protected]>
< 500-unrecognized command
> HELP
< 214-Commands supported:
< 214 AUTH HELO EHLO MAIL RCPT DATA NOOP QUIT RSET HELP ETRN
> MAIL FROM:Me <[email protected]>
< 250 OK
> RCPT TO:You <[email protected]>
< 250 Accepted
> DATA
< 354 Enter message, ending with "." on a line by itself
> From: Me <[email protected]>
> To: You <[email protected]>
> Subject: Testmail
>
> This is a test.
> .
< 250 OK id=1O2Sjq-0000c4-Qv
> QUIT
< 221 googlemail-smtp.l.google.com closing connection
Connection closed by foreign host.

Damn, it's been quite a while since I've done this. Quite a few errors in there :-)

Jörg W Mittag
A: 

As most of you suggested, we can't tell whether a protocol is binary or text simply by looking at the content on the wire.

AFAIK:

Binary protocol - Field boundaries are defined in bits, and the order of the fields is critical.

Eg., RTP

The first two bits are the version, the next two bits are the padding and extension flags, and the marker bit is the first bit of the second byte.

Text protocol - Fields are separated by protocol-specific delimiters, and the order of the fields is generally not important.

Eg., SIP

One more difference: in a binary protocol we can split a byte, i.e., a single bit might have its own specific meaning, while in a text protocol the minimum meaningful unit is a BYTE. You can't split a byte.
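Splitting a byte into bit fields looks like this in practice. The sketch below unpacks the first two bytes of an RTP header using the layout from RFC 3550 (2-bit version, padding bit, extension bit, 4-bit CSRC count, then marker bit and 7-bit payload type); the struct and function names are made up for this example.

```c
/* Fields carried in the first two bytes of an RTP header (RFC 3550). */
struct rtp_flags {
    unsigned version;       /* 2 bits */
    unsigned padding;       /* 1 bit  */
    unsigned extension;     /* 1 bit  */
    unsigned csrc_count;    /* 4 bits */
    unsigned marker;        /* 1 bit  */
    unsigned payload_type;  /* 7 bits */
};

/* Hypothetical helper: unpacks the bit fields with shifts and masks. */
struct rtp_flags rtp_parse_first_two_bytes(const unsigned char b[2])
{
    struct rtp_flags f;
    f.version      = (b[0] >> 6) & 0x03;
    f.padding      = (b[0] >> 5) & 0x01;
    f.extension    = (b[0] >> 4) & 0x01;
    f.csrc_count   =  b[0]       & 0x0F;
    f.marker       = (b[1] >> 7) & 0x01;
    f.payload_type =  b[1]       & 0x7F;
    return f;
}
```

Six separate fields fit in two bytes here. A text protocol would typically spend at least one byte per field, plus delimiters.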

thnx

-Bytes

Bytes