Hi,

I'd like some advice on which format to use for transmitting data over TCP. Currently, I have devised a simple text protocol with delimited strings. I'm thinking I should use something that already exists, such as XML, JSON, or XMPP.

What data formats do people use for transmitting over TCP?

I would like to optimize for speed and throughput but would rather adopt an existing standard than use my own.

A: 

In most cases folks just declare a record with a layout that is reproducible on both ends and use that. It is only when you have more complicated needs that you have to do anything fancier than that.
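As a rough illustration of a record with a fixed layout, here is a minimal Python sketch using the standard `struct` module. The record fields (a 32-bit id, a 16-bit type code, and a 64-bit float) are made up for the example and are not from the question; network byte order is assumed so that both ends agree regardless of CPU.

```python
import struct

# Hypothetical record layout, assumed for illustration only:
#   unsigned 32-bit id, unsigned 16-bit type code, 64-bit float value.
# "!" selects network (big-endian) byte order with no padding.
RECORD = struct.Struct("!IHd")

def pack_record(record_id: int, kind: int, value: float) -> bytes:
    """Serialize one record into exactly RECORD.size bytes."""
    return RECORD.pack(record_id, kind, value)

def unpack_record(data: bytes) -> tuple:
    """Rebuild the (id, kind, value) tuple from RECORD.size bytes."""
    return RECORD.unpack(data)

wire = pack_record(7, 2, 1.5)
fields = unpack_record(wire)
```

Because every record has the same size, the receiver can simply read `RECORD.size` bytes at a time from the socket and unpack each chunk.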

For variable-length strings, I'd probably just send a length field followed by that many bytes of data. In C-like languages you could probably do without the length by relying on the null terminator. Nothing more complex than that is really needed.
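A length-prefixed string could look something like the following sketch. The 4-byte big-endian length and the UTF-8 encoding are assumptions chosen for the example; any width and encoding work as long as both ends agree.

```python
import struct

def encode_string(text: str) -> bytes:
    """Prefix the UTF-8 bytes with their length as a 4-byte big-endian integer."""
    data = text.encode("utf-8")
    return struct.pack("!I", len(data)) + data

def decode_string(buffer: bytes, offset: int = 0):
    """Read one length-prefixed string; return (text, offset of the next field)."""
    (length,) = struct.unpack_from("!I", buffer, offset)
    start = offset + 4
    end = start + length
    if end > len(buffer):
        raise ValueError("buffer ends before the declared string length")
    return buffer[start:end].decode("utf-8"), end

# Two strings back to back, as they might appear in a single TCP stream.
message = encode_string("hello") + encode_string("wörld")
first, pos = decode_string(message)
second, pos = decode_string(message, pos)
```

Note that the length counts bytes, not characters, which matters as soon as the text is not plain ASCII.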

T.E.D.
A: 

Depends on the type of data and who is consuming your data.

If you're writing your own client/server pair, then arguably the best format is some sort of binary serialization. It's compact, transmits easily over the wire, and can be quickly reconstructed.

If you're writing something for many consumers using a variety of languages...then I'd worry more about XML or JSON (depending on the size and complexity of your data).

XML is better suited to large, complex pieces of data.

JSON is better for smaller, more compact pieces of data.

Justin Niessner
Can you elaborate on the XML -> large data, JSON small data statement?
JosefAssad
When you start trying to transmit large amounts of data via JSON, it quickly becomes problematic. Trying to look at a huge piece of JSON and debugging issues is a nightmare. XML is more structured and easier to deal with when things begin to get larger and more complex.
Justin Niessner
+2  A: 

You might want to look at Google Protocol Buffers or Apache Thrift. Search for "Apache Thrift" to get the link, as Stack Overflow is not allowing me to post more than one link.

ajitomatix
http://incubator.apache.org/thrift/
Nosrama
A: 

Your data's key/value appearance suggests JSON might just be easier to work with.
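For key/value data, one common arrangement is one JSON object per line. The sketch below assumes newline-delimited framing and made-up field names (`symbol`, `price`); neither comes from the question.

```python
import json

def encode_message(fields: dict) -> bytes:
    """Serialize a dict as compact JSON followed by a newline delimiter."""
    # json.dumps escapes newlines inside strings, so "\n" only ever
    # appears as the message terminator.
    return json.dumps(fields, separators=(",", ":")).encode("utf-8") + b"\n"

def decode_message(line: bytes) -> dict:
    """Parse one newline-terminated JSON message back into a dict."""
    return json.loads(line.decode("utf-8"))

wire = encode_message({"symbol": "ABC", "price": 12.5})
decoded = decode_message(wire)
```

On the receiving side, wrapping the socket with `sock.makefile("rb")` and iterating over it yields one message per line.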

Optimising speed and throughput is probably better handled outside your application, at layers 3 and 4 of the OSI model (network and transport), I'd hazard. A unit of optimising effort invested in those layers will likely pay off more than the same effort sunk into the structure and encoding of your data.
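One example of a transport-layer setting that is reachable from application code is `TCP_NODELAY`, which disables Nagle's algorithm so that small writes are sent immediately instead of being coalesced. This is only an illustration of the kind of knob that lives at layer 4, not necessarily what was meant above, and whether it helps depends on the traffic pattern.

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Send small writes right away rather than waiting to batch them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
# Read the option back to confirm the OS accepted it.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

For bulk throughput the opposite trade-off often applies: fewer, larger writes and larger socket buffers tend to matter more than latency settings.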

JosefAssad
Thanks - can you elaborate on your point about the OSI model? What should I look at in particular?
Nosrama
A: 

XML sounds like a good choice for your data type. There are plenty of XML libraries out there already (or your language might even have XML parsing built in).

Being text-based also makes it easier to debug things by hand, so that's one reason to stay away from binary encodings on the wire.
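As a sketch of what that could look like with a built-in parser, the following uses Python's `xml.etree.ElementTree`. The `<message>` element, the field names, and the newline terminator are assumptions for the example, and it presumes field values contain no newlines.

```python
import xml.etree.ElementTree as ET

def encode_xml(fields: dict) -> bytes:
    """Build <message><name>value</name>...</message> plus a newline terminator."""
    root = ET.Element("message")
    for name, value in fields.items():
        child = ET.SubElement(root, name)
        child.text = str(value)
    # The default encoding is ASCII with no XML declaration; non-ASCII
    # characters are written as numeric character references.
    return ET.tostring(root) + b"\n"

def decode_xml(line: bytes) -> dict:
    """Parse one message; every value comes back as a string."""
    root = ET.fromstring(line)
    return {child.tag: child.text for child in root}

wire = encode_xml({"symbol": "ABC", "price": 12.5})
decoded = decode_xml(wire)
```

The bytes on the wire are readable as-is in a packet capture or a `telnet` session, which is the debugging advantage mentioned above. Unlike a binary format, types are lost: the receiver has to convert `"12.5"` back to a number itself.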

caf