When are serialization, marshaling, etc. required for communication between programs on two different machines across a network or the Internet?

Suppose I have a client program in Java/Flash and a server program in C. Can't I implement communication using a custom protocol of my own? I guess so. When is serialization etc. needed? I am aware that Java RMI, CORBA, etc. have these mechanisms. But why? Is it a must? Please enlighten me.

A: 

The best way to do this these days is to send XML messages back and forth.

Romain Hippeau
+1  A: 

Can't I implement communication using a custom protocol of my own? I guess so.

You can, but you probably shouldn't reinvent the wheel. Serialization is tricky. Use a well-tested, standard solution for better results. You'll spend far less time learning an API than writing data-passing routines.

When is serialization etc needed?

For starters, it's needed to move an in-memory structure from one process to another.

There are more use cases described here: http://en.wikipedia.org/wiki/Serialization

I am aware that Java RMI, CORBA, etc. have these mechanisms. But why? Is it a must? Please enlighten me.

None of these "are a must". Like you said, you could write your own protocol. You're far better off (IMO) leaning on some existing technology in this area, like XML or one of the others you mention. Which technology you use really depends on what you're trying to do, so I'm not going to speculate :)

One great mechanism for passing serialized data is Google's Protocol Buffers. They take care of the encoding (in a much more efficient way than XML) and of endian translation.

Stephen
Your answer is somewhat okay but doesn't provide enough reasoning or clarity. Please explain in a simple yet clear and concrete manner if possible. Thanks!
@trojanwarrior3000: What part of my answer is unclear? FWIW, I did provide concrete reasoning - it's slower to roll your own protocol, serialization is needed to pass data between processes.
Stephen
I understand it this way: serialization is useful when making remote calls with arbitrary data types like objects, and those facilities make that easier than working out one's own protocol. Thanks!
@trojanwarrior3000: added a pointer to use protocol buffers. If you provide details on what problem you're trying to solve, you might get better answers.
Stephen
+1  A: 

Objects in your program have a well-defined memory layout, imposed by your compiler. But that layout is not going to be exactly the same in another program, running on another machine, compiled by a different compiler. And it is typically not very compatible with the transport medium, like a network connection or a file. You have to take care of both problems to get the object from one machine to another.

Files and network packets are simple streams of bytes. That's where serialization comes into play: you need to serialize the in-memory object into a stream of bytes, and at the receiving end it needs to be de-serialized from that stream of bytes back into an object.

The obvious way to do so is binary serialization. You take the bytes for each field in the object and write them to the stream. Very efficient, but also very troublesome. The first problem you run into is that the receiving end has a different idea of what the object looks like. It might be compiled with a different version of the object declaration, one that had an added field for example. The problem is starker when the object is exchanged between different machines. They may have a very different idea about the number of bytes in an integer. Or the order of the bytes (endian-ness).
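To make the size and byte-order problem concrete, here is a minimal sketch using Python's standard `struct` module. The record layout (a 32-bit id, a 16-bit port, a 64-bit float) is made up for illustration; the point is only that the same values produce different bytes depending on the byte order, and that a receiver guessing wrong gets garbage without any error.

```python
import struct

# Hypothetical record: 32-bit unsigned id, 16-bit unsigned port, 64-bit float.
# "<" means little-endian, ">" means big-endian (network order).
# Both prefixes use fixed standard sizes and no padding.
LITTLE = struct.Struct("<IHd")
BIG = struct.Struct(">IHd")

record = (1, 8080, 2.5)

little_bytes = LITTLE.pack(*record)
big_bytes = BIG.pack(*record)

# Same values, same length, different bytes on the wire.
print(little_bytes.hex())
print(big_bytes.hex())

# A receiver that assumes the wrong byte order gets a wrong value, silently.
wrong_id = struct.unpack_from("<I", big_bytes)[0]

# A receiver that agrees with the sender on the format gets the record back.
round_trip = BIG.unpack(big_bytes)
print(wrong_id, round_trip)
```

Writing the byte order and field widths into the format string is what a protocol spec does on paper: both sides agree on them in advance instead of relying on whatever their compiler happens to use.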

There have been many solutions to this problem. They typically involve some kind of metadata that describes the fields in the object. The advent of Unicode made it possible to put both the metadata and the field values into a textual description; XML is the best example of this.
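As a rough illustration of why carrying the field names along helps, here is a small sketch with Python's standard `xml.etree.ElementTree`. The helper names `to_xml` and `from_xml` and the `user` record are invented for this example. The sender uses a newer declaration with an extra field, and the older receiver simply skips what it does not know.

```python
import xml.etree.ElementTree as ET

def to_xml(name, fields):
    """Serialize a flat dict into XML bytes, one child element per field."""
    root = ET.Element(name)
    for key, value in fields.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="utf-8")

def from_xml(data, known):
    """Parse XML bytes; `known` maps field name -> converter.
    Fields the receiver does not know about are ignored."""
    root = ET.fromstring(data)
    result = {}
    for child in root:
        if child.tag in known:
            result[child.tag] = known[child.tag](child.text)
    return result

# The sender was built against a newer declaration with an added field.
wire = to_xml("user", {"id": 7, "name": "Ada", "nickname": "al"})

# The receiver only knows about "id" and "name".
decoded = from_xml(wire, {"id": int, "name": str})
print(decoded)
```

With raw binary serialization the added field would shift every byte after it; here the field names act as the metadata, at the cost of a larger message.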

Hans Passant
If one uses a byte-stream protocol and deals with endianness on both sides, will that be enough?
Hope somebody who is good with this problem will throw more light on my comment!
You cannot deal with endianness without also knowing the structure of the data. Each field has to be reversed individually.
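A small sketch of what that means, again with Python's `struct` module and a made-up two-field message (a 16-bit field followed by a 32-bit field). Reversing the whole buffer does not fix the byte order; only a reader that knows where each field starts and ends can convert it.

```python
import struct

# Hypothetical message sent in big-endian order:
# a 16-bit field followed by a 32-bit field.
payload = struct.pack(">HI", 0x0102, 0x03040506)  # bytes 01 02 03 04 05 06

# Naive fix on a little-endian reader: reverse the whole buffer and
# read it as little-endian. The field boundaries no longer line up.
naive = struct.unpack("<HI", payload[::-1])

# Correct: the reader knows the structure and converts field by field,
# which is what the ">" format prefix does.
correct = struct.unpack(">HI", payload)

print([hex(v) for v in naive])
print([hex(v) for v in correct])
```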
Hans Passant
The types of packets (commands) as seen by the server, the structure of the data, its total length, how many bytes each field has, etc. will all be specified in the protocol spec. Then what is the problem?
Or what difficulties may still arise?