
There's a related question, "What is the preferred Java XML binding framework?"

In the answer currently ranked second there, the poster drew a distinction between XML serialization and Java/XML data binding. As best I can tell, XML data binding means "creating an in-memory object graph from an XML document", and XML serialization means "creating an XML document from an in-memory object graph."

I don't see that they are different at all, just different perspectives on the same problem.

The argument was that the emphasis in data binding is on the object model, while the emphasis in serialization is on the document format. I don't see that at all. If one is serializing an object graph to XML, presumably one cares about the format: it needs to be readable, toolable, and validatable. (If the format is irrelevant, then why not just use binary serialization and be done with it?) On the other hand, when performing "data binding" (what I would call deserializing), there must be a balanced emphasis on both the document format and the object model.

So the question to you:
Is there a difference between Java/XML data binding and XML Serialization that is worth worrying about?

+2  A: 

Libraries such as XStream offer XML Serialization (rather than XML document processing) so that object graphs can use standard Java serialization semantics to save/restore.

The serialized form may not be a document that you would wish to use outside a Java environment, because it may embed Java class names (although you can change this with aliases).
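To see what "embedding Java class names" looks like without pulling in XStream, here is a minimal sketch that uses the JDK's own `java.beans.XMLEncoder` as a stand-in serializer. This is not XStream's API and its output format differs from XStream's; the class name `EncoderDemo` is hypothetical. The point is only that an object-first serializer records Java types and the calls needed to rebuild them, rather than a neutral document format.

```java
import java.beans.XMLEncoder;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class EncoderDemo {
    // Writes an object graph with the JDK's java.beans.XMLEncoder
    // (standing in here for XStream, which is not used in this sketch).
    static String toXml(Object graph) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(graph);
        } // close() flushes pending output and writes the closing </java> tag
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        List<String> tags = new ArrayList<>(List.of("xml", "java"));
        // The output names java.util.ArrayList and records the add(...) calls
        // needed to rebuild the list: Java-specific, not a neutral document.
        System.out.println(toXml(tags));
    }
}
```

A non-Java consumer reading this output would have to understand JDK class names and method-call elements, which is the portability concern described above.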

Fortyrunner
+2  A: 

In many cases, the distinction is a small one. The answer to your question is: It depends on your use case. See, for example, Wikipedia's entry for XML Data Binding, which starts with:

XML data binding refers to the process of representing the information in an XML document as an object in computer memory.

By comparison, see the page for XStream, which is a serialization framework:

XStream is a simple library to serialize objects to XML and back again.

Thus, for XML data binding, the XML format is the "primary" format, the version of the data with an explicit definition. (Often an XSD.) The Java (or C# or ...) objects are a representation of the XML. However, for serialization, the objects are the definition, the "primary" format, and the XML representation is secondary.

Basically, when you have two formats of the same data -- one an object in your language of choice and the other an XML representation -- for most uses, one of those two formats will be primary and the other will be secondary. When the XML format is primary, you are talking about XML data binding. When the objects are primary, you are talking about serialization.
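The "XML is primary" direction can be sketched with nothing but the JDK's DOM parser. Here the document format (a hypothetical `<book isbn="..."><title>...</title></book>`) is fixed first, and the `Book` class exists only to give typed access to it. Real projects would more likely generate such classes from an XSD with a binding tool; the hand-written `BookBinder` below is an illustrative assumption, not any framework's API.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical document format, defined first (in practice often by an XSD):
//   <book isbn="..."><title>...</title></book>
// The Java class only mirrors that format.
final class Book {
    final String isbn;
    final String title;

    Book(String isbn, String title) {
        this.isbn = isbn;
        this.title = title;
    }
}

final class BookBinder {
    // XML -> object: read exactly the agreed-upon attribute and element.
    static Book unmarshal(String xml) {
        Element root = parse(xml).getDocumentElement();
        if (!"book".equals(root.getTagName())) {
            throw new IllegalArgumentException("expected <book>, got <" + root.getTagName() + ">");
        }
        Node title = root.getElementsByTagName("title").item(0);
        if (title == null) {
            throw new IllegalArgumentException("<book> has no <title>");
        }
        return new Book(root.getAttribute("isbn"), title.getTextContent());
    }

    // object -> XML: emit the same format, so XML -> object -> XML round-trips.
    static String marshal(Book book) {
        return "<book isbn=\"" + escape(book.isbn) + "\"><title>"
                + escape(book.title) + "</title></book>";
    }

    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // Minimal escaping for element text and double-quoted attribute values.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

Here the check that matters is XML to objects and back to the same XML; contrast that with a serializer, where the check is objects to XML and back to equivalent objects.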

For serialization, why use XML instead of a binary format? Well, an XML format for your data is more likely to be human-readable, more likely to be able to handle changes to your objects (adding a new field, for example), less likely to run into encoding problems when moving the serialized objects between machines with different byte orders, and so on. XML is structured in a known and well-understood way, and XML is highly flexible with handling new fields and new elements. Binary formats are often home-grown and may or may not be brittle with respect to future changes to your serialization format.

Eddie
Thanks for the input. Even after reading this, I still feel that the distinction is an artificial one. Primary? Secondary? It feels subjective, fuzzy, arbitrary and artificial. This tells me there is no meaningful distinction. ps: My question about "why use XML" was rhetorical.
Cheeso
Well, for XML data binding, you're likely to create an XML schema and you're likely to care about the specific XML representation. For serialization, you will probably not care about the specific XML representation. For XML data binding, you go from XML to objects to XML. For serialization, you go from objects to XML to objects. It's not subjective or fuzzy at all. Is the XML a way to temporarily (or perhaps indefinitely) store or transmit the objects? Then it's serialization. Are the objects a convenient way to access the XML? Then it's data binding.
Eddie
A: 

Thanks for all the input. Considering it all, I still feel that the distinction is an artificial one. Primary? Secondary? It feels subjective, fuzzy, arbitrary and artificial. This tells me there is no meaningful distinction.

Cheeso
+2  A: 

No, there is a very important conceptual difference here, as you mention. Implementations may have strong similarities, and that may cause confusion, but conceptually the distinction is quite clear.

Data binding means binding Java objects to non-object content (relational data for ORMs, XML or JSON documents, etc.), and the different representations (POJOs, relational/hierarchical data) are equally important. Data binding has to concern itself with the specifics of the data format(s) it supports; some features have no equivalent constructs in Java POJOs (for example, XML mixed content, comments, processing instructions, and the difference between attributes and elements). Data binding focuses on bridging the impedance mismatch: allowing conversion between representations that is as seamless as possible, in both directions.
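A small sketch of that impedance mismatch, assuming a hypothetical `<note>` format and a naive POJO with a single `String body` field: DOM's `getTextContent()` concatenates descendant text and skips comments and processing instructions, so the comment, the processing instruction, and the inline `<b>` markup all disappear on the way through the object. The names `LossyNoteBinder` and `Note` are illustrative only.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class LossyNoteBinder {
    // Hypothetical POJO: one String field has no place for comments,
    // processing instructions, or inline markup such as <b>.
    static final class Note {
        String body;
    }

    static Note bind(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        Node body = doc.getElementsByTagName("body").item(0);
        if (body == null) {
            throw new IllegalArgumentException("no <body> element");
        }
        Note note = new Note();
        // getTextContent() joins descendant text and skips comments and PIs,
        // so mixed content like "Hello <b>world</b>!" collapses to "Hello world!".
        note.body = body.getTextContent();
        return note;
    }

    // Writing back can only reproduce what the POJO kept.
    static String write(Note note) {
        String text = note.body.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
        return "<note><body>" + text + "</body></note>";
    }
}
```

Feeding it `<note><?review pending?><body>Hello <!-- fix --><b>world</b>!</body></note>` and writing the result back yields only `<note><body>Hello world!</body></note>`; a real data binding tool has to model these constructs explicitly to avoid that loss.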

With object serialization, you start and end with (Java) objects -- other formats are of secondary concern, and only serve the purpose of passing objects. They may hard-code the exact structure of the data format used; even if they do not, they limit the kinds of constructs that can be used. Object serialization has to deal with things that are specific to objects: identity, references, and handling of cycles, all of which data binders can ignore.
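The identity and references point can be illustrated with the JDK's `java.beans.XMLEncoder`/`XMLDecoder` pair, used here as a convenient stand-in for an object serializer (not XStream). In this sketch the same list is referenced twice; the encoder writes the second reference as an `idref` to the first, and decoding restores one shared object rather than two equal copies. `IdentityDemo` is a hypothetical name, and cycles are not exercised here.

```java
import java.beans.XMLDecoder;
import java.beans.XMLEncoder;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class IdentityDemo {
    static String toXml(Object graph) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (XMLEncoder encoder = new XMLEncoder(out)) {
            encoder.writeObject(graph);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    // XMLDecoder can instantiate arbitrary classes: only use it on trusted input.
    static Object fromXml(String xml) {
        try (XMLDecoder decoder = new XMLDecoder(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))) {
            return decoder.readObject();
        }
    }

    public static void main(String[] args) {
        List<String> shared = new ArrayList<>(List.of("a", "b"));
        List<List<String>> outer = new ArrayList<>();
        outer.add(shared);
        outer.add(shared); // the same object, referenced twice

        String xml = toXml(outer);
        System.out.println(xml); // second reference appears as an idref to the first
        List<?> copy = (List<?>) fromXml(xml);
        // Reference comparison: checks object identity, not just equal contents.
        System.out.println(copy.get(0) == copy.get(1));
    }
}
```

A plain data binder mapping the same structure to a document would typically write the list out twice and read back two distinct, merely equal objects.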

But here's what may confuse matters: in general, object serialization libs (like XStream) offer lots of flexibility to customize the external format (even though it's of less importance than the objects). And you can certainly use data binding tools as alternatives to pure object serialization tools; for many or most cases they work pretty well. So you can indeed use tools for "secondary purposes". But there are limitations, depending on the exact feature set you need: many data binding tools cannot deal with cyclic references, object serializers can support only a subset of XML, and so forth. And even where you can use secondary tools, you might be better off choosing differently, with regard to ease of use or performance.

StaxMan