I'm going to be writing a program which has some web services in it that use XML to pass data back and forth. The format of the XML is predefined - I can't change it to suit my needs - but in code I can handle the data any way I want.

My question: Is it better for me to handle the data structure in code as an XML tree, or to write an equivalent data structure as an object in the language along with some utility functions for conversion to and from the XML?

I have some thoughts on the issue myself, but I don't want to unintentionally bias anyone's answers. This is a language-agnostic question, but if you have any language-specific considerations, I'd like to hear them.

Edit: To clarify, the XML format itself is set up in a logical manner. The object in the language wouldn't differ much from it. For example, it might look something like this (this is a poor example, but you get the gist I hope):

<car>
    <make>DATA</make>
    <model>DATA</model>
    <ownershipDates>
        <startDate>DATA</startDate>
        <endDate>DATA</endDate>
    </ownershipDates>
</car>
+1  A: 

I would create an object to handle the XML. If you make a mistake when handling the data structure, the compiler can possibly pick it up; with raw XML you have no way of doing this. Also, if the XML ever changes (not that things do that in programming :) ), you have a layer of abstraction between the XML structure and what your code interfaces with. It's easier to change it in the object than all over the place in the rest of the code.
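To illustrate the error-catching point, here is a minimal Python sketch (the question is language-agnostic, so the language is my choice, and the `Car` class and the sample values "Saab" and "900" are made up). Python has no compile step, so the closest equivalent is an immediate `AttributeError` at run time, or a static checker such as mypy flagging the line beforehand.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Made-up sample document following the <car> layout from the question.
XML = "<car><make>Saab</make><model>900</model></car>"


@dataclass
class Car:  # hypothetical class, for illustration only
    make: str
    model: str


root = ET.fromstring(XML)

# Misspelled tag name against the tree: the lookup quietly finds nothing.
missing = root.findtext("modle")
print(missing)

car = Car(make=root.findtext("make"), model=root.findtext("model"))

# The same misspelling against the object fails loudly at the point of use.
try:
    car.modle
except AttributeError as exc:
    print("caught:", exc)
```

With the tree, the typo only shows up later as a missing value somewhere downstream; with the object, it fails where the mistake was made.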

Kevin
+1 for design-time error catching. Nothing worse than a misspelled field (or worse, a misspelled field in the incoming data that you've spelled correctly).
overslacked
A: 

It depends on the data. For example, if the data makes sense to be represented in parent/child relationships, but for the sake of reporting/transmission it's been flattened, there's a good chance I'd rebuild the data structure (if processing speed wasn't a major concern and I didn't have a deadline).

I generally prefer to write classes for things like this; I believe the code is much easier to understand. It really does depend on what you're doing with the data, though.

overslacked
+1  A: 

Without context, it's hard to say. But I'm guessing you'll have an easier time working with a native data structure. If the XML schema ever changes or you're required to work with a different format, you will have decoupled those changes from the rest of your program.

So probably go with the latter option: "Write an equivalent data structure as an object in the language along with some utility functions for conversion to and from the XML."
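A rough sketch of what that could look like in Python, using the standard library's `xml.etree.ElementTree`. The `Car` class and the `car_from_xml` / `car_to_xml` helpers are hypothetical names, and the fields are kept as plain strings because the question doesn't say what the real data looks like.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Car:  # hypothetical native structure mirroring the <car> element
    make: str
    model: str
    start_date: str
    end_date: str


def car_from_xml(text: str) -> Car:
    """Build a Car from a <car> document; missing elements become ''."""
    root = ET.fromstring(text)
    return Car(
        make=root.findtext("make", default=""),
        model=root.findtext("model", default=""),
        start_date=root.findtext("ownershipDates/startDate", default=""),
        end_date=root.findtext("ownershipDates/endDate", default=""),
    )


def car_to_xml(car: Car) -> str:
    """Serialize a Car back into the predefined <car> layout."""
    root = ET.Element("car")
    ET.SubElement(root, "make").text = car.make
    ET.SubElement(root, "model").text = car.model
    dates = ET.SubElement(root, "ownershipDates")
    ET.SubElement(dates, "startDate").text = car.start_date
    ET.SubElement(dates, "endDate").text = car.end_date
    return ET.tostring(root, encoding="unicode")
```

Only these two functions know the tag names, so a schema change stays confined to them.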

Patrick McElhaney
A: 

That also depends on the language you are working in. Many languages, though not all, have built-in support for converting between XML and data structures, so you won't need to write much transformation code.

For example, when I work in C#, I try to design my XML and data structures together in a way that the built-in XML serializer can handle.

froh42
+1  A: 

In general, XML was meant for data interoperability and human readability, not for speed or ease of processing. For processing your data you may find it best to handle it as an internal data structure. In Python, for instance, popular parsers are the ones that translate an XML document into internal Python structures, for ease of processing and data enrichment, which can then in turn be translated back to XML.

If you're taking in XML and your program is outputting XML, you may want to look into XSLT. Once you understand the language, XML translation is much, much simpler for common tasks.
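As a rough illustration of that Python workflow, the sketch below turns a document into nested dicts and back using only the standard library. `element_to_dict` and `dict_to_element` are hypothetical helpers, not part of any library, and they assume a simple document like the one in the question: attributes are ignored and repeated sibling tags are not handled.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Leaf elements become their text; other elements become a dict of children."""
    children = list(elem)
    if not children:
        return elem.text
    return {child.tag: element_to_dict(child) for child in children}


def dict_to_element(tag, value):
    """Inverse of element_to_dict for the same simple document shape."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for child_tag, child_value in value.items():
            elem.append(dict_to_element(child_tag, child_value))
    else:
        elem.text = value
    return elem


doc = "<car><make>DATA</make><ownershipDates><startDate>DATA</startDate></ownershipDates></car>"
car = element_to_dict(ET.fromstring(doc))

# Enrich the data with ordinary dict operations, then go back to XML.
car["ownershipDates"]["endDate"] = "DATA"
print(ET.tostring(dict_to_element("car", car), encoding="unicode"))
```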

Jweede
I took it for granted that he would use an XML parser. I was thinking the utility functions would perform a higher level conversion, e.g. walking the parse tree and building objects instantiated from the "Car" class.
Patrick McElhaney
True. Just like your answer suggested, the question is very general. In many languages there are parsers that will automatically convert XML to easy internal data structures. That's all I wanted to point out. But, if he is doing XML-in/XML-out operations with no enrichment, XSLT usually works best.
Jweede
A: 

Even though it's tempting to use XML (e.g. a DOM object) as your data model, remember that "real" objects (i.e. actually modeling the data with classes) allow you to have methods and complex properties (references to other objects), etc. - whereas XML is basically just plain old data (POD).

If you're using a language that has decent means of doing XML binding you can enjoy both worlds. You can have a real object model with rich, object-oriented classes that also serialize from and to XML.
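A small Python sketch of the difference, with hypothetical class and method names. It assumes the ownership dates can be parsed into real `date` values, which the question doesn't actually specify.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class OwnershipPeriod:  # hypothetical class for the <ownershipDates> element
    start: date
    end: date

    def days(self) -> int:
        """Length of the ownership period in days."""
        return (self.end - self.start).days


@dataclass
class Car:
    make: str
    model: str
    ownership: OwnershipPeriod  # a reference to another object, not a subtree

    def was_owned_on(self, day: date) -> bool:
        """True if the given day falls inside the ownership period."""
        return self.ownership.start <= day <= self.ownership.end


car = Car("DATA", "DATA", OwnershipPeriod(date(2008, 1, 1), date(2009, 1, 1)))
print(car.ownership.days(), car.was_owned_on(date(2008, 6, 1)))
```

With a bare DOM tree, the same questions would mean re-parsing date strings out of text nodes at every call site.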

Assaf Lavie
A: 

Write a data object that serializes itself.

GogaRieger
A: 

The latter: equivalent classes plus code to read/write XML.

This is mainly because you can automatically generate the classes and code from the XML Schema, with JAXB (assuming Java - so this answer is language-partisan). Instead of dealing with XML, you deal with JAXB's objects.

OTOH, your XML is nice and regular, so the DOM objects are quite similar to the objects you'd write yourself. A downside is that the DOM API doesn't use idiomatic data structures of any particular language, so it's less intuitive to write and less readable to maintain. Fortunately, there are other DOM-like tools for most languages that have more natural APIs for that language.
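To show the contrast outside Java, here is a short Python comparison (Python is just one example language). Both modules are in the standard library: `xml.dom.minidom` follows the W3C DOM, while `xml.etree.ElementTree` is one of the more natural, DOM-like alternatives. The variable names are made up.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

XML = "<car><make>DATA</make><model>DATA</model></car>"

# W3C-style DOM: node lists and explicit text nodes.
dom = xml.dom.minidom.parseString(XML)
make_dom = dom.getElementsByTagName("make")[0].firstChild.data

# ElementTree: the element's text is reachable in a single call.
root = ET.fromstring(XML)
make_et = root.findtext("make")

print(make_dom, make_et)
```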

13ren