When designing an XML structure I sometimes find myself wanting to use similar patterns across multiple element types that appear in the same instance document.

For example, the Head-Body pattern can often be useful if you want to keep data and metadata (data about the data) separate, and sometimes it makes sense for multiple types of element in a document to use that pattern (albeit with different structures).

When you're using the same pattern in multiple element types, does it make sense to:

A) Keep the element names the same? So you might have multiple different types of element called "Head", each having a different structure... (Your "DataSnapshot" element might have a "Head" element within it, and your "CompiledDataSet" element might have a "Head" element within it too, both "Head" elements having different structures.)

B) Or should your element names match the types that are defined in your schema? So, the head element in your "DataSnapshot" element might be named "DataSnapshotHead" instead of just "Head", and your "CompiledDataSet" element might have a head element called "CompiledDataSetHead".

A has the advantage of making the pattern obvious and keeping names short. But I'm guessing it could be confusing for people when elements have the same name but different structures. I think it might make some sorts of XPath queries more complicated too (not sure about that - I've not used XPath much).

B would typically require longer element names, and would make it less clear that the same pattern was being used. But at least the element names would make it clear what type they had.

Below is an example of approach A:

<DataSnapshot>
  <Head>
    <!-- meta-data that's specific to the
         DataSnapshot -->
  </Head>
  <DataSets>
    <StandardDataSet setId="abc">
      <Head>
        <!-- meta-data that's specific to the
             StandardDataSet -->
      </Head>
      <Values>
        <!-- List of values, specific to the
             StandardDataSet in their structure -->
      </Values>
    </StandardDataSet>
    <CompiledDataSet setId="xyz">
      <Head>
        <!-- meta-data that's specific to the
             CompiledDataSet.  Different structure
             to the Head of the StandardDataSet. -->
      </Head>
      <Values>
        <!-- List of values, specific to the
             CompiledDataSet in structure.
             i.e. different to those of the
             StandardDataSet -->
      </Values>
    </CompiledDataSet>
    <!-- Any number of DataSets of different types could
         go here -->
  </DataSets>
</DataSnapshot>

Note that the example above has multiple elements called "Head" and "Values" that are of different types.

If using approach B, then the example above would have element names like: DataSnapshotHead, StandardDataSetHead, StandardDataSetValues, CompiledDataSetHead, CompiledDataSetValues. The XML would be more bulky, but would it be clearer or easier to parse?
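To make the parsing trade-off concrete, here is a rough sketch in Python using the standard library's xml.etree.ElementTree. It assumes stripped-down versions of both documents (empty elements, no metadata), and the function names are made up for illustration. With approach B the tag name alone tells you the type; with approach A you have to look at the parent to work it out.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down approach-B document: names encode the type.
APPROACH_B = """
<DataSnapshot>
  <DataSnapshotHead/>
  <DataSets>
    <StandardDataSet setId="abc">
      <StandardDataSetHead/>
      <StandardDataSetValues/>
    </StandardDataSet>
    <CompiledDataSet setId="xyz">
      <CompiledDataSetHead/>
      <CompiledDataSetValues/>
    </CompiledDataSet>
  </DataSets>
</DataSnapshot>
"""

# The same skeleton using approach A: every head is just "Head".
APPROACH_A = """
<DataSnapshot>
  <Head/>
  <DataSets>
    <StandardDataSet setId="abc"><Head/><Values/></StandardDataSet>
    <CompiledDataSet setId="xyz"><Head/><Values/></CompiledDataSet>
  </DataSets>
</DataSnapshot>
"""


def head_types_b(xml_text):
    """Approach B: the tag name alone identifies each head's type."""
    root = ET.fromstring(xml_text)
    return [el.tag for el in root.iter() if el.tag.endswith("Head")]


def head_types_a(xml_text):
    """Approach A: the type has to be derived from the parent element."""
    root = ET.fromstring(xml_text)
    return [
        parent.tag + "Head"
        for parent in root.iter()
        if parent.find("Head") is not None  # direct child named Head
    ]


print(head_types_b(APPROACH_B))
print(head_types_a(APPROACH_A))
```

Both functions recover the same three types, so neither approach loses information. The difference is only whether a consumer needs parent context to interpret an element.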

I'm designing a public XML API which is why I'm obsessing over details like this. I want this API to be as intuitive and easy to parse as possible. I've hunted for best practices relating to this, but my Google skills don't seem to be up to it - I've just been finding lots of advice about reusing types in an XML Schema, but nothing about reusing element names. So I'm hoping some of you folks that have worked with a lot more XML than me might have some sage advice about which approach is better?

+1  A: 

IMHO, I would tend toward (B) and recommend against reusing element names unless you explicitly move them into separate namespaces, since users of your schema might use 'lazy' XPath expressions like //Head, which match every Head element in the document regardless of its type.
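As a quick illustration of the 'lazy' query problem, here is a minimal Python sketch using xml.etree.ElementTree, which supports a subset of XPath. The `kind` attribute is not part of the question's format; it is only added here so the three heads can be told apart in the output.

```python
import xml.etree.ElementTree as ET

# Approach-A skeleton; the "kind" attribute is a hypothetical marker.
DOC = """<DataSnapshot>
  <Head kind="snapshot"/>
  <DataSets>
    <StandardDataSet setId="abc"><Head kind="standard"/></StandardDataSet>
    <CompiledDataSet setId="xyz"><Head kind="compiled"/></CompiledDataSet>
  </DataSets>
</DataSnapshot>"""

root = ET.fromstring(DOC)

# Lazy query: matches every Head at any depth, whatever its structure.
all_heads = [h.get("kind") for h in root.findall(".//Head")]

# Context-qualified queries: each one matches a single kind of Head.
snapshot_head = root.find("./Head")
compiled_heads = root.findall("./DataSets/CompiledDataSet/Head")

print(all_heads)
print(snapshot_head.get("kind"), len(compiled_heads))
```

So approach A still works with XPath, but only if consumers write the full path rather than //Head.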

nonnb
Thanks for the opinion :) I was pretty much decided on A until I read a little about XPath and expressions like //Head - that made me think that A could make XPath much more complicated. My XML-parsing skills are stuck in the stone age (DOM and SAX) but I need to think of the people using more-modern/better approaches and I understand XPath is really common.
MB
+1  A: 

Actually, your approaches A and B are not mutually exclusive.

This is where namespaces help. The primary purpose of namespaces is to disambiguate otherwise identical names that are used in different contexts or, as the linguists say, in different topic worlds.

Therefore, I would use:

snap:head

and

ds:head

where snap: and ds: are namespace prefixes that are bound, respectively, to the namespaces I have defined: "headBody:Data SnapShot" and "headBody:Data Set".

This approach has the following advantages:

  1. Puts every name in its own topic world.

  2. Avoids name conflicts.

  3. Keeps names short, readable and meaningful.

  4. Encourages hierarchical thinking and modeling.
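For what it's worth, a consumer's side of this could look roughly like the Python sketch below, using xml.etree.ElementTree. The namespace URIs (urn:example:...) and the exact document layout are placeholders invented for the example, not the ones named above.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, chosen only for this illustration.
NS = {
    "snap": "urn:example:data-snapshot",
    "ds": "urn:example:data-set",
}

DOC = """<snap:DataSnapshot xmlns:snap="urn:example:data-snapshot"
                   xmlns:ds="urn:example:data-set">
  <snap:head/>
  <ds:StandardDataSet setId="abc"><ds:head/></ds:StandardDataSet>
  <ds:CompiledDataSet setId="xyz"><ds:head/></ds:CompiledDataSet>
</snap:DataSnapshot>"""

root = ET.fromstring(DOC)

# The prefix map lets even a "search everywhere" query stay unambiguous.
snapshot_heads = root.findall(".//snap:head", NS)
dataset_heads = root.findall(".//ds:head", NS)

print(len(snapshot_heads), len(dataset_heads))
print(snapshot_heads[0].tag)  # ElementTree expands the prefix to {uri}head
```

Note that in this sketch both data-set heads are still ds:head, so telling StandardDataSet and CompiledDataSet heads apart would need either parent context or one namespace per data-set type.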

Dimitre Novatchev
That's a neat solution, thanks Dimitre. Though my main concern with that is that I want my XML API to be usable by people that don't really know what namespaces are or how to deal with them (I think this is the case for a lot of developers who only deal with XML occasionally or that are generally lacking in experience). I fear that putting namespace prefixes in my instance XML could scare off potential users of the API.
MB
@MB: Namespaces are not at all a foreign concept for developers. The most popular programming languages (such as Java and C#) have had them for many years. Therefore, your concerns aren't justified. Be confident that developers are really smart people. :)
Dimitre Novatchev
Hehe yes it's possible I'm being overly concerned... Though truth be told I've only recently been getting my head around namespaces and XML schema myself, despite having been developing software for a long time. A lot of public XML APIs like Twitter's don't publish schemas or contain any namespace prefixes, so it's possible to do a fair bit of XML-related work without dealing with them. Admittedly I've spent most of my time working on somewhat unusual software projects so I'm a bit out of touch with what most developers do and don't know. Anyway, I definitely appreciate your thoughts :)
MB