When specifying XML formats that contain a list of items, there is often a choice of at least two distinct styles. One uses a container element for the list, the other doesn't. As an example: if specifying a document with multiple pages, one could do this:

<document>
  <title>...</title>
  <pages>
    <page>...</page>
    <page>...</page>
  </pages>
</document>

Or just this:

<document>
  <title>...</title>
  <page>...</page>
  <page>...</page>
</document>

What are the pros and cons of each approach?

Some I can think of are:

  • the former allows expressing an explicit empty list (useful if the list itself is a conceptual entity)
  • the former is probably slightly better for error recovery (although that shouldn't matter if XSD validation is used)
  • the latter is more concise
  • the latter doesn't require a distinction between adding the first element and adding any subsequent one (no management of the container element)
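
To make the first and last points concrete, here is a minimal Python sketch using the standard-library `xml.etree.ElementTree`. The helper names (`add_page_wrapped`, `add_page_flat`, `page_list_state`) are made up for illustration and are not part of any format or library discussed here.

```python
import xml.etree.ElementTree as ET


def add_page_wrapped(doc, text):
    """Container style: the first insert may have to create <pages>."""
    pages = doc.find("pages")
    if pages is None:  # compare with None; do not rely on element truthiness
        pages = ET.SubElement(doc, "pages")
    ET.SubElement(pages, "page").text = text


def add_page_flat(doc, text):
    """Flat style: every insert is the same append under <document>."""
    ET.SubElement(doc, "page").text = text


def page_list_state(doc):
    """Only the container style can tell 'no list' from 'empty list'."""
    pages = doc.find("pages")
    if pages is None:
        return "absent"
    return "empty" if len(pages) == 0 else "non-empty"


if __name__ == "__main__":
    doc = ET.fromstring("<document><title>...</title></document>")
    print(page_list_state(doc))  # no <pages> element yet
    add_page_wrapped(doc, "first")
    add_page_wrapped(doc, "second")
    print(ET.tostring(doc, encoding="unicode"))
```

In the flat style an empty list and a missing list look identical, which is only a drawback if the list is meant to be an entity in its own right.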

EDIT

To clarify: I am assuming there is no meaning attached to the pages element. There are no other elements inside, no attributes attached and it is hard to find any other name than 'pages', 'pageList' or similar for it.

EDIT 2

I found another entry about the same question. While the answers all favor the container/parent element, it seems to come down either to treating the container as an actual object, or to the assumption that a schema is easier to model with that extra container element (which I tend to disagree with).

+4  A: 

In the example you have given the difference is subtle, but the two examples actually represent completely different things:

  • The second example is of a document which has a title and many pages
  • The first example however is of a document with a title and a collection of pages

Like I said, in your example the difference hardly matters and so it is easy to miss. Consider instead the following slight variation:

<document>
  <title>...</title>
  <contents>
    <page>...</page>
    <page>...</page>
  </contents>
  <chapter name="chapterName">
    <page>...</page>
    <page>...</page>
  </chapter>
  <index>
    <page>...</page>
    <page>...</page>
  </index>
</document>

In this case the document has many collections of pages. Of course, some might argue that you could equally represent this differently:

<document>
  <title>...</title>
  <page section="contents">...</page>
  <page section="chapter1">...</page>
  <page section="index">...</page>
</document>

However, you would have had to change the page element, and in the above example I'd argue for the worse: why should a page have to know what it is contained in?
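
As a rough illustration of that point: with the nested variant, a consumer can recover the grouping from each page's parent, so the page itself needs no `section` attribute. This is only a sketch using Python's standard-library `xml.etree.ElementTree`; the helper name `pages_by_section` and the page texts are invented for the example.

```python
import xml.etree.ElementTree as ET

NESTED = """
<document>
  <title>...</title>
  <contents><page>i</page><page>ii</page></contents>
  <chapter name="chapterName"><page>1</page><page>2</page></chapter>
  <index><page>a</page></index>
</document>
"""


def pages_by_section(doc):
    """Group page texts by the element that contains them."""
    sections = {}
    for child in doc:
        pages = child.findall("page")
        if pages:  # skips <title>, which has no <page> children
            # Use the name attribute if present, otherwise the tag name.
            key = child.get("name", child.tag)
            sections[key] = [p.text for p in pages]
    return sections


if __name__ == "__main__":
    print(pages_by_section(ET.fromstring(NESTED)))
```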

Another subtle consideration is that often the ordering of elements means something:

<document>
  <page>...</page>
  <title>...</title>
  <page>...</page>
  <page>...</page>
</document>

In this example we have (for whatever reason) a page before the title - obviously not possible when using collections. Another consideration is that each collection might also (for example) have a title.
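
For what it's worth, typical parsers hand the children back in document order, so the interleaving in the flat form is visible to a consumer. A small sketch with Python's standard-library `xml.etree.ElementTree` (the helper name `child_order` and the page texts are made up):

```python
import xml.etree.ElementTree as ET

FLAT = (
    "<document>"
    "<page>cover</page>"
    "<title>...</title>"
    "<page>1</page>"
    "<page>2</page>"
    "</document>"
)


def child_order(doc):
    """Return the tag names of the direct children in document order."""
    return [child.tag for child in doc]


if __name__ == "__main__":
    print(child_order(ET.fromstring(FLAT)))
```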

My point is that they represent different things. It's not really a case of pros vs. cons so much as a case of choosing the format that most closely matches your data model.

Kragen
I understand your points, but I am talking about the scenario where there is no conceptual entity attached to the list. There are no attributes on the container element, there is only one element type inside. MS likes doing that quite a bit (see e.g. DatadiagramML, the Visio XML format).
Peter Becker
And one more comment: the ordering I can define in XSD (and other schema languages) without the container element. The enforced order between title and page is actually much easier to achieve, in particular if you want only one title element.
Peter Becker
@Peter - If your example is exactly as described in the question *and will never change* then the choice is completely arbitrary. (My personal preference would probably be for the first option)
Kragen
It sounds to me like Microsoft's schema choices were made so that the XML format closely represents the data structures used internally in the application (e.g. the `document` class has a `pages` collection, rather than being a collection of pages).
Kragen
@Kragen: I'm pretty sure you are right with your assumption about mapping internal data structures. I've used the VBA documentation a lot to understand the XML format.
Peter Becker
+3  A: 

It is really a matter of personal choice, although the first form is more powerful, IMO. It makes multiple page lists possible, e.g. lists with different names. It just depends on your requirements.

<document>
  <title>...</title>
  <pages name="page1">
    <page>...</page>
    <page>...</page>
  </pages>
  <pages name="page2">
    <page>...</page>
    <page>...</page>
  </pages>
</document>
fastcodejava