Which would be the correct format for this XML data? Are they equivalent, or are there trade-offs between the two?

1.

<sitemap>
  <category name="Animals">
    <section title="Dogs">
      <page url="/pics/greatdane.jpg" title="Great Dane"/>
    </section>
  </category>
</sitemap>

2.

<sitemap>
  <page>
    <category>Animals</category>
    <section>Dogs</section>
    <title>Great Dane</title>
    <url>/pics/greatdane.jpg</url>    
  </page>
</sitemap>

I've implemented the first example with my style sheet and it seems to work fine, but I'm unsure what the correct form should be.

+2  A: 

There aren't usually right or wrong answers for this sort of thing. Mostly it depends on how you need to access your data.

One nice thing about the first one is that it easily supports, now or later, multiple pages in a section and multiple sections in a category. In the second one, that information is spread across the pages: each page repeats its own category and section.
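To make the "spread across pages" point concrete, here is a minimal sketch in Python. It uses the standard-library xml.etree.ElementTree module, which is my choice of tool and not something from the question. The sketch takes the second format, with one extra page added for illustration, and has to rebuild the category/section grouping that the first format expresses directly through nesting. The group_pages helper is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The question's second format, plus one extra page for illustration.
FORMAT_2 = """
<sitemap>
  <page>
    <category>Animals</category>
    <section>Dogs</section>
    <title>Great Dane</title>
    <url>/pics/greatdane.jpg</url>
  </page>
  <page>
    <category>Animals</category>
    <section>Dogs</section>
    <title>Wiener Dog</title>
    <url>/pics/wienerdog.jpg</url>
  </page>
</sitemap>
"""


def group_pages(xml_text):
    """Rebuild category -> section -> [page titles] from flat <page> records."""
    root = ET.fromstring(xml_text)
    grouped = defaultdict(lambda: defaultdict(list))
    for page in root.iter("page"):
        category = page.findtext("category")
        section = page.findtext("section")
        grouped[category][section].append(page.findtext("title"))
    # Turn the nested defaultdicts into plain dicts.
    return {cat: dict(sections) for cat, sections in grouped.items()}


print(group_pages(FORMAT_2))
# {'Animals': {'Dogs': ['Great Dane', 'Wiener Dog']}}
```

With the first format, this grouping step is unnecessary because the document is already shaped that way.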

David Norman
+1  A: 

XML is a horrible file format and ends up in religious wars. Do whatever you feel is best at the time, provided you can justify it. However, your particular examples are significantly different:

In 1, sitemap encapsulates a category encapsulating a section which encapsulates the page.

In 2, sitemap encapsulates a page which encapsulates four items: category, section, title and URL. None of these four items contains any other. They are siblings, all held within the page.

Since these are two different structures, which one to use depends on which you intended.
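Here is a minimal Python sketch of that difference, assuming the standard-library xml.etree.ElementTree. In example 1, a page's category and section come from the elements that enclose it, so reading them means walking the nesting. The result is the same flat record that example 2 stores directly as sibling children of each page. The flatten helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# The question's first format.
FORMAT_1 = """
<sitemap>
  <category name="Animals">
    <section title="Dogs">
      <page url="/pics/greatdane.jpg" title="Great Dane"/>
    </section>
  </category>
</sitemap>
"""


def flatten(xml_text):
    """Walk category -> section -> page and emit one flat record per page."""
    root = ET.fromstring(xml_text)
    records = []
    for category in root.findall("category"):
        for section in category.findall("section"):
            for page in section.findall("page"):
                # Category and section come from the enclosing elements.
                records.append({
                    "category": category.get("name"),
                    "section": section.get("title"),
                    "title": page.get("title"),
                    "url": page.get("url"),
                })
    return records


print(flatten(FORMAT_1))
# [{'category': 'Animals', 'section': 'Dogs', 'title': 'Great Dane', 'url': '/pics/greatdane.jpg'}]
```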

A different question would be the preference for attributes versus first-level tags. But as I say, that is a different question!

Mat
+1  A: 

I prefer the second. Elements should be used to describe data (which is mostly what you are doing). Attributes are for values that are not part of the data itself, such as the maximum size of an element.

Otávio Décio
+15  A: 

The issue of attributes vs. elements has been around for the better part of a decade and there is no right answer. Instead, consider the differences, and from those you should be able to decide which to use:

  • There can be only one instance of a given attribute on an element, although you can enforce the same restriction on child elements using a DTD or XML Schema;
  • Attributes are unordered. Elements are not;
  • Attributes lead to a more concise syntax if there are no children. Compare:

    <page name="Sitemap"/>

    to:

    <page>
      <name>Sitemap</name>
    </page>

    I know which one I prefer;

  • Not really relevant now, since DTDs are rarely used (if at all) compared with XML Schema, but I'll add it anyway: DTDs allow default values for attributes but have no such mechanism for elements; and
  • Elements, being elements, can have children and attributes of their own. Attributes obviously cannot.
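Here is a small Python sketch of the concise-syntax and children points, using the standard-library xml.etree.ElementTree module. Both spellings of the Sitemap name read back the same value. Only the element form can later gain an attribute of its own. The lang attribute is a hypothetical example.

```python
import xml.etree.ElementTree as ET

# The two spellings of the same value.
as_attribute = ET.fromstring('<page name="Sitemap"/>')
as_element = ET.fromstring("<page><name>Sitemap</name></page>")

# Reading the value differs only in the accessor used.
print(as_attribute.get("name"))     # Sitemap
print(as_element.findtext("name"))  # Sitemap

# Only the element form can grow: <name> can carry its own attribute
# ('lang' is hypothetical), while an attribute value is just a string.
as_element.find("name").set("lang", "en")
print(ET.tostring(as_element, encoding="unicode"))
# <page><name lang="en">Sitemap</name></page>
```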

So, in your example, your innermost <page> element has a URL attribute (although it points to an image for some reason; perhaps a preview icon? If so, the attribute name is misleading). A webpage (generally) has only one URL, so that would be a good example of something that could be an attribute.

If, on the other hand, you wanted to list the images on the page, there could obviously be more than one, so you'd need elements for that.
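Here is the URL-versus-images case sketched in Python with xml.etree.ElementTree. A single URL fits in an attribute, and several images become repeated child elements. Repeating an attribute instead is rejected by the parser, because duplicate attribute names are not well-formed XML. The <image> element name and the /dogs.html URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# One URL per page fits an attribute; several images need repeated child
# elements. The <image> element and the /dogs.html URL are hypothetical.
page = ET.fromstring(
    '<page url="/dogs.html" title="Dogs">'
    "<image>/pics/greatdane.jpg</image>"
    "<image>/pics/wienerdog.jpg</image>"
    "</page>"
)
url = page.get("url")
images = [img.text for img in page.findall("image")]

# Repeating an attribute name instead is not well-formed XML, so the parser
# rejects it outright (no DTD or schema needed).
try:
    ET.fromstring('<page image="/pics/greatdane.jpg" image="/pics/wienerdog.jpg"/>')
    duplicate_attributes_allowed = True
except ET.ParseError:
    duplicate_attributes_allowed = False

print(url)                           # /dogs.html
print(images)                        # ['/pics/greatdane.jpg', '/pics/wienerdog.jpg']
print(duplicate_attributes_allowed)  # False
```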

But, in the end, most of the time there's no right or wrong answer and it's largely a question of style.

cletus
You say "node," but you mean "element." Processing instructions, comments, and text are also nodes. This is an important distinction for understanding a lot of intricacies of XML, like the XPath node() function.
Robert Rossney
Quite right. Corrected.
cletus
One other thing to add is that if you use XSLT to transform the XML, there is no way to disable output escaping within attributes (http://stackoverflow.com/questions/67859/xslt-cannot-get-xslt-to-output-an-even-after-escaping-the-character#86934).
+11  A: 

The two examples are not equivalent, because they form different hierarchies. Is a sitemap a list of categories, like the first example? Or is it a list of pages like the second example?

The answer to that is orthogonal to the element vs attribute question.

On the element vs. attribute question, here is your second example transformed to an attribute approach:

<sitemap>
 <page    
  category='Animals'
  section='Dogs'
  title='Great Dane'
  url='/pics/greatdane.jpg'
  /> 
</sitemap>

The above and your second case are equivalent. One consideration when choosing between them is whether you may modify the schema in the future. Adding an attribute to the url element, as in the following example, would likely be a backward-compatible change. The semantically equivalent modification would be impossible in the attribute approach, because you cannot attach an attribute to an attribute.

<sitemap>
 <page>    
  <category>Animals</category>
  <section>Dogs</section>    
  <title>Great Dane</title>    
  <url nofollow="true">/pics/greatdane.jpg</url>
 </page> 
</sitemap>
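Here is a minimal Python sketch of both claims, using the standard-library xml.etree.ElementTree. First, the attribute and element versions yield the same fields. Second, a reader written before nofollow existed still gets the same fields after it is added. The page_fields, fields_of and nofollow_flags helpers are hypothetical. They assume flat pages whose children are text-only elements.

```python
import xml.etree.ElementTree as ET

ELEMENT_FORM = """<sitemap><page>
  <category>Animals</category><section>Dogs</section>
  <title>Great Dane</title><url>/pics/greatdane.jpg</url>
</page></sitemap>"""

ATTRIBUTE_FORM = """<sitemap><page category='Animals' section='Dogs'
  title='Great Dane' url='/pics/greatdane.jpg'/></sitemap>"""

EXTENDED_FORM = """<sitemap><page>
  <category>Animals</category><section>Dogs</section>
  <title>Great Dane</title><url nofollow="true">/pics/greatdane.jpg</url>
</page></sitemap>"""


def page_fields(page):
    """Return a flat page's fields as a dict, in either style.
    Assumes every child is a simple text-only element."""
    if page.attrib:
        return dict(page.attrib)
    return {child.tag: child.text for child in page}


def fields_of(xml_text):
    """An 'old' reader: it only collects field names and text."""
    return [page_fields(p) for p in ET.fromstring(xml_text).iter("page")]


def nofollow_flags(xml_text):
    """A newer reader that also checks the added attribute (absent means False)."""
    return [p.find("url").get("nofollow") == "true"
            for p in ET.fromstring(xml_text).iter("page")]


# Same facts, two spellings.
print(fields_of(ELEMENT_FORM) == fields_of(ATTRIBUTE_FORM))  # True
# The old reader is unaffected by the new attribute.
print(fields_of(ELEMENT_FORM) == fields_of(EXTENDED_FORM))   # True
print(nofollow_flags(ELEMENT_FORM), nofollow_flags(EXTENDED_FORM))  # [False] [True]
```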
Steve Steiner
+1  A: 

IBM has posted an article titled "Principles of XML Design" that provides some guidelines on when to use attributes vs. elements. I found this article useful; your mileage may vary.

Ryan Taylor
+2  A: 

I use elements for data and attributes for metadata.

Marioh
+1  A: 

The first alternative scales a bit better. Suppose that you need to add another attribute of an animal section, such as the status of the section. I suggest that this representation:

<sitemap>
  <category name="Animals">
    <section title="Dogs" status="draft">
      ...
    </section>
  </category>
</sitemap>

does a better job of conveying the following facts:

  1. name is a property of the category
  2. a category could have multiple sections
  3. status is a property of a section; not all sections in the category are required to have the same status.

In short, it makes the hierarchical structure clearer, and shows which attributes apply at each level of the hierarchy.
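Reading the hierarchy back shows the benefit: a status declared once on a <section> applies to every page inside it. This is a hypothetical Python sketch using xml.etree.ElementTree. The Cats section and the page_status helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Example sitemap; the "Cats" section is hypothetical.
SITEMAP = """
<sitemap>
  <category name="Animals">
    <section title="Dogs" status="draft">
      <page url="/pics/greatdane.jpg" title="Great Dane"/>
    </section>
    <section title="Cats" status="published">
      <page url="/pics/siamese.jpg" title="Siamese"/>
    </section>
  </category>
</sitemap>
"""


def page_status(xml_text):
    """Map each page title to the status declared once on its enclosing section."""
    root = ET.fromstring(xml_text)
    result = {}
    for section in root.iter("section"):
        status = section.get("status")
        for page in section.findall("page"):
            result[page.get("title")] = status
    return result


print(page_status(SITEMAP))  # {'Great Dane': 'draft', 'Siamese': 'published'}
```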

joel.neely
+3  A: 

I think that the answer is quite obvious when you think about how you want to add more dogs:

<sitemap>
  <category name="Animals">
    <section title="Dogs">
      <page url="/pics/greatdane.jpg" title="Great Dane"/>
      <page url="/pics/wienerdog.jpg" title="Wiener Dog"/>
    </section>
  </category>
</sitemap>

or

<sitemap>
  <page>
    <category>Animals</category>
    <section>Dogs</section>
    <title>Great Dane</title>
    <url>/pics/greatdane.jpg</url>    
  </page>
  <page>
    <category>Animals</category>
    <section>Dogs</section>
    <title>Wiener Dog</title>
    <url>/pics/wienerdog.jpg</url>
  </page>
</sitemap>
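Here is the same point as a hypothetical Python sketch using xml.etree.ElementTree. Adding a page to the nested format means finding or creating its category and section. The flat format simply repeats them in every page. The helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET


def add_page_nested(root, category, section, title, url):
    """Nested format: find (or create) category and section, then append a <page>.
    The simple XPath predicates below assume names contain no single quotes."""
    cat = root.find(f"category[@name='{category}']")
    if cat is None:
        cat = ET.SubElement(root, "category", name=category)
    sec = cat.find(f"section[@title='{section}']")
    if sec is None:
        sec = ET.SubElement(cat, "section", title=section)
    ET.SubElement(sec, "page", url=url, title=title)


def add_page_flat(root, category, section, title, url):
    """Flat format: every page repeats its category and section."""
    page = ET.SubElement(root, "page")
    for tag, text in (("category", category), ("section", section),
                      ("title", title), ("url", url)):
        ET.SubElement(page, tag).text = text


nested, flat = ET.Element("sitemap"), ET.Element("sitemap")
for title, url in (("Great Dane", "/pics/greatdane.jpg"),
                   ("Wiener Dog", "/pics/wienerdog.jpg")):
    add_page_nested(nested, "Animals", "Dogs", title, url)
    add_page_flat(flat, "Animals", "Dogs", title, url)

# How many times is the category stored in each document?
print(len(nested.findall("category")))     # 1
print(len(flat.findall("page/category")))  # 2
```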
Svante
+1  A: 

A simple rule of thumb: if you can implement a data structure as an unordered map of name/value pairs, you can use an element's attributes to represent it. If you can't (if, for instance, a name can occur more than once, a given name will have multiple associated values, or the ordering of the name/value pairs is significant), then an element with attributes is the wrong representation.
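The rule of thumb can be sketched as a small, hypothetical Python helper using xml.etree.ElementTree. Name/value pairs become attributes only when the names are unique and their order does not matter. Otherwise each pair becomes a child element, which preserves duplicates and document order. Whether order matters can't be inferred from the data, so the caller passes it in. The gallery element is also hypothetical.

```python
import xml.etree.ElementTree as ET


def to_element(tag, pairs, ordered=False):
    """Build <tag> from (name, value) pairs.

    Attributes are used only for an unordered map with unique names; otherwise
    each pair becomes a child element, which keeps duplicates and their order.
    """
    names = [name for name, _ in pairs]
    use_attributes = not ordered and len(names) == len(set(names))
    elem = ET.Element(tag)
    for name, value in pairs:
        if use_attributes:
            elem.set(name, value)
        else:
            ET.SubElement(elem, name).text = value
    return elem


# Unique names, order irrelevant: attributes work.
page = to_element("page", [("title", "Great Dane"), ("url", "/pics/greatdane.jpg")])
# Repeated name: attributes can't represent this, so child elements are used.
gallery = to_element("gallery", [("image", "a.jpg"), ("image", "b.jpg")])

print(page.attrib)                # {'title': 'Great Dane', 'url': '/pics/greatdane.jpg'}
print([c.text for c in gallery])  # ['a.jpg', 'b.jpg']
```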

Two other things that can make this the wrong representation:

  • The values contain markup. This can be represented in attribute values, but it's awkward, because all of the markup characters have to be escaped into entities. Also, the markup won't be parsed.
  • You're using XML Schema validation, and there is more than one allowable set of name/value pairs. XML Schema can only define one set of allowable attributes for an element, whereas it can define multiple mutually exclusive sets of allowable child elements.
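Here is a quick Python check of the markup point, using xml.etree.ElementTree. Markup placed in an attribute value is escaped on output and comes back as a plain string. The same markup as element content is parsed into real child elements. The summary attribute and element are hypothetical.

```python
import xml.etree.ElementTree as ET

# Markup stored in an attribute value has to be escaped on output...
page = ET.Element("page", summary="A <b>big</b> dog")
xml_text = ET.tostring(page, encoding="unicode")
print(xml_text)

# ...and when parsed back it is only a string, not elements.
parsed = ET.fromstring(xml_text)
print(parsed.get("summary"))   # A <b>big</b> dog

# The same markup as element content is parsed into a real <b> child.
summary = ET.fromstring("<summary>A <b>big</b> dog</summary>")
print(summary.find("b").text)  # big
```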

The obvious benefit to using attributes is that they result in terser XML. They're (very) marginally faster to parse than elements, too.

Robert Rossney