Irrespective of personal taste, here is the fundamental set of issues:
Use attributes to map values to unique names when ordering is not significant. Otherwise, use elements.
- Values: numbers, strings, dates, etc., but not multi-property objects.
- Unique names: Each attribute name on an element must be unique. If a thing represented by an element can have more than one Foo associated with it, Foo should not be an attribute.
- Ordering is not significant: The application cannot depend on values being presented to processes in a particular order.
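The unique-names rule is enforced by the parser, not just by convention. Here is a minimal sketch using Python's standard xml.etree.ElementTree; Thing and Foo are placeholder names used only for illustration. A repeated attribute makes the document not well-formed, while repeated child elements parse without complaint.

```python
import xml.etree.ElementTree as ET

# Hypothetical element that uses the same attribute name twice.
duplicate_attrs = '<Thing Foo="1" Foo="2"/>'

try:
    ET.fromstring(duplicate_attrs)
    parsed = True
except ET.ParseError:
    # Duplicate attribute names are a well-formedness error.
    parsed = False

# The same data as repeated child elements is legal XML.
root = ET.fromstring("<Thing><Foo>1</Foo><Foo>2</Foo></Thing>")
foo_values = [foo.text for foo in root.findall("Foo")]
```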
An example: if you want to round-trip data between (say) ADO.NET and XML, should you store column values in attributes or elements? (Never mind for a moment that ADO.NET does this for you.) Well, column names map to values uniquely, and the column values are readily serializable data types. So sure, why not do this?
<Person FirstName="John" MiddleName="Q." LastName="Smith"/>
But actually that's an information-destroying transformation. The order in which columns appear in an ADO.NET record is significant. If something's in column 2 before your transformation, it should be in column 2 afterwards. Converting the columns to attributes loses this information, because attribute order is not significant in XML and a parser is free to report attributes in any order. (I know one DOM implementation, for instance, that retrieves attributes in alphabetical order by name.)
This is why ADO.NET represents rows like this, verbose though it is:
<Person>
<FirstName>John</FirstName>
<MiddleName>Q.</MiddleName>
<LastName>Smith</LastName>
</Person>
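To see why the element form round-trips safely, here is a small sketch that reads the row back with Python's standard xml.etree.ElementTree (my choice of tool for illustration, not anything ADO.NET uses). Child elements come back in document order, so column position survives.

```python
import xml.etree.ElementTree as ET

row_xml = """
<Person>
  <FirstName>John</FirstName>
  <MiddleName>Q.</MiddleName>
  <LastName>Smith</LastName>
</Person>
"""

row = ET.fromstring(row_xml)

# Iterating an element yields its children in document order,
# so each column keeps the position it had in the record.
columns = [(child.tag, child.text) for child in row]
```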
As for the common wisdom that elements are for information, and attributes are for metainformation: this is often very good advice. It's also often just superstition that will lead you into bad places.
For one, metainformation may need to contain multiple values associated with the same name. You might, say, want to tag an element with a list of pages that will use it:
<Person Pages="B1,B2,B3,B4">
<FirstName>John...
Ever tried to write an XSLT template that parses a comma-separated list? You'll learn a lot by doing it, but it's probably not something you want to know.
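The difference is easy to see from the consumer's side. The sketch below uses Python's xml.etree.ElementTree; the Page child element is a hypothetical alternative I've made up for comparison, not something from the original markup. With the attribute form the consumer has to know the delimiter and split the string itself; with repeated elements the parser has already done the work.

```python
import xml.etree.ElementTree as ET

# Attribute form: the list is one opaque string that the consumer must split.
packed = ET.fromstring('<Person Pages="B1,B2,B3,B4"/>')
pages_from_attr = packed.get("Pages").split(",")

# Element form (Page is a hypothetical name): one element per value,
# so a plain child lookup returns the list directly.
unpacked = ET.fromstring(
    "<Person><Page>B1</Page><Page>B2</Page><Page>B3</Page><Page>B4</Page></Person>"
)
pages_from_elems = [page.text for page in unpacked.findall("Page")]
```

In Python the split is a one-liner, which is exactly what XSLT 1.0 doesn't give you; there the element form is the one you can simply select.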
For another, XML designers who don't know what they're up against let this advice lead them to put in an attribute what should really be in the element's tag name. For instance:
<Person Type="Employee">
<SSN>123-45-6789</SSN>
<Extension>123</Extension>
</Person>
<Person Type="Customer">
<PhoneNumber>123-456-7890</PhoneNumber>
<BillingAddress>...
and so on. Guess what happens when you try to write a schema that enforces different rules on Person elements based on the Type attribute? Failure. Schemas are bound to the element name. All Person elements must have the same schema. In this case, the elements should be named Employee and Customer.
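As a rough illustration of what "bound to the element name" buys you, here is a hand-rolled check in Python. It is a stand-in for a schema, not a real validator, and the REQUIRED_CHILDREN table and missing_children helper are hypothetical names invented for this sketch. Once the elements are named Employee and Customer, the element name alone selects the rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules, keyed by element name the way a schema declaration is.
REQUIRED_CHILDREN = {
    "Employee": {"SSN", "Extension"},
    "Customer": {"PhoneNumber", "BillingAddress"},
}

def missing_children(element):
    """Return the required child names that the element lacks."""
    required = REQUIRED_CHILDREN[element.tag]
    present = {child.tag for child in element}
    return required - present

employee = ET.fromstring(
    "<Employee><SSN>123-45-6789</SSN><Extension>123</Extension></Employee>"
)
customer = ET.fromstring(
    "<Customer><PhoneNumber>123-456-7890</PhoneNumber></Customer>"
)
```

Here missing_children(employee) comes back empty, while missing_children(customer) reports the absent BillingAddress, and no Type attribute had to be inspected to decide which rules apply.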