Hello,

When creating a new XML file, how should I structure it? By "structure" (which may not be the best word here) I mean: how does one choose between making something an element and making it an attribute of an element? For example, if I create a Person.xml file that contains a list of persons, is it better to do something like this:

<Person>
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <Age>23</Age>
</Person>

Or is it better to do something like this? Does it even matter?

<Person FirstName="John" LastName="Doe" Age="23"></Person>

Duplicate

When should I use Elements, and when should I use Attributes?

+1  A: 

It's pretty much a subjective thing.

Richard Ev
+4  A: 

XML files should (not to start a holy war) be structured as follows:

If it's data, or something that can be changed, then it should be like this:

<Person>
  <FirstName>John</FirstName>
  <LastName>Smith</LastName>
  <Age>23</Age>
</Person>

If it's an attribute of the thing Person then it should be like this:

<Person Type="Human">
  <FirstName>John</FirstName>
  <LastName>Smith</LastName>
  <Age>23</Age>
</Person>

There are multiple reasons for this practice, not the least of which is the ease of fixing your XSLT transforms whenever you change your method of retrieving Person data.

That's really the important part: attributes define information about the data (the Person Type), and the data is what fills in those holes. If you later change how you fill in those holes, transforming your XML becomes tougher if you've made them 'attributes' instead of 'data'.

George Stocker
The distinction between "attribute" and "data" in this example is unclear (to say the least). Also, I can see no reason why the attributes make things "tougher" when working with XSLT: is using the @ prefix that difficult?
Robert Rossney
Robert: I deal in an application where some data is pulled from the database, and other data is pulled from an XML file. With Attributes the way they are, I have to transform that XML to XML that I can stuff data into, and then transform that XML into HTML. That's why.
George Stocker
+4  A: 

It really doesn't matter, but the way I decide is: if something could be considered an entity on its own (in this example, Person), I make it an element. If it's something that modifies the entity (or an attribute of the entity), I make it an attribute.

Example:

<Person FirstName="John" LastName="Doe" Age="23">
    <Clothing wet="No">
        <Shirt colour="Red" />
    </Clothing>
</Person>
DannySmurf
I've never put it into those words explicitly for myself, but I like that succinct decision tree for this question.
JMD
+1  A: 

It seems to me this is something akin to Chevy vs Ford, or Windows vs MacOS. There is no clear winner for all situations, and the mere question may generate a highly volatile "discussion" with the right participants. ;)

The short answer is that either may be appropriate depending on the situation. Sometimes the deciding factor is even which library you choose for reading or updating the data in the XML.
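
As a rough illustration of how the library shapes the choice, here is a minimal sketch using Python's standard-library xml.etree.ElementTree (my choice for the example, not something the question specifies). The helper names read_element_style and read_attribute_style are made up for this sketch. Both layouts from the question yield the same data; only the accessor changes (findtext for child elements, get for attributes).

```python
import xml.etree.ElementTree as ET

# The two layouts from the question.
ELEMENT_STYLE = """<Person>
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <Age>23</Age>
</Person>"""

ATTRIBUTE_STYLE = '<Person FirstName="John" LastName="Doe" Age="23"></Person>'


def read_element_style(xml_text):
    """Read the fields when they are stored as child elements."""
    person = ET.fromstring(xml_text)
    return {
        "FirstName": person.findtext("FirstName"),
        "LastName": person.findtext("LastName"),
        "Age": int(person.findtext("Age")),
    }


def read_attribute_style(xml_text):
    """Read the same fields when they are stored as attributes."""
    person = ET.fromstring(xml_text)
    return {
        "FirstName": person.get("FirstName"),
        "LastName": person.get("LastName"),
        "Age": int(person.get("Age")),
    }


from_elements = read_element_style(ELEMENT_STYLE)
from_attributes = read_attribute_style(ATTRIBUTE_STYLE)
print(from_elements == from_attributes)
```

With this particular library neither form is harder to read than the other; other libraries or tools may tilt the balance one way or the other.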

JMD
+1  A: 

Here is a pretty good article about the principles of XML design.

moose-in-the-jungle
+1  A: 

The first is the verbose way of doing things: Everything is an element. This is a common way that people do this simply because it's so easy to look at and parse.

However, attributes were introduced for just this reason: they're bits of information about the element. So, your second example is perfectly acceptable. In fact, you could even shorten it:

<Person FirstName="John" LastName="Doe" Age="23" />

I would probably do the latter.

The only time you wouldn't want this is if you need to have more XML data inside, or long formatted sections.
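
To check that the shortened form carries the same information as the start-tag/end-tag form, here is a small sketch with Python's standard-library xml.etree.ElementTree (an assumption of this example; any conforming parser should behave the same way). The summarize helper is hypothetical.

```python
import xml.etree.ElementTree as ET

long_form = ET.fromstring(
    '<Person FirstName="John" LastName="Doe" Age="23"></Person>'
)
short_form = ET.fromstring(
    '<Person FirstName="John" LastName="Doe" Age="23" />'
)


def summarize(element):
    """Collect everything the parser exposes about a simple element."""
    return (
        element.tag,
        dict(element.attrib),
        element.text or "",  # no character data in either form
        len(element),        # number of child elements
    )


print(summarize(long_form) == summarize(short_form))
```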

Robert P
+1  A: 

In general, you want elements to represent the "real" information that you're modeling, and reserve attributes for "meta" information that qualifies the content.

Morendil
A: 

Irrespective of personal taste, here is the fundamental set of issues:

Use attributes to map values to unique names when ordering is not significant. Otherwise, use elements.

  • Values: numbers, strings, dates, etc., but not multi-property objects.
  • Unique names: Each attribute name on an element must be unique. If a thing represented by an element can have more than one Foo associated with it, Foo should not be an attribute.
  • Ordering is not significant: The application cannot depend on values being presented to processes in a particular order.
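
The "unique names" rule is enforced by XML itself: a repeated attribute name makes the document not well-formed, while a repeated child element is fine. A minimal sketch, assuming Python's standard-library xml.etree.ElementTree; the Thing element and the is_well_formed helper are hypothetical names for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Two Foo attributes on one element: rejected by the parser.
repeated_attribute = '<Thing Foo="1" Foo="2"/>'

# Two Foo child elements: perfectly legal, and both values survive.
repeated_element = "<Thing><Foo>1</Foo><Foo>2</Foo></Thing>"
foos = [foo.text for foo in ET.fromstring(repeated_element).findall("Foo")]

print(is_well_formed(repeated_attribute))
print(is_well_formed(repeated_element), foos)
```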

An example: if you want to round-trip data between (say) ADO.NET and XML, should you store column values in attributes or elements? (Never mind for a moment that ADO.NET does this for you.) Well, column names map to values uniquely, and the column values are readily serializable data types. So sure, why not do this?

<Person FirstName="John" MiddleName="Q." LastName="Smith"/>

But actually that's an information-destroying transformation. The order in which columns appear in an ADO.NET record is significant. If something's in column 2 before your transformation, it should be in column 2 afterwards. Converting the columns to attributes will lose this information, because XML treats the order of attributes on an element as not significant, so a parser is free to hand them back in any order. (I know one DOM implementation, for instance, that retrieves attributes in alphabetical order by name.)

This is why ADO.NET represents rows like this, verbose though it is:

<Person>
   <FirstName>John</FirstName>
   <MiddleName>Q.</MiddleName>
   <LastName>Smith</LastName>
</Person>
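
The ordering difference can be seen from code. This is a sketch using Python's standard-library xml.etree.ElementTree (my assumption for illustration; it is not what ADO.NET does internally). Child elements come back in document order, whereas attributes are exposed as a name-to-value mapping, so two elements whose attributes are written in different orders compare as the same data.

```python
import xml.etree.ElementTree as ET

# Element form: iteration follows document order, so column order survives.
row = ET.fromstring(
    "<Person>"
    "<FirstName>John</FirstName>"
    "<MiddleName>Q.</MiddleName>"
    "<LastName>Smith</LastName>"
    "</Person>"
)
column_order = [child.tag for child in row]

# Attribute form: the same attributes written in two different orders.
attrs_1 = ET.fromstring(
    '<Person FirstName="John" MiddleName="Q." LastName="Smith"/>'
).attrib
attrs_2 = ET.fromstring(
    '<Person LastName="Smith" FirstName="John" MiddleName="Q."/>'
).attrib

# Mapping equality ignores order: nothing here can carry a column position.
same_attributes = attrs_1 == attrs_2

print(column_order)
print(same_attributes)
```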

As for the common wisdom that elements are for information, and attributes are for metainformation: this is often very good advice. It's also often just superstition that will lead you into bad places.

For one, metainformation may need to contain multiple values associated with the same name. You might, say, want to tag an element with a list of pages that will use it:

<Person Pages="B1,B2,B3,B4">
    <FirstName>John...

Ever tried to write an XSLT template that parses a comma-separated list? You'll learn a lot by doing it, but it's probably not something you want to know.
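
The same cost shows up outside XSLT, if less painfully. In this sketch (Python's standard-library xml.etree.ElementTree; the repeated Page child element is a hypothetical alternative layout, not something from the example above), the attribute version needs a second, ad hoc parsing step that the XML tooling knows nothing about, while the element version is handled by the parser alone.

```python
import xml.etree.ElementTree as ET

# Multiple values packed into one attribute: the XML parser only sees a string,
# so the caller has to know about and apply the comma convention.
packed = ET.fromstring('<Person Pages="B1,B2,B3,B4"/>')
pages_from_attribute = packed.get("Pages").split(",")

# Hypothetical alternative: one child element per value.
repeated = ET.fromstring(
    "<Person>"
    "<Page>B1</Page><Page>B2</Page><Page>B3</Page><Page>B4</Page>"
    "</Person>"
)
pages_from_elements = [page.text for page in repeated.findall("Page")]

print(pages_from_attribute == pages_from_elements)
```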

For another, XML designers who don't know what they're up against let this advice lead them to put in an attribute what should really be in the element's tag name. For instance:

<Person Type="Employee">
    <SSN>123-45-6789</SSN>
    <Extension>123</Extension>
</Person>
<Person Type="Customer">
    <PhoneNumber>123-456-7890</PhoneNumber>
    <BillingAddress>...

and so on. Guess what happens when you try to write a schema that enforces different rules on Person elements based on the Type attribute? Failure. Schemas are bound to the element name. All Person elements must have the same schema. In this case, the elements should be named Employee and Customer.
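
As a loose analogy only (this is plain Python, not a schema language), here is what rules keyed on the element name look like once the elements are named Employee and Customer. The REQUIRED_CHILDREN table and the missing_children helper are hypothetical, and the required child names are simply the ones that appear in the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule table: one set of required children per element name.
REQUIRED_CHILDREN = {
    "Employee": {"SSN", "Extension"},
    "Customer": {"PhoneNumber", "BillingAddress"},
}


def missing_children(element):
    """Return the required child names that the element lacks, sorted."""
    required = REQUIRED_CHILDREN[element.tag]
    present = {child.tag for child in element}
    return sorted(required - present)


employee = ET.fromstring(
    "<Employee><SSN>123-45-6789</SSN><Extension>123</Extension></Employee>"
)
customer = ET.fromstring(
    "<Customer><PhoneNumber>123-456-7890</PhoneNumber></Customer>"
)

print(missing_children(employee))
print(missing_children(customer))
```

Because the rule lookup depends only on the element name, each kind of person can carry its own rules without inspecting any attribute value.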

Robert Rossney