I'm curious whether there is a standard or guideline for deciding which things should be attributes and which should be elements in an XML file.

I'm also curious about creating XmlArray and XmlArrayItem lists using XmlSerializer. For example, if I have the following:

<SomeBaseTag>
   <Item1 Attr11="one" Attr12="two" />
   <Item1 Attr11="one" Attr12="two" />
   <Item1 Attr11="one" Attr12="two" />
   <Item2 Attr21="one" Attr22="two" />
   <Item2 Attr21="one" Attr22="two" />
   <Item2 Attr21="one" Attr22="two" />
</SomeBaseTag>

Should I change it to:

<SomeBaseTag>
  <Item1s>
     <Item1 Attr11="one" Attr12="two" />
     <Item1 Attr11="one" Attr12="two" />
     <Item1 Attr11="one" Attr12="two" />
  </Item1s>
  <Item2s>
     <Item2 Attr21="one" Attr22="two" />
     <Item2 Attr21="one" Attr22="two" />
     <Item2 Attr21="one" Attr22="two" />
  </Item2s>
</SomeBaseTag>
A: 

It's a matter of taste. Generally, this sort of data is best represented in a less verbose format like YAML or JSON anyway.

EDIT: e.g.,

SomeBaseTag:
    Item1s:
        - {Attr11: one, Attr12: two}
        - {Attr11: one, Attr12: two}
    Item2s:
        - {Attr21: one, Attr22: two}
        - {Attr21: one, Attr22: two}
Thom Smith
+1  A: 

It is a style thing: attributes make XML look cleaner and less verbose than the element-heavy counterpart. It also depends on the tools you use to parse the XML; some I have used in the past were easier to code against when the data was in elements rather than attributes, but this aspect is not a big deal. It is better to keep the XML small, since XML is already verbose.
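
To make the tooling point concrete, here is a minimal sketch using Python's standard xml.etree.ElementTree (my choice of tool for illustration, not something the question uses). The element-style variant of Item1 is a hypothetical rewrite of the question's sample.

```python
import xml.etree.ElementTree as ET

# The same record in attribute style and in element style.
# The element-style form is a hypothetical rewrite of the question's sample.
attr_style = '<Item1 Attr11="one" Attr12="two" />'
elem_style = "<Item1><Attr11>one</Attr11><Attr12>two</Attr12></Item1>"

attr_item = ET.fromstring(attr_style)
elem_item = ET.fromstring(elem_style)

# Attributes are read from a dict-like mapping on the element.
from_attr = attr_item.get("Attr11")

# Child elements are located by tag, then their text is read.
from_elem = elem_item.findtext("Attr11")

# The element-style form writes each name twice (open and close tag),
# so it is longer for the same data.
sizes = (len(attr_style), len(elem_style))
print(from_attr, from_elem, sizes)
```

With this particular library both lookups are one call, so the difference is mostly in the size of the serialized text.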

OpenSource
+3  A: 

It all depends on the semantics of what you are trying to represent with your XML document.

For example, if your SomeBaseTag represents a market stall and Item1 represents apples and Item2 represents oranges, then the first format is perfectly appropriate.

If, however, the two items are distinct and would be better grouped separately then the second format makes more sense. This would be the case if SomeBaseTag represented elementary particles and Item1s were fermions and Item2s were bosons.

The fact that the two different items in your example share the same attribute names makes it more likely that they are closely related.

paracycle
A: 

I agree with everyone on the "matter of taste" thing, but I'd add one more thing to consider. After all, XML is a markup language, so you may want to think about what would be left of the document if you stripped off all the tags and their attributes.
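
As a rough illustration of that test, the sketch below strips the tags from two documents using Python's standard xml.etree.ElementTree. The `markup_doc` sentence and the `strip_tags` helper are made up for the example; `data_doc` is a shortened version of the question's sample.

```python
import xml.etree.ElementTree as ET

# A text-markup document (hypothetical) and a data document in the
# attribute-only style from the question.
markup_doc = "<p>Attributes vs <em>elements</em> is a <b>design</b> choice.</p>"
data_doc = (
    "<SomeBaseTag>"
    '<Item1 Attr11="one" Attr12="two" />'
    '<Item2 Attr21="one" Attr22="two" />'
    "</SomeBaseTag>"
)

def strip_tags(xml_text):
    """Return only the character data, dropping tags and attributes."""
    return "".join(ET.fromstring(xml_text).itertext())

markup_text = strip_tags(markup_doc)
data_text = strip_tags(data_doc)

print(repr(markup_text))  # the sentence survives
print(repr(data_text))    # nothing is left: all the data lived in attributes
```

If the stripped text still reads sensibly, the content is probably "text" and belongs in elements; if nothing meaningful is left either way, this test does not help and the other considerations decide.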

Michael Krelin - hacker
Although the "ML" in XML stands for "Markup Language", and although it originally was meant that way, XML is rarely used today to mark up text.
John Saunders
Right, John, but as I said, I'd *add* this as a consideration for the case where the pros and cons are balanced.
Michael Krelin - hacker
Huh? If you stripped off all the tags from an XML document, you'd be left with rubbish. XML is not a text markup thing, regardless of what the name might imply to you. XML is about the InfoSet. Designing an XML schema based *even in part* on some thought of what the text would look like without the angle brackets and tags ... seems....uh.... very wrong.
Cheeso
It depends on the nature of the data, Cheeso. XML's applications extend beyond the mere `InfoSet`.
Michael Krelin - hacker
+2  A: 

As has been said, style and taste are the primary factors. There are others.

Attributes are restricted in terms of what they can contain. For instance, they cannot contain elements, only a single string value. Also, certain characters such as "<" cannot appear literally in an attribute value; they have to be escaped (for example, as &lt;). An element may contain text, other elements, or both.
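
A small sketch of those restrictions, using Python's standard xml.etree.ElementTree purely for illustration. The `Note` and `Ref` elements and the attribute value are invented for the example.

```python
import xml.etree.ElementTree as ET

item = ET.Element("Item1")

# An attribute value is a flat string; the serializer escapes "<" for us.
item.set("Attr11", "a < b")

# An element, by contrast, can mix text and child elements.
note = ET.SubElement(item, "Note")      # hypothetical child element
note.text = "see "
ref = ET.SubElement(note, "Ref")        # hypothetical nested element
ref.text = "Item2"
ref.tail = " for details"

serialized = ET.tostring(item, encoding="unicode")
print(serialized)

# A raw "<" inside an attribute value is not well-formed XML,
# so the parser rejects it.
try:
    ET.fromstring('<Item1 Attr11="a < b" />')
    raw_lt_accepted = True
except ET.ParseError:
    raw_lt_accepted = False
print(raw_lt_accepted)
```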

I'll also mention one specific "style" issue. Your XML should be consistent. One thing that I dislike about WSDL is that most of the contents are contained in wrapper elements, except for messages:

<wsdl>
    <types/>

    <message/>
    <message/>
    <message/>

    <portTypes/>
    <bindings/>
    <service/>
</wsdl>

I've always been annoyed that there is no <messages/> element.

John Saunders
A: 

It sounds like you want to create a definition against which your XML can be measured. If that is the case, I would suggest you learn XML Schema. It is a fantastic tool for defining XML structures, and it can even be used to create entire languages. In that respect it is similar to a DOCTYPE, except that XML generated from a schema is aware of its own structural definitions. That is important if the meaning of the data in the XML depends on the hierarchy of elements containing it.

As far as attributes go, the general rule is to use elements rather than attributes to contain data. Elements defined with Schema can specify data type constraints as well as the structural qualities described previously. The benefit of using attributes is brevity. An attribute can easily take the place of what would otherwise require nested elements 2 to 4 deep to describe just as accurately.

+1  A: 
Dour High Arch
The basis for your disagreement seems to be invalid. The question does not presuppose that the semantics of attributes and elements are the same. The question is, what is the preferred way to represent "something" in XML, and the correct answer seems to be, "it's a matter of taste, readability, tool compatibility, and so on."
Cheeso
Indeed, the question does not presuppose this; the upvoted answers do. I claim they are wrong in asserting that it "is a matter of taste", because the rules for elements and attributes are different. Whether your "taste" is to do something a certain way is irrelevant if the syntax forbids it. Readability is improved as well. I made no claims about "tool compatibility, and so on."
Dour High Arch
+1  A: 

I feel that there is some taste to the design of an XML schema. But there are distinct differences in the two alternatives you offered.

example 1:

<SomeBaseTag>
    <Item1/>
    <Item1/>
    <Item2/>
</SomeBaseTag>

example 2:

<SomeBaseTag>
    <Set1>
        <Item1/>
        <Item1/>
    </Set1>
    <Set2>
        <Item2/>
    </Set2>
</SomeBaseTag>

The first reads to me like a big container with a mix of Item1 and Item2 entities in it, in (I presume) a random or potentially mixed order. The second is a container with two subcontainers, each of which contain a set of one particular type of entity.
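
Here is a quick sketch of that difference as a parser sees it, using Python's standard xml.etree.ElementTree for illustration only. The interleaved order in `flat` is an assumption I added to show the "mixed order" case.

```python
import xml.etree.ElementTree as ET

# Flat layout; the interleaved order is assumed, to show mixing.
flat = ET.fromstring(
    "<SomeBaseTag><Item1/><Item2/><Item1/></SomeBaseTag>"
)

# Grouped layout, as in example 2.
grouped = ET.fromstring(
    "<SomeBaseTag>"
    "<Set1><Item1/><Item1/></Set1>"
    "<Set2><Item2/></Set2>"
    "</SomeBaseTag>"
)

# Flat: document order of the children can interleave the two kinds,
# and a consumer may treat that order as meaningful.
flat_order = [child.tag for child in flat]

# Grouped: each wrapper holds one kind, so any ordering between an
# Item1 and an Item2 is no longer expressed by the document.
grouped_sets = {
    wrapper.tag: [child.tag for child in wrapper] for wrapper in grouped
}

print(flat_order)
print(grouped_sets)
```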

That difference may be unimportant for your purposes. But in some cases it IS important, especially as the schema becomes more complicated. See the example from John Saunders on WSDL for an illustration.

WSDL is this:

<wsdl>
    <types/>

    <message/>
    <message/>
    <message/>

    <portTypes/>
    <bindings/>
    <service/>
</wsdl>

Suppose the first-level containers were omitted "as a matter of taste". You'd then have

<wsdl>
    <schema/>        
    <schema/>        
    <schema/>        
    <message/>
    <message/>
    <message/>

    <operation/>
    <operation/>
    <operation/>
    <binding/>
    <binding/>
    <binding/>
    <service/>
</wsdl>

At that point, lacking a portType, it's not easy to relate the service to a set of operations.

Cheeso