ansaurus

Question

Answer 1

+2 A:

Both methods for storing an object's properties are perfectly valid. You should start from pragmatic considerations. Try answering the following questions:

Which representation leads to faster data parsing/generation?
Which representation leads to faster data transfer?
Does readability matter?

...

aku 2008-08-29 01:23:23

Answer 2

+3 A:

the million dollar question!

first off, don't worry too much about performance now. you will be amazed at how quickly an optimized xml parser will rip through your xml. more importantly, what is your design for the future: as the XML evolves, how will you maintain loose coupling and interoperability?

more concretely, you can make the content model of an element more complex but it's harder to extend an attribute.

Adam 2008-08-29 01:24:43

Answer 3

+3 A:

It is arguable either way, but your colleagues are right in the sense that the XML should be used for "markup" or meta-data around the actual data. For your part, you are right in that it's sometimes hard to decide where the line between meta-data and data is when modeling your domain in XML. In practice, what I do is pretend that anything in the markup is hidden, and only the data outside the markup is readable. Does the document make some sense in that way?

XML is notoriously bulky. For transport and storage, compression is highly recommended if you can afford the processing power. XML compresses well, sometimes phenomenally well, because of its repetitiveness. I've had large files compress to less than 5% of their original size.
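As a rough illustration of the compression point, here is a minimal Python sketch. The document it builds is made up (the tag names only loosely follow the inventory example in this thread), and the ratio you see will depend entirely on your own data, so treat it as a way to measure rather than as a benchmark.

```python
import gzip
import xml.etree.ElementTree as ET


def build_inventory(count):
    """Build a repetitive, element-heavy document from made-up data."""
    root = ET.Element("INVENTORY")
    for i in range(count):
        item = ET.SubElement(root, "ITEM")
        ET.SubElement(item, "SERIALNUMBER").text = f"SN-{i:06d}"
        ET.SubElement(item, "LOCATION").text = "XYX"
        ET.SubElement(item, "VENDOR").text = "YYZ"
    # Returns bytes, including an XML declaration.
    return ET.tostring(root, encoding="utf-8")


raw = build_inventory(5000)
packed = gzip.compress(raw)
ratio = len(packed) / len(raw)
print(f"{len(raw)} bytes raw, {len(packed)} bytes gzipped ({ratio:.1%})")
```

The repeated tag names are what the compressor exploits, which is why markup-heavy documents tend to shrink so much.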

Another point to bolster your position is that while the other team is arguing about style (in that most XML tools will handle an all-attribute document just as easily as an all-#PCDATA document) you are arguing practicalities. While style can't be totally ignored, technical merits should carry more weight.

erickson 2008-08-29 01:26:57

Answer 4

+4 A:

When in doubt, KISS -- why mix attributes and elements when you don't have a clear reason to use attributes? If you later decide to define an XSD, that will end up being cleaner as well. And if you later still decide to generate a class structure from your XSD, that will be simpler too.

Luke 2008-08-29 01:27:43

Answer 5

+34 A:

I use this rule of thumb:

An Attribute is something that is self-contained, i.e., a color, an ID, a name.
An Element is something that does or could have attributes of its own or contain other elements.

So yours is close. I would have done something like:

EDIT: Updated the original example based on feedback below.

   <ITEM serialNumber="something">
      <barcode encoding="Code39">something</barcode>
      <LOCATION>XYX</LOCATION>
      <TYPE modelNumber="something">
         <VENDOR>YYZ</VENDOR>
      </TYPE>
   </ITEM>
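To show how the two kinds of data are reached from code, here is a small Python sketch using the standard library's ElementTree on a well-formed copy of the example. The variable names are only illustrative.

```python
import xml.etree.ElementTree as ET

doc = """
<ITEM serialNumber="something">
    <barcode encoding="Code39">something</barcode>
    <LOCATION>XYX</LOCATION>
    <TYPE modelNumber="something">
        <VENDOR>YYZ</VENDOR>
    </TYPE>
</ITEM>
"""

item = ET.fromstring(doc)

# Self-contained values live in attributes and are read with get().
serial = item.get("serialNumber")

# Anything that carries its own attributes or children is an element.
barcode = item.find("barcode")
encoding = barcode.get("encoding")
vendor = item.findtext("TYPE/VENDOR")

print(serial, barcode.text, encoding, vendor)
```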

Chuck 2008-08-29 01:28:13

This is a good rule of thumb ;) I'm using it myself and I think many people do.

ivan_ivanovich_ivanoff 2009-03-24 22:46:03

John Ballinger 2009-07-05 12:06:40

Good point, John!

Chuck 2009-07-06 19:24:23

Really late to the party, but the special ASCII char argument is wrong -- that's what escaping is for, both for attributes and text data.

micahtan 2009-11-25 04:11:50

@micahtan: If you must consider escaping, it will be more expensive to serialize/deserialize. If you know it is never going to happen, you can just skip that extra overhead, and it will be much faster to execute.

awe 2010-05-07 05:47:02

@awe: I was unaware that it was more expensive to deserialize w/escaping. Do you have a source reference for that? I deal primarily with .NET, and I haven't seen anything that mentions it. As far as "never going to happen", I've been burned quite a few times by that. If your XML contains numbers or codes, it may be a safe assumption. Proper names or user input text have a nasty habit of introducing those characters, particularly the ampersand and both single and double quotes.

micahtan 2010-05-07 16:54:30

@micahtan: Escaping isn't enough. The rules for attributes are different. John Ballinger's note is correct. In particular, the character '<' can't be in an attribute regardless of escaping. See http://www.w3.org/TR/xml/#CleanAttrVals

Don Roby 2010-05-17 13:49:10

@donroby - Sorry, that would be my mistake in communicating. By escaping, I mean XML encoding. '<' = &lt; etc. It seems odd to me to decide between an attribute or element based on the characters that make up the content instead of the meaning of the content.

micahtan 2010-05-17 21:29:09

@micahtan - No, I think actually the XML encoded version is not allowed. But I'm basing this just on reading the spec - perhaps I'll write a JUnit/XMLUnit test to check instead of continuing to trust my understanding of the spec, which is indeed quite difficult to decipher. But in deciding between attribute or value you really do have to consider what they in fact allow, and attributes seem to allow less than elements.

Don Roby 2010-05-18 01:10:19

I have written a JUnit/XMLUnit test to check my understanding of the spec as noted above, and it seems that Java's SAX implementation quite happily accepts encoded '<' in attribute values. I still suspect it's not a good idea, but I can't back it up with code...

Don Roby 2010-05-18 12:50:39

@donroby: it's incorrect. The replacement text of `&lt;` is `&#60;`, which is a character reference, not an entity reference. `&lt;` is OK in attributes. See: http://www.w3.org/TR/REC-xml/#sec-predefined-ent
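For anyone who wants to check this without setting up the JUnit/XMLUnit test mentioned earlier, here is a rough equivalent in Python using the standard library parser (which wraps expat). It is only a sketch of one parser's behaviour, not a reading of the spec: an escaped `&lt;` in an attribute value is accepted, while a raw `<` is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

# Escaped form: the parser hands back the literal character.
ok = ET.fromstring('<item note="1 &lt; 2"/>')
note = ok.get("note")

# Raw '<' inside an attribute value is a well-formedness error.
try:
    ET.fromstring('<item note="1 < 2"/>')
    raw_accepted = True
except ET.ParseError:
    raw_accepted = False

print(repr(note), raw_accepted)
```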

Porges 2010-06-24 04:10:47

@John: if this is a problem then there's something in your toolchain which isn't producing valid XML. I don't think this is a reason to choose between attributes or elements. (Furthermore, you can't "just add CDATA tags" around user-input because it might contain `]]>`!)

Porges 2010-06-24 04:13:12

Answer 6

+1 A:

It's largely a matter of preference. I use Elements for grouping and attributes for data where possible as I see this as more compact than the alternative.

For example I prefer.....

<?xml version="1.0" encoding="utf-8"?>
<data>
    <people>
        <person name="Rory" surname="Becker" age="30" />
        <person name="Travis" surname="Illig" age="32" />
        <person name="Scott" surname="Hanselman" age="34" />
    </people>
</data>

...Instead of....

<?xml version="1.0" encoding="utf-8"?>
<data>
    <people>
        <person>
            <name>Rory</name>
            <surname>Becker</surname>
            <age>30</age>
        </person>
        <person>
            <name>Travis</name>
            <surname>Illig</surname>
            <age>32</age>
        </person>
        <person>
            <name>Scott</name>
            <surname>Hanselman</surname>
            <age>34</age>
        </person>
    </people>
</data>

However, if I have data which cannot easily be represented in, say, 20-30 characters, or which contains many quotes or other characters that need escaping, then I'd say it's time to break out the elements... possibly with CDATA blocks.

<?xml version="1.0" encoding="utf-8"?>
<data>
    <people>
        <person name="Rory" surname="Becker" age="30" >
            <comment>A programmer who's interested in all sorts of misc stuff. His blog can be found at http://rorybecker.blogspot.com and he's on twitter as @RoryBecker</comment>
        </person>
        <person name="Travis" surname="Illig" age="32" >
            <comment>A cool guy who has helped me out with all sorts of SVN information</comment>
        </person>
        <person name="Scott" surname="Hanselman" age="34" >
            <comment>Scott works for MS and has a great podcast available at http://www.hanselminutes.com </comment>
        </person>
    </people>
</data>

Rory Becker 2008-09-30 09:23:18

This is flat wrong I'm afraid - you should follow W3C guidelines: http://www.w3schools.com/DTD/dtd_el_vs_attr.asp - XML should not be formed on readability or on making it "compact" - but rather using elements or attributes correctly for the purpose which they were designed for.

Vidar 2009-01-07 12:59:29

I'm sorry, but this is misleading. The W3schools page is not W3C guidelines. The W3C XML recommendation (in which I was a participant) allows elements and attributes to be used according to the needs and styles of the users.

peter.murray.rust 2009-07-05 11:45:14

Answer 7

+6 A:

It may depend on your usage. XML that is used to represent structured data generated from a database may work well with field values ultimately being placed in attributes.

However XML used as a message transport would often be better using more elements.

For example, let's say we had this XML as proposed in the answer:

<INVENTORY>
   <ITEM serialNumber="something" barcode="something">
      <LOCATION>XYX</LOCATION>
      <TYPE modelNumber="something">
         <VENDOR>YYZ</VENDOR>
      </TYPE>
   </ITEM>
</INVENTORY>

Now we want to send the ITEM element to a device to print the barcode; however, there is a choice of encoding types. How do we represent the encoding type required? Suddenly we realise, somewhat belatedly, that the barcode wasn't a single atomic value but rather may be qualified with the encoding required when printed.

   <ITEM serialNumber="something">
      <barcode encoding="Code39">something</barcode>
      <LOCATION>XYX</LOCATION>
      <TYPE modelNumber="something">
         <VENDOR>YYZ</VENDOR>
      </TYPE>
   </ITEM>

The point is, unless you are building some kind of XSD or DTD along with a namespace to fix the structure in stone, you may be best served by leaving your options open.

IMO XML is at its most useful when it can be flexed without breaking existing code using it.
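One way to picture "flexing without breaking" is a reader that tolerates both layouts of the barcode. The sketch below is in Python and `read_barcode` is a hypothetical helper written for illustration, not part of any library; it assumes the two shapes shown above are the only ones in circulation.

```python
import xml.etree.ElementTree as ET


def read_barcode(item):
    """Return (value, encoding) from either layout; encoding may be None."""
    child = item.find("barcode")
    if child is not None:
        # Newer layout: barcode is an element qualified by an attribute.
        return child.text, child.get("encoding")
    # Older layout: barcode is a plain attribute with no encoding info.
    return item.get("barcode"), None


old = ET.fromstring('<ITEM serialNumber="something" barcode="something"/>')
new = ET.fromstring(
    '<ITEM serialNumber="something">'
    '<barcode encoding="Code39">something</barcode>'
    "</ITEM>"
)

print(read_barcode(old))
print(read_barcode(new))
```

Note that the fallback is only needed because the value started life as an attribute. Had it been an element from the start, adding the encoding would not have required any change to readers that ignore it.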

AnthonyWJones 2008-09-30 10:24:12

Good point on the "barcode", I rushed my example and would have definitely broken that out into its own element. Also good point on the XSD/DTD.

Chuck 2009-02-23 21:26:46

Answer 8

+12 A:

Some of the problems with attributes are:

* attributes cannot contain multiple values (child elements can)
* attributes are not easily expandable (for future changes)
* attributes cannot describe structures (child elements can)
* attributes are more difficult to manipulate by program code
* attribute values are not easy to test against a DTD

If you use attributes as containers for data, you end up with documents that are difficult to read and maintain. Try to use elements to describe data. Use attributes only to provide information that is not relevant to the data.

Don't end up like this (this is not how XML should be used):

<note day="12" month="11" year="2002" to="Tove" from="Jani" heading="Reminder"  body="Don't forget me this weekend!"> </note>

Source: http://www.w3schools.com/DTD/dtd_el_vs_attr.asp

2009-01-07 12:49:45

First point is incorrect, see: http://www.w3.org/TR/xmlschema-2/#derivation-by-list

Porges 2010-06-27 23:31:54

I'd say that the first point is correct and `list` is only a partial workaround for this problem. There can't be multiple attributes with the same name. With `list`, the attribute still has only one value, which is a whitespace-separated list of some datatype. The separator characters are fixed, so you cannot have multiple values if a single value of the wanted datatype can contain whitespace. This rules out, for example, having multiple addresses in one "address" attribute.
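The limitation is easy to see from the consuming side. In this Python sketch the attribute names and values are invented for illustration, and the splitting is done by hand (a schema-aware processor would do it for you): a list of numbers splits cleanly, but two street addresses collapse into a bag of tokens with no way to tell where one ends.

```python
import xml.etree.ElementTree as ET

# Whitespace-separated numbers: each token is one value.
sizes = ET.fromstring('<item sizes="8 10 12"/>')
size_list = sizes.get("sizes").split()

# Two addresses in one attribute: the item boundary is lost.
person = ET.fromstring('<person address="1 Main St 22 Oak Ave"/>')
address_tokens = person.get("address").split()

print(size_list)
print(address_tokens)
```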

jasso 2010-09-05 01:49:02

Answer 9

+2 A:

Use elements for data and attributes for meta data (data about the element's data).

If an element is showing up as a predicate in your select strings, you have a good sign that it should be an attribute. Likewise if an attribute never is used as a predicate, then maybe it is not useful meta data.
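To make "predicate in your select strings" concrete, here is a short Python sketch using the limited XPath subset that ElementTree supports. The serial numbers and locations are made up. Both forms of predicate work; the heuristic above is about which one you find yourself writing over and over.

```python
import xml.etree.ElementTree as ET

inventory = ET.fromstring("""
<INVENTORY>
    <ITEM serialNumber="A1"><LOCATION>XYX</LOCATION></ITEM>
    <ITEM serialNumber="B2"><LOCATION>ABC</LOCATION></ITEM>
</INVENTORY>
""")

# Predicate on an attribute: select the item by its identifier.
by_attribute = inventory.findall("ITEM[@serialNumber='B2']")

# Predicate on a child element's text: possible, but if this is the
# common query, LOCATION may be behaving like metadata.
by_child = inventory.findall("ITEM[LOCATION='XYX']")

print(by_attribute[0].findtext("LOCATION"))
print(by_child[0].get("serialNumber"))
```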

Remember that XML is supposed to be machine readable not human readable and for large documents XML compresses very well.

Michael J 2009-01-15 19:22:35

Answer 10

+1 A:

I agree with feenster. Stay away from attributes if you can. Elements are evolution friendly and more interoperable between web service toolkits. You'd never find these toolkits serializing your request/response messages using attributes. This also makes sense since our messages are data (not metadata) for a web service toolkit.

bagheera 2009-04-07 06:22:09

Answer 11

+2 A:

There is no universal answer to this question (I was heavily involved in the creation of the W3C spec). XML can be used for many purposes - text-like documents, data and declarative code are three of the most common. I also use it a lot as a data model. There are aspects of these applications where attributes are more common and others where child elements are more natural. There are also features of various tools that make it easier or harder to use them.

XHTML is one area where attributes have a natural use (e.g. in class='foo'). Attributes have no order and this may make it easier for some people to develop tools. OTOH attributes are harder to type without a schema. I also find namespaced attributes (foo:bar="zork") are often harder to manage in various toolsets. But have a look at some of the W3C languages to see the mixture that is common. SVG, XSLT, XSD, MathML are some examples of well-known languages and all have a rich supply of attributes and elements. Some languages even allow more-than-one-way to do it, e.g.

<foo title="bar"/>

or

<foo>
  <title>bar</title>
</foo>

(Note that these are NOT equivalent syntactically and require explicit support in processing tools.)

My advice would be to have a look at common practice in the area closest to your application and also consider what toolsets you may wish to apply.

Finally make sure that you differentiate namespaces from attributes. Some XML systems (e.g. Linq) represent namespaces as attributes in the API. IMO this is ugly and potentially confusing.

peter.murray.rust 2009-07-05 11:58:27

Answer 12

A:

Just a couple of corrections to some bad info:

@John Ballinger: Attributes can contain any character data. < > & " ' need to be escaped to &lt; &gt; &amp; &quot; and &apos;, respectively. If you use an XML library, it will take care of that for you.

Hell, an attribute can contain binary data such as an image, if you really want, just by base64-encoding it and making it a data: URL.
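A minimal Python sketch of both claims, using only the standard library. The element and attribute names, the sample text and the three sample bytes are all invented for illustration. The serializer escapes the awkward characters on the way out, the parser restores them on the way in, and the binary payload survives as a base64 data: URL.

```python
import base64
import xml.etree.ElementTree as ET

text = 'Tom & "Jerry" <cat> it\'s'
payload = b"\x00\x01\x02"  # stand-in for image bytes

note = ET.Element("note", {"body": text})
note.set(
    "icon",
    "data:application/octet-stream;base64,"
    + base64.b64encode(payload).decode("ascii"),
)

# The library escapes &, < and " in attribute values for us.
serialized = ET.tostring(note, encoding="unicode")
print(serialized)

# Round trip: parsing gives back the original text and bytes.
parsed = ET.fromstring(serialized)
icon_bytes = base64.b64decode(parsed.get("icon").split(",", 1)[1])
```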

@feenster: Attributes can contain space-separated multiple items in the case of IDS or NAMES, which would include numbers. Nitpicky, but this can end up saving space.

brianary 2009-07-23 21:38:28

Not just ids or names. They can contain space-separated lists of just about anything.

John Saunders 2009-07-25 11:02:26

Answer 13

+2 A:

Others have covered how to differentiate attributes from elements, but from a more general perspective, putting everything in attributes because it makes the resulting XML smaller is wrong.

XML is not designed to be compact but to be portable and human readable. If you want to decrease the size of the data in transit then use something else (such as Google's protocol buffers).

Patrick 2009-11-10 16:43:39

Answer 14

A:

"XML" stands for "eXtensible Markup Language". A markup language implies that the data is text, marked up with metadata about structure or formatting.

XHTML is an example of XML used the way it was intended:

<p><span lang="es">El Jefe</span> insists that you
    <em class="urgent">MUST</em> complete your project by Friday.</p>

Here, the distinction between elements and attributes is clear. Text elements are displayed in the browser, and attributes are instructions about how to display them (although there are a few tags that don't work that way).

Confusion arises when XML is used not as a markup language, but as a data serialization language, in which the distinction between "data" and "metadata" is more vague. So the choice between elements and attributes is more-or-less arbitrary except for things that can't be represented with attributes (see feenster's answer).

dan04 2010-06-24 04:02:05

Answer 15

A:

I found this really good resource:

http://www.ibm.com/developerworks/xml/library/x-eleatt.html

Laurens Holst 2010-09-27 10:59:15

XML attribute vs XML element
