Hi,

Reading StackOverflow and listening to the podcasts by Joel Spolsky and Jeff Atwood, I have started to believe that many developers hate using XML, or at least try to avoid it as much as possible for storing or exchanging data.

On the other hand, I enjoy using XML a lot for several reasons:

  • XML serialization is implemented in most modern languages and is extremely easy to use,
  • Although slower than binary serialization, XML serialization is very useful when the same data must be used from several programming languages, or when it is intended to be read and understood by a human, even if only for debugging (JSON, for example, is more difficult to understand),
  • XML supports Unicode, and when used properly, there are no problems with different encodings, characters, etc.
  • There are plenty of tools which make it easy to work with XML data. XSLT is one example, making it easy to present and transform data. XPath is another, making it easy to search for data,
  • XML can be stored in some SQL servers, which covers scenarios where data that is too complicated to fit easily into SQL tables must be saved and manipulated; JSON or binary data, for example, cannot be manipulated through SQL directly (except by manipulating strings, which is crazy in most situations),
  • XML does not require any applications to be installed. If I want my app to use a database, I must install a database server first. If I want my app to use XML, I don't have to install anything,
  • XML is much more explicit and extensible than, for example, Windows Registry or INI files,
  • In most cases, there are no CR-LF problems, thanks to the level of abstraction provided by XML.
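
To illustrate the XPath point, here is a minimal sketch using Python's standard `xml.etree.ElementTree`, which supports only a limited subset of XPath. The sample document, the tag names and the helper `titles_in_language` are invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for illustration only.
DOC = """
<library>
  <book lang="en"><title>Dune</title><year>1965</year></book>
  <book lang="fr"><title>Candide</title><year>1759</year></book>
  <book lang="en"><title>Emma</title><year>1815</year></book>
</library>
"""


def titles_in_language(xml_text, lang):
    """Return the titles of all <book> elements whose lang attribute matches."""
    root = ET.fromstring(xml_text)
    # "./book[@lang='en']" selects child <book> elements by attribute value.
    books = root.findall(f"./book[@lang='{lang}']")
    return [book.findtext("title") for book in books]


english_titles = titles_in_language(DOC, "en")
print(english_titles)
```

A full XPath 1.0 engine offers far more (axes, functions, predicates on text), but even this subset replaces a hand-written loop with a one-line query.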

So, taking into account all the benefits of using XML, why do so many developers hate using it? IMHO, the only problem with it is that:

  • XML is too verbose and requires much more space than most other forms of data, especially when it comes to Base64 encoding.

Of course, there are many scenarios where XML doesn't fit at all. Storing the questions and answers of SO in an XML file on the server side would be absolutely wrong. Likewise, for storing an AVI video or a bunch of JPG images, XML is the worst thing to use.

But what about other scenarios? What are the weaknesses of XML?


To the people who considered that this question is not a real question:

Unlike questions such as the still-open "Significant new inventions in computing since 1980", my question is clear: it invites people to explain what weaknesses they have experienced when using XML and why they dislike it. It does not invite discussion of, for example, whether XML is good or bad. Nor does it require extended discussion; the answers received so far are short and precise and give me the information I wanted.

But it is a wiki, since there cannot be a unique good answer to this question.

According to SO, "not a real question" is a question where "It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, or rhetorical and cannot be reasonably answered in its current form."

  • What is being asked here: I think the question itself is very clear, and the several paragraphs of text above make it even clearer,
  • This question is ambiguous, vague, incomplete: again, there is nothing ambiguous, vague, or incomplete about it,
  • or rhetorical: it is not: the answer to my question is not something obvious,
  • and cannot be reasonably answered: several people already gave great answers to the question, showing that the question can be answered reasonably.

It also seems quite obvious how to rate the answers and determine the accepted answer. If an answer gives good reasons for what's wrong with XML, there is a good chance that it will be voted up, then accepted.

+1  A: 

I'm not the right person to ask, as I am a big fan of xml myself. However, I can tell you one of the main complaints that I have heard:

It is hard to work with. Here, hard means that you need to know an API and write a relatively large amount of code to parse your xml. While I wouldn't say that it's really all that hard, I can only agree that a format made to describe objects is easier to access from a language that supports dynamically created objects.
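
A rough sketch of that complaint, using only Python's standard library. The `person` record and both helper functions are made up for illustration; the point is only that the XML route needs explicit lookups and type conversion, while the JSON route maps straight onto native objects.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, expressed once as XML and once as JSON.
XML_TEXT = "<person><name>Ada</name><age>36</age></person>"
JSON_TEXT = '{"name": "Ada", "age": 36}'


def person_from_xml(text):
    """Walk the element tree by hand and convert types explicitly."""
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "age": int(root.findtext("age")),  # XML text is always a string
    }


def person_from_json(text):
    """JSON maps directly onto dicts, lists, strings and numbers."""
    return json.loads(text)


print(person_from_xml(XML_TEXT))
print(person_from_json(JSON_TEXT))
```

Data-binding libraries narrow this gap for XML, but they are one more API to learn, which is the complaint in the first place.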

Jasper
+4  A: 

Some weaknesses:

  • It is somewhat difficult to associate xml files and external resources, which is why the new Office document formats use a zip envelope that includes a skeleton xml file and resource files bundled together. The other option of using base64 encoding is very verbose and doesn't allow good random access, which brings one to the next point:
  • Random access is difficult. Neither of the two traditional modes of reading an xml file (constructing a DOM, or forward-only SAX-style reading) allows for truly random access.
  • Concurrent write access to different parts of the file is difficult, which is why its use in Windows executable manifests is error prone.
  • What encoding does an xml file use? Strictly speaking you guess the encoding first, then read the file and verify the encoding was right.
  • It is difficult to version portions of a file. Therefore if you want to provide granular versioning, you need to split your data. This is not just a file format issue, but also due to the fact that tools generally provide per-file semantics - version control tools, sync tools like DropBox, etc.
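
The "guess first, then verify" dance for encodings can be sketched as below. This is a deliberately simplified illustration, not the full autodetection procedure described in the non-normative appendix of the XML recommendation: it ignores UTF-32 and EBCDIC, and the function name `sniff_xml_encoding` is an invention for this example.

```python
import codecs
import re

# Matches an encoding pseudo-attribute inside a leading XML declaration.
_DECLARATION = re.compile(
    rb"<\?xml[^>]*encoding=[\"']([A-Za-z0-9._-]+)[\"']"
)


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML byte string (simplified sketch)."""
    # Step 1: a byte order mark, if present, settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    # Step 2: assume an ASCII-compatible encoding just long enough to
    # read the declaration, then trust what the declaration says.
    match = _DECLARATION.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Step 3: no BOM and no declaration means UTF-8 by default.
    return "utf-8"


print(sniff_xml_encoding(b'<?xml version="1.0" encoding="ISO-8859-1"?><a/>'))
```

A real parser does this for you; the sketch only shows why the file has to be partly read before anyone knows how to read it.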
bright
I think you saved the best (that is, the worst) for last. I have yet to see a diff tool for XML that was worth figuring out how to use.
Robert Rossney
+1  A: 

I think in general the reaction is simply because XML is overused.

However, if there is one word I hate about XML, with a passion, it is namespaces. The lost productivity around namespace problems is horrific.
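
A typical example of the kind of trap meant here, sketched with Python's standard `xml.etree.ElementTree`. The documents and the `http://example.com/shop` namespace URI are invented for illustration. Adding a default namespace to the root silently breaks every unqualified lookup.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents that look almost identical to a human reader.
PLAIN = "<order><item>widget</item></order>"
NAMESPACED = '<order xmlns="http://example.com/shop"><item>widget</item></order>'

# Prefix-to-URI mapping used only on the query side; the prefix "shop"
# never appears in the document itself.
NS = {"shop": "http://example.com/shop"}

plain_item = ET.fromstring(PLAIN).find("item")
# Same query against the namespaced document: no error, just no match.
missed_item = ET.fromstring(NAMESPACED).find("item")
# The lookup only works once the namespace is spelled out.
found_item = ET.fromstring(NAMESPACED).find("shop:item", NS)

print(plain_item.text, missed_item, found_item.text)
```

Nothing fails loudly in the middle case, which is a large part of why these problems cost so much time.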

Yishai
People who don't understand XML namespaces often have that reaction.
John Saunders
@John Saunders: I agree with Yishai: even when you understand XML namespaces, you can have a lot of pain with them, since some APIs are very unfriendly and their documentation is incomplete. The PHP implementation of XPath is a good example. Or at least it was in 2008, when I last used it.
MainMa
@MainMa: sounds like a good reason not to use PHP. I hope they've fixed it to implement standards from the previous decade.
John Saunders
@John Saunders, it isn't for lack of understanding them. Consider this: http://stackoverflow.com/questions/3314292/i-just-upgraded-to-drools-5-and-the-xml-rules-will-not-load. Sure, everyone gets namespaces wrong: the tools (PHP), the implementers (the Drools team), etc. Bottom line: horrific levels of lost productivity over an issue that isn't important for the 80% use case.
Yishai
@Yishai: I would argue that changing the namespace without providing migration tools constitutes lack of understanding of namespaces.
John Saunders
@John, but my point is that as someone choosing a format, you have to be concerned that tools you work with will fail on namespace issues because apparently, they are not well understood (no matter how well you understand them yourself).
Yishai
@Yishai: I'm sure I'll be considered contrary when I say that this is not a new standard. The first version was in [1999](http://www.w3.org/TR/1999/REC-xml-names-19990114/). Any tools which do not support XML namespaces should be discarded, immediately, and publicly ridiculed. We cannot have multiple flavors of standards like XML, or XML will not be a standard. Any software that does not support this standard should be treated like a cockroach and stomped, or like a drug addict - don't enable it, just cut it off.
John Saunders
I could not agree more. A tool that doesn't properly support namespaces is as much a menace as a tool that assumes attributes are ordered.
Robert Rossney
+3  A: 
<xml>
    <noise>
        The
    </noise>
    <adjective>
        main
    </adjective>
    <noun>
        weakness
    </noun>
    <noise>
        of
    </noise>
    <subject>
        XML
    </subject>
    <noise>
        ,
    </noise>
    <whocares>
        in my opinion
    </whocares>
    <noise>
        ,
    </noise>
    <wildgeneralisation>
        is its verbosity
    </wildgeneralisation>
    <noise>
        .
    </noise>
</xml>
paxdiablo
I'm increasingly convinced that the "M" in "XML" is for "madlib."
tadamson
+1  A: 

XML descends from SGML, the great-granddaddy of markup languages. The purpose of SGML and by extension XML is to annotate text. XML does this well and has a wide range of tools that increase its facility for a variety of applications.

The problem, as I see it, is that XML is frequently used, not to annotate text, but to represent structured data, which is a subtle but important difference. In practical terms, structured data needs to be concise for a variety of reasons. Performance is an obvious one, especially when bandwidth is limited. This is probably one of the main reasons why JSON is so popular for web applications. Concise data structure representation on the wire means better scalability.
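
As a rough illustration of the size difference, here is a sketch that serializes one record both ways with Python's standard library and compares byte counts. The record and the helper names are invented for this example, and the exact ratio depends heavily on the shape of the data.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record used only for the comparison.
record = {
    "sku": "BL394D",
    "quantity": 4,
    "description": "Basketball",
    "price": "450.00",
}


def to_json_bytes(rec):
    """Compact JSON, with no whitespace between tokens."""
    return json.dumps(rec, separators=(",", ":")).encode("utf-8")


def to_xml_bytes(rec, root_tag="product"):
    """One child element per field, with no indentation."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        ET.SubElement(root, key).text = str(value)
    return ET.tostring(root)


json_size = len(to_json_bytes(record))
xml_size = len(to_xml_bytes(record))
print(json_size, xml_size)
```

Every field name appears twice in the XML form (opening and closing tag), so the overhead grows with the number of fields rather than with the amount of actual data.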

Unfortunately, JSON is not very readable without extra whitespace padding, which is almost always omitted. On the other hand, if you have ever tried editing a large XML file using a command-line editor, it can be very awkward as well.

Personally, I find that YAML strikes a nice balance between the two extremes. Compare the following (copied from yaml.org with minor changes).

YAML:

invoice:
  number: 34843
  date: 2001-01-23
  billto: &id001
    given: Chris
    family: Dumars
    address:
      lines: |
        458 Walkman Dr.
        Suite #292
      city: Royal Oak
      state: MI
      postal: 48046
  shipto: *id001
  product:
  - sku: BL394D
    quantity: 4
    description: Basketball
    price: 450.00
  - sku: BL4438H
    quantity: 1
    description: Super Hoop
    price: 2392.00
  tax : 251.42
  total: 4443.52
  comments: >
    Late afternoon is best.
    Backup contact is Nancy
    Billsmer @ 338-4338.

XML:

<invoice>
   <number>34843</number>
   <date>2001-01-23</date>
   <billto id="id001">
      <given>Chris</given>
      <family>Dumars</family>
      <address>
        <lines>
          458 Walkman Dr.
          Suite #292
        </lines>
        <city>Royal Oak</city>
        <state>MI</state>
        <postal>48046</postal>
      </address>
   </billto>
   <shipto xref="id001" />
   <products>
      <product>
        <sku>BL394D</sku>
        <quantity>4</quantity>
        <description>Basketball</description>
        <price>450.00</price>
      </product>
      <product>
        <sku>BL4438H</sku>
        <quantity>1</quantity>
        <description>Super Hoop</description>
        <price>2392.00</price>
      </product>
   </products>
   <tax>251.42</tax>
   <total>4443.52</total>
   <comments>
    Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338
   </comments>
</invoice>

They both represent the same data, but the YAML is over 30% smaller and arguably more readable. Which would you prefer to have to modify with a text editor? There are many libraries available to parse and emit YAML (e.g. SnakeYAML for Java developers).

As with everything, the right tool for the right job is the best rule to follow.

C P1R8
A: 

My favorite nasty problem is with XML serialization formats that use attributes - like XAML.

This works:

<ListBox ItemsSource="{Binding Items}" SelectedItem="{Binding CurrentSelection}"/>

This doesn't:

<ListBox SelectedItem="{Binding CurrentSelection}" ItemsSource="{Binding Items}"/>

XAML deserialization assigns property values as they're read from the XML stream. So in the second example, when the SelectedItem property is assigned, the control's ItemsSource hasn't been set yet, and the SelectedItem property is being assigned an item that doesn't yet exist in the list.

If you're using Visual Studio to create your XAML files, everything will be cool, because Visual Studio maintains the ordering of attributes. But modify your XAML in some XML tool that believes the XML recommendation when it says that the ordering of attributes is not significant, and boy are you in a world of hurt.
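
The underlying mechanism can be sketched in a few lines of Python. This is a toy model, not WPF: the `ListBox` class, its snake_case properties and the `deserialize` helper are all hypothetical, and the real control does not necessarily raise an exception. The sketch only shows why assigning properties in document order makes attribute order significant.

```python
class ListBox:
    """Hypothetical stand-in for a control with order-dependent properties."""

    def __init__(self):
        self._items = []
        self._selected_item = None

    @property
    def items_source(self):
        return self._items

    @items_source.setter
    def items_source(self, value):
        self._items = list(value)

    @property
    def selected_item(self):
        return self._selected_item

    @selected_item.setter
    def selected_item(self, value):
        # The selection is only valid if the item is already in the list.
        if value not in self._items:
            raise ValueError(f"{value!r} is not in items_source")
        self._selected_item = value


def deserialize(attributes):
    """Assign properties in the order the attributes were read."""
    box = ListBox()
    for name, value in attributes:
        setattr(box, name, value)
    return box


# Works: the items are assigned before the selection.
ok = deserialize([("items_source", ["a", "b"]), ("selected_item", "b")])
print(ok.selected_item)

# Fails: same attributes, opposite order.
try:
    deserialize([("selected_item", "b"), ("items_source", ["a", "b"])])
except ValueError as error:
    print("failed:", error)
```

A deserializer could avoid this by collecting all attributes first and applying them in a dependency-aware order, but that requires knowledge about each class that the XML itself does not carry.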

Robert Rossney
@Robert: that's a bug in XAML, and you should complain to Microsoft. They _must_ understand that if they base a format on XML, they must adhere to XML standards. Period. Order of attributes is not allowed to be significant.
John Saunders
It's not really a bug. Bugs can be fixed. There's no real way to fix XAML (or any serialization format that uses attributes to represent properties) so that it can only be used to serialize properties whose order of assignment is insignificant. You could remove the ability to use attributes from XAML completely, but that would make XAML much, much less usable.
Robert Rossney
@John, when it comes to XML violations from Microsoft, I wouldn't even put this in the top tier. How about generating XML and the corresponding XSD, but having the XML be invalid according to their own XSD? This is the situation in .NET remoting.
Yishai
@Robert: It's also not a best practice to have an object with properties that need to be initialized in a particular order. Perhaps that's the overriding bug here.
John Saunders
@Yishai: .NET Remoting no longer matters. If you find a situation like this with WCF, then please report it.
John Saunders
@John, if you have to integrate with a vendor that exposes it, it matters ...
Yishai
@Yishai: I meant matters to Microsoft, in terms of reporting bugs.
John Saunders
I agree in principle that it's not a best practice, John. But what reasonable alternative is there in the case of a class that has an `Items` collection and a `SelectedItem` property? Allowing `SelectedItem` to be set to something that's not in `Items` is very bad. I've stamped my foot and gotten cross, but that hasn't really advanced towards a solution.
Robert Rossney