ansaurus

Question

Answer 1

+1 A:

I'm not the right person to ask, as I am a big fan of xml myself. However, I can tell you one of the main complaints that I have heard:

It is hard to work with. Here, "hard" means that you need to know an API and that you have to write a fair amount of code to parse your xml. While I wouldn't say that it's really all that hard, I can only agree that a language that is made to describe objects can be accessed more easily from a language that supports dynamically created objects.
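As a rough illustration of that complaint, here is a minimal Python sketch that reads the same made-up record from XML and from JSON using only the standard library. The record, the element names and the helper names `person_from_xml` and `person_from_json` are assumptions for the example, not anything from a real project.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data: the same record in both formats.
XML_DOC = "<person><name>Ada</name><langs><lang>en</lang><lang>fr</lang></langs></person>"
JSON_DOC = '{"name": "Ada", "langs": ["en", "fr"]}'


def person_from_xml(text):
    # Each field has to be located through the tree API and collected by hand.
    root = ET.fromstring(text)
    return {
        "name": root.findtext("name"),
        "langs": [el.text for el in root.findall("langs/lang")],
    }


def person_from_json(text):
    # The parser already returns native dicts and lists.
    return json.loads(text)


xml_person = person_from_xml(XML_DOC)
json_person = person_from_json(JSON_DOC)
```

Both calls end up with the same dictionary; the difference is how much of the mapping you have to spell out yourself.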

Jasper 2010-08-27 02:03:20

Answer 2

+4 A:

Some weaknesses:

It is somewhat difficult to associate xml files and external resources, which is why the new Office document formats use a zip envelope that includes a skeleton xml file and resource files bundled together. The other option of using base64 encoding is very verbose and doesn't allow good random access, which brings one to the next point:
Random access is difficult. Neither of the two traditional modes of reading an xml file, constructing a DOM or forward-only SAX-style reading, allows for truly random access.
Concurrent write access to different parts of the file is difficult, which is why its use in Windows executable manifests is error prone.
What encoding does an xml file use? Strictly speaking you guess the encoding first, then read the file and verify the encoding was right.
It is difficult to version portions of a file. Therefore if you want to provide granular versioning, you need to split your data. This is not just a file format issue, but also due to the fact that tools generally provide per-file semantics - version control tools, sync tools like DropBox, etc.
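The "guess, then verify" step for encodings can be sketched roughly as follows. This is a simplified Python illustration of the idea behind the non-normative autodetection appendix of the XML recommendation, not a complete implementation: the function name `sniff_xml_encoding` is made up, and UTF-32, EBCDIC and UTF-16 without a byte order mark are deliberately ignored.

```python
import codecs
import re


def sniff_xml_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its first bytes (simplified)."""
    # Step 1: a byte order mark, if present, settles the question.
    boms = [
        (codecs.BOM_UTF8, "utf-8"),
        (codecs.BOM_UTF16_LE, "utf-16-le"),
        (codecs.BOM_UTF16_BE, "utf-16-be"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    # Step 2: guess that the file is ASCII-compatible, just long enough to
    # read the encoding named in the XML declaration.
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', data)
    if match:
        return match.group(1).decode("ascii").lower()
    # Step 3: with no BOM and no declared encoding, XML defaults to UTF-8.
    return "utf-8"
```

A real parser then decodes the whole file with the guessed encoding and fails if the bytes turn out not to fit it, which is the "verify" half of the complaint.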

bright 2010-08-27 02:08:20

I think you saved the best (that is, the worst) for last. I have yet to see a diff tool for XML that was worth figuring out how to use.

Robert Rossney 2010-08-27 21:48:11

Answer 3

+1 A:

I think in general the reaction is simply because XML is overused.

However, if there is one word I hate about XML, with a passion, it is namespaces. The lost productivity around namespace problems is horrific.
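One common way this bites, sketched in Python with the standard library's ElementTree: adding a default namespace to a document silently changes the name of every element, so a query that used to work returns nothing. The documents and the `http://example.com/rules` namespace URI below are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: identical except for the default namespace.
PLAIN = "<rules><rule>a</rule></rules>"
NAMESPACED = '<rules xmlns="http://example.com/rules"><rule>a</rule></rules>'

# Without a namespace, the unqualified name matches.
plain_hits = ET.fromstring(PLAIN).findall("rule")

# Same query against the namespaced document: no error, just no matches.
missed_hits = ET.fromstring(NAMESPACED).findall("rule")

# The query has to name the namespace explicitly, via a prefix map.
ns = {"r": "http://example.com/rules"}
qualified_hits = ET.fromstring(NAMESPACED).findall("r:rule", ns)
```

The failure mode is the painful part: nothing is raised, the result is simply empty.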

Yishai 2010-08-27 02:16:13

People who don't understand XML namespaces often have that reaction.

John Saunders 2010-08-27 02:20:50

@John Saunders: I agree with Yishai: even when you understand XML namespaces, you can have a lot of pain with them, since some APIs are very unfriendly and their documentation is incomplete. The PHP implementation of XPath is a good example. Or at least it was in 2008, when I last used it.

MainMa 2010-08-27 02:27:56

@MainMa: sounds like a good reason not to use PHP. I hope they've fixed it to implement standards from the previous decade.

John Saunders 2010-08-27 02:30:57

@John Saunders, it isn't for lack of understanding them. Consider this: http://stackoverflow.com/questions/3314292/i-just-upgraded-to-drools-5-and-the-xml-rules-will-not-load. Sure, everyone gets namespaces wrong: the tools (PHP), the implementers (the drools team), etc. Bottom line: horrific levels of lost productivity over an issue that isn't important for the 80% use case.

Yishai 2010-08-27 14:59:25

@Yishai: I would argue that changing the namespace without providing migration tools constitutes lack of understanding of namespaces.

John Saunders 2010-08-27 17:20:57

@John, but my point is that as someone choosing a format, you have to be concerned that tools you work with will fail on namespace issues because apparently, they are not well understood (no matter how well you understand them yourself).

Yishai 2010-08-27 19:54:38

@Yishai: I'm sure I'll be considered contrary when I say that this is not a new standard. The first version was in [1999](http://www.w3.org/TR/1999/REC-xml-names-19990114/). Any tools which do not support XML namespaces should be discarded, immediately, and publicly ridiculed. We cannot have multiple flavors of standards like XML, or XML will not be a standard. Any software that does not support this standard should be treated like a cockroach and stomped, or like a drug addict - don't enable it, just cut it off.

John Saunders 2010-08-27 20:09:21

I could not agree more. A tool that doesn't properly support namespaces is as much a menace as a tool that assumes attributes are ordered.

Robert Rossney 2010-08-27 21:45:27

Answer 4

+3 A:

<xml>
    <noise>
        The
    </noise>
    <adjective>
        main
    </adjective>
    <noun>
        weakness
    </noun>
    <noise>
        of
    </noise>
    <subject>
        XML
    </subject>
    <noise>
        ,
    </noise>
    <whocares>
        in my opinion
    </whocares>
    <noise>
        ,
    </noise>
    <wildgeneralisation>
        is its verbosity
    </wildgeneralisation>
    <noise>
        .
    </noise>
</xml>

paxdiablo 2010-08-27 02:26:12

I'm increasingly convinced that the "M" in "XML" is for "madlib."

tadamson 2010-08-27 02:59:23

Answer 5

+1 A:

XML descends from SGML, the great-granddaddy of markup languages. The purpose of SGML and by extension XML is to annotate text. XML does this well and has a wide range of tools that increase its facility for a variety of applications.

The problem, as I see it, is that XML is frequently used, not to annotate text, but to represent structured data, which is a subtle but important difference. In practical terms, structured data needs to be concise for a variety of reasons. Performance is an obvious one, especially when bandwidth is limited. This is probably one of the main reasons why JSON is so popular for web applications. Concise data structure representation on the wire means better scalability.
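To make the size argument concrete, here is a small Python sketch that serializes one product record both ways with the standard library and compares the byte counts. The field values are borrowed from the invoice example further down; the choice to wrap the JSON in a `product` key and to use compact separators is an assumption made to keep the comparison fair.

```python
import json
import xml.etree.ElementTree as ET

record = {"sku": "BL394D", "quantity": "4", "description": "Basketball", "price": "450.00"}

# XML: one child element per field, so every key appears twice
# (once in the opening tag and once in the closing tag).
product = ET.Element("product")
for key, value in record.items():
    ET.SubElement(product, key).text = value
xml_bytes = ET.tostring(product)

# JSON: compact separators, roughly what would go over the wire.
json_bytes = json.dumps({"product": record}, separators=(",", ":")).encode("utf-8")

print(len(xml_bytes), len(json_bytes))
```

For this record the JSON form comes out smaller, mostly because XML repeats each field name in its closing tag.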

Unfortunately, JSON is not very readable without extra whitespace padding, which is almost always omitted. On the other hand, if you have ever tried editing a large XML file using a command-line editor, it can be very awkward as well.

Personally, I find that YAML strikes a nice balance between the two extremes. Compare the following (copied from yaml.org with minor changes).

YAML:

invoice: 34843
date: 2001-01-23
billto: &id001
  given: Chris
  family: Dumars
  address:
    lines: |
      458 Walkman Dr.
      Suite #292
    city: Royal Oak
    state: MI
    postal: 48046
shipto: *id001
product:
- sku: BL394D
  quantity: 4
  description: Basketball
  price: 450.00
- sku: BL4438H
  quantity: 1
  description: Super Hoop
  price: 2392.00
tax: 251.42
total: 4443.52
comments: >
  Late afternoon is best.
  Backup contact is Nancy
  Billsmer @ 338-4338.

XML:

<invoice>
   <number>34843</number>
   <date>2001-01-23</date>
   <billto id="id001">
      <given>Chris</given>
      <family>Dumars</family>
      <address>
        <lines>
          458 Walkman Dr.
          Suite #292
        </lines>
        <city>Royal Oak</city>
        <state>MI</state>
        <postal>48046</postal>
      </address>
   </billto>
   <shipto xref="id001" />
   <products>
      <product>
        <sku>BL394D</sku>
        <quantity>4</quantity>
        <description>Basketball</description>
        <price>450.00</price>
      </product>
      <product>
        <sku>BL4438H</sku>
        <quantity>1</quantity>
        <description>Super Hoop</description>
        <price>2392.00</price>
      </product>
   </products>
   <tax>251.42</tax>
   <total>4443.52</total>
   <comments>
    Late afternoon is best. Backup contact is Nancy Billsmer @ 338-4338.
   </comments>
</invoice>

They both represent the same data, but the YAML is over 30% smaller and arguably more readable. Which would you prefer to have to modify with a text editor? There are many libraries available to parse and emit YAML (e.g. SnakeYAML for Java developers).

As with everything, the right tool for the right job is the best rule to follow.

C P1R8 2010-08-27 17:23:29

Answer 6

A:

My favorite nasty problem is with XML serialization formats that use attributes - like XAML.

This works:

<ListBox ItemsSource="{Binding Items}" SelectedItem="{Binding CurrentSelection}"/>

This doesn't:

<ListBox SelectedItem="{Binding CurrentSelection}" ItemsSource="{Binding Items}"/>

XAML deserialization assigns property values as they're read from the XML stream. So in the second example, when the SelectedItem property is assigned, the control's ItemsSource hasn't been set yet, and SelectedItem is being set to an item that the control doesn't yet know exists.

If you're using Visual Studio to create your XAML files, everything will be cool, because Visual Studio maintains the ordering of attributes. But modify your XAML in some XML tool that believes the XML recommendation when it says that the ordering of attributes is not significant, and boy are you in a world of hurt.
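The mechanism can be modelled in a few lines of Python. This is a toy sketch of order-dependent property assignment, not the real WPF or XAML implementation: the `ListBox` class, the `apply_attributes` helper and the rule that an unknown selection is silently dropped are all assumptions made for illustration.

```python
class ListBox:
    """Hypothetical stand-in for a control; not the real WPF class."""

    def __init__(self):
        self.items_source = []
        self.selected_item = None


def apply_attributes(control, attributes):
    # Assign each property in the order the attributes were read,
    # the way a streaming deserializer would.
    for name, value in attributes:
        if name == "selected_item" and value not in control.items_source:
            # The item is not known to the control yet, so the selection is dropped.
            continue
        setattr(control, name, value)
    return control


items = ["red", "green"]

# Items first, then selection: the selection sticks.
works = apply_attributes(ListBox(), [("items_source", items), ("selected_item", "green")])

# Selection first, then items: same attributes, different result.
fails = apply_attributes(ListBox(), [("selected_item", "green"), ("items_source", items)])
```

Any tool that reorders the attribute list turns the first case into the second without changing the document's meaning as far as XML is concerned.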

Robert Rossney 2010-08-27 22:19:32

@Robert: that's a bug in XAML, and you should complain to Microsoft. They _must_ understand that if they base a format on XML, that they must adhere to XML standards. Period. Order of attributes is not allowed to be significant.

John Saunders 2010-08-28 00:08:21

It's not really a bug. Bugs can be fixed. There's no real way to fix XAML (or any serialization format that uses attributes to represent properties) so that it can only be used to serialize properties whose order of assignment is insignificant. You could remove the ability to use attributes from XAML completely, but that would make XAML much, much less usable.

Robert Rossney 2010-08-28 17:05:36

@John, when it comes to XML violations from Microsoft, I wouldn't even put this in the top tier. How about generating XML and the corresponding XSD, but having the XML be invalid according to their own XSD? This is the situation in .NET remoting.

Yishai 2010-08-29 01:56:31

@Robert: It's also not a best practice to have an object with properties that need to be initialized in a particular order. Perhaps that's the overriding bug here.

John Saunders 2010-08-29 02:07:28

@Yishai: .NET Remoting no longer matters. If you find a situation like this with WCF, then please report it.

John Saunders 2010-08-29 02:08:00

@John, if you have to integrate with a vendor that exposes it, it matters ...

Yishai 2010-08-29 02:48:29

@Yishai: I meant matters to Microsoft, in terms of reporting bugs.

John Saunders 2010-08-29 03:01:27

I agree in principle that it's not a best practice, John. But what reasonable alternative is there in the case of a class that has an `Items` collection and a `SelectedItem` property? Allowing `SelectedItem` to be set to something that's not in `Items` is very bad. I've stamped my foot and gotten cross, but that hasn't really advanced towards a solution.

Robert Rossney 2010-08-29 06:06:23


What are the weaknesses of XML?
