views: 11569
answers: 15

We're an XML shop (we use both XMPP and RSS/Atom feeds a lot, so I guess we have little or no choice). Yet I keep hearing about people who "hate" XML and sometimes refuse to use APIs that can only return XML, in favor of JSON.

It looks like many people prefer JSON to it, but I'm still not sure why. Of course, JSON is much more lightweight and less verbose, but at the same time it's not easily extensible, and I'm not sure there is anything like SAX parsers for JSON, for example. Also, I'm not sure either JSON or XML is intended to be read by humans anyway.
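For readers who haven't met SAX, here is a minimal sketch of event-driven XML parsing using Python's standard `xml.sax` module. The feed fragment and the `TitleCollector` name are made up for illustration; the point is only that the handler sees elements as they stream past instead of loading a whole tree.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every <title> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        # Called once for each opening tag.
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        # Called once for each closing tag.
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


# Hypothetical feed fragment, not taken from any real service.
FEED = (
    b"<feed>"
    b"<entry><title>First</title></entry>"
    b"<entry><title>Second</title></entry>"
    b"</feed>"
)

handler = TitleCollector()
xml.sax.parseString(FEED, handler)
print(handler.titles)
```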

I understand this question is too open, so maybe we could just list the pros and cons of XML? Do not hesitate to also indicate when you think it is suitable to use XML, and when it is more suitable to use alternatives like JSON. (Thanks Muhammad Alkarouri)

Cons :

  • Verbosity (need for closing tags)
  • Too easy to mess up with extendability (too liberal?)
  • Hard to read for humans
  • Slow (to be processed, I guess)
  • Low performance for binary data
  • Escaping
  • Not native in any language

Pros:

  • Easy to read when formatted correctly
  • Name-spacing and eXtendability
  • Flexibility

Everybody seems to agree that the problem with XML is that it's used too often when it isn't necessary.

+1  A: 

We don't. Who hates XML?

Though I can understand some frustration, not hate, from some JavaScript front-end developers when there is no JSON serialisation available, as JSON is probably easier to handle off the bat without any need for the XML DOM.

Delan Azabani
Your incessant downvoting is uncalled for and should stop immediately.
Delan Azabani
@Delan Feel free to contact me or anyone else regarding this matter. I hardly think I've downvoted "incessantly". And I would expect that, by providing a reason for downvoting (as has often been promoted on the meta forums and similar), I'm following the general 'moral' guidelines for this site. No malice intended, but I won't waste time or space discussing it in these comments further.
Noon Silk
Silky, you're taking Stack Overflow too seriously. And before you start going on about how this voting system has helped to improve the quality of answers, read your and others' comments and see if your comments have helped anything. Please feel free to downvote me too.
Khash
@Khash: Check out the comments under this thread: http://goo.gl/CAxX
missingfaktor
@Khash: He can't downvote you; comments (unlike answers) can only be upvoted (or flagged, but what you've written doesn't merit that).
Donal Fellows
+15  A: 

Half of it is prejudice. The other half is that it's too verbose for most applications it's used for, when a more restricted format (e.g. JSON or YAML) would be fine.

Ignacio Vazquez-Abrams
I also downvoted you. The first part of your comment is argumentative and baseless.
Noon Silk
Your incessant downvoting is uncalled for and should stop immediately.
Delan Azabani
I wrote it when the question was argumentative. Baseless? Why do you think acronyms like NIH even *exist*?
Ignacio Vazquez-Abrams
+1 from me, specifically for the "Half of it is prejudice". The definition of prejudice is `an unfavorable opinion or feeling formed beforehand or without knowledge, thought, or reason.`, and that applies beautifully to the "I hate XML, but have no good explanation why" crowd. It's unfortunate that the word prejudice has such strong racial/sexual orientation connotations attached to it now that any use of the word itself tends to have prejudices associated with it. It's almost a self-referential word! =)
Rob
@Delan: That's pretty amusing. @Rob: So you can confidently speak for all programmers and therefore show how they have no basis to supposedly hate XML? No, you can't. You know your English, but apparently not much logic. Anyway, I won't comment further, I was merely - in good faith - explaining my downvote. I have no interest in a discussion.
Noon Silk
You might find my irritation at your careless downvoting amusing, but I don't. Please stop this.
Delan Azabani
@silky, I've not claimed to speak for anyone, merely put forth my opinion, which I'm more than entitled to do, plus I don't see *anyone* claiming to speak for all programmers! =) There's no need to be so aggressive and argumentative, accept the question and its answers in the good faith that they were intended =)
Rob
@Delan: You're being ridiculous. If you have a complaint against me, file it on the meta website or with a moderator or a similar approach. Downvoting is an implemented function on this site and I'm exercising it. You shouldn't place much value on your arbitrary number anyway. A downvote is a disagreement, and disagreements are useful in any environment. @Rob: Ignacio is speaking on behalf of everyone. You are agreeing with him. I am saying that neither you nor he nor anyone can know, or ever confirm, that a group of people all 'hate' something, without a proper survey. No aggression on my part.
Noon Silk
@Delan, @Rob: Check out the comments under this post: http://goo.gl/CAxX
missingfaktor
@Missing, just had a scan of them. *sigh*, there's always one.
Rob
+9  A: 

Some people don't like the verbosity of XML (angle brackets, the need for closing tags).

Oded
If they don't like angle brackets, why do they still use XML? why not JSON?
KMan
@KMan - I imagine they do.
Oded
+32  A: 

XML doesn't really begin to shine until you start mixing together different namespaced schemas. Then you see JSON start to fall down, but if you just need a serialization format for your data, JSON is smaller, more lightweight, more human-readable, and generally faster than XML.

Ref : This

DMin
I wouldn't phrase it as JSON "falling down". That implies that JSON fails at something it was meant to perform. But JSON is not a document markup language. It's a serialization format for arbitrary data structures only. It was never meant for prescribing document specifications or semantic structure. It's sorta like saying that _the abacus falls flat when trying to render 3D graphics_. Sure an abacus and computer both perform math calculations and have a slight overlap in application, but an abacus was never meant to be used as a computer.
Lèse majesté
@Lese exactly, they're good for different things.
DMin
XML Namespaces are just a cruel joke. They don't actually solve any problems, they just make parsing unnecessarily difficult.
Jonathan Allen
Incorrect usage then. JSON is typically REST- or resource-based, so namespacing is done via the URL. There is JSON-Schema for validation as well. Also, JSON is typically mapped onto basic computer science types such as lists, dictionaries, strings, numerics, dates and bools. One can create any complex type from those. There is also BSON for binary-friendly apps.
Ryan Christensen
@Jonathan, I don't agree with you. Things like XHTML with embedded SVG or even XSLT where the namespace makes the difference between "code" and "data" really show the benefits of namespaces. Of course, this is about markup, and not about data serialization. Each format has its own uses and both are very valuable tools; XML without namespaces would not be half as valuable as it is today.
Lucero
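To make the namespace point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree`. The document is a made-up XHTML page with embedded SVG; both vocabularies define a `title` element, and the namespace is what keeps them apart. The prefixes `h` and `s` are arbitrary names chosen for the lookup, not anything the document dictates.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical page: XHTML with an embedded SVG drawing.
DOC = """
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sales report</title></head>
  <body>
    <svg xmlns="http://www.w3.org/2000/svg">
      <title>Bar chart</title>
    </svg>
  </body>
</html>
"""

root = ET.fromstring(DOC.strip())

# Map short prefixes to namespace URIs for use in the search paths.
ns = {"h": XHTML_NS, "s": SVG_NS}

html_title = root.find(".//h:title", ns).text  # the page title
svg_title = root.find(".//s:title", ns).text   # the drawing's title

print(html_title, "|", svg_title)
```

Without namespaces, both elements would simply be `title` and a consumer would have to guess from position which one it was looking at.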
+16  A: 

I don't hate XML and I don't find it frustrating. I sure as hell hate people that try to solve every problem in the world with XML. That frustrates me... a lot.

XML shines as an extensible way to represent data that can easily be read by many different systems. And that's it. When I see it being used for configuration files, UI layout representation (XAML anyone?) or plain code (workflow descriptions such as Microsoft's WF, build tools such as ant, etc), I wish I had a machine gun.

Hristo Deshev
+52  A: 

"XML is like violence. If it doesn't solve your problem, you're not using enough of it."
- Unknown | Chris Maden

Links of interest for this quote:

sushant
What a strange answer!
Rohan West
+1 for excellent sarcasm!
Niklas
How about attributing the quote?
Lucas
Original author of the quote: http://www.pluralsight-training.net/community/blogs/tewald/archive/2005/03/09/6616.aspx
xk0der
I wish I could give you 100 upvotes!
Mike Akers
+1  A: 

If you spend a lot of time dealing with Javascript objects client-side, receiving JSON means skipping one conversion step and one set of escaping/encoding issues. That's a big plus. But if you store data in XML, I can see why it'd be less appealing.

The human readability thing is a red herring. Both formats can be made readable at some cost to their efficiency but it always depends on the human. Neither format is readable or (safely) editable by most non-developers but there are currently more tools around to let our less nerdy brethren work with XML. So even if you've started to develop angle bracket intolerance, you'll be dealing with them for many years.

Odilon Redo
+64  A: 

Leaving aside the obvious subjective arguments (it's verbose, the parsers are complex and slow, the spec is enormous, the data-to-markup ratio is bad) I'll talk about the data model of XML vs JSON.

First, JSON's data model matches the data. Key/value pairs (hashes), lists and scalars are the primary means of organization. They're natural data structures that humans work well with. Look at the world around you and you will find them everywhere. Consider a simple form; a grocery list; describing yourself; listing things you did today; or email in your inbox. XML likes trees, and this is awkward. You can model everything as a tree, and from a computer science point of view it's a very flexible structure, but it doesn't come naturally. Things tend to have some hierarchical structure, but not as the primary means of organization.

Second, JSON's data model matches the native data structures of dynamic programming languages. I cannot stress this enough. Reading a JSON document into Perl or Ruby or Python or Javascript is trivial because it maps directly to native variables. You just slurp it in. XML needs to be transformed, because most languages don't do trees and graphs well. And you have to decide how you're going to handle attributes vs inner tags. Will you get <person eyes="blue">Joe</person> or <person><eyes>blue</eyes><name>Joe</name></person>? Each organization is going to have its own way of doing it, with its own tags and its own idiosyncrasies, which means extra coding for the developers. Look at the abomination that is Apple plists for an example.
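A rough Python sketch of that difference, using only the standard `json` and `xml.etree.ElementTree` modules. The JSON shape is my own guess at an equivalent of the two `<person>` documents above; the variable names are just for illustration.

```python
import json
import xml.etree.ElementTree as ET

# JSON: the parsed result is already a plain dict.
person = json.loads('{"name": "Joe", "eyes": "blue"}')

# XML, attribute style: the name is element text, the eye colour an attribute.
attr_style = ET.fromstring('<person eyes="blue">Joe</person>')
from_attrs = {"name": attr_style.text, "eyes": attr_style.get("eyes")}

# XML, inner-tag style: every field is a child element.
elem_style = ET.fromstring(
    "<person><eyes>blue</eyes><name>Joe</name></person>"
)
from_elems = {child.tag: child.text for child in elem_style}

# Same data in the end, but each XML layout needed its own extraction code.
print(person == from_attrs == from_elems)
```

The two XML variants carry identical information, yet code written for one will not read the other.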

JSON, by being so simple and having a data model which matches the nature of the data it's representing, can only represent given data in a few plausible ways. So you don't need a schema, you can eyeball it. And because its data model matches the language's native data model, transforming data from a JSON document is as simple as transforming any other data. No tree manipulation required.

Now JSON is awfully limited, and there's a lot it cannot do. But that's ok, it is naturally constrained by its limitations. You can't make JSON do what it cannot do (not without a lot of work). XML is the opposite. It is generic and unconstrained. You can take almost any data problem and apply XML to it. Configuration files? Log files? Email? Project build files? Sure! Throw it all into XML! Add in that for a while XML was the only serious, generic data format out there, and it was being oversold as the do everything format, and you have a generation of technologies built with XML that are overkill.

XML was introduced when I started programming. It was hailed as the great savior of data interchange! No longer would you need to write custom parsers for whatever made-up data formats your business partners were using, just pass around XML documents! It's generic! You only need one parser! Which was a great leap forward, but it wasn't the silver bullet it was hyped to be. You still needed to interpret the data, which meant (if you were lucky) a schema file and transforming that awkward tree structure into something you could actually use. You wind up with a towering framework of XML technologies.

A lot of companies bought the hype, wrote XML documents any old way they liked and expected their data problems to just vanish. It didn't happen.

This answer started rational and naturally became frustrated. I'll try to sum up. XML has an awkward data model, it's often overkill and it was oversold. If you need typed, hierarchical data, hopefully machine generated and consumed, it's great! Usually you don't, and so it's mismatched to the task.

That said, I prefer YAML.

Schwern
XML is nicer than ASN.1, the other candidate for interoperably dealing with hierarchical namespaced data. Both JSON and YAML are much more constrained in what they can do, which is fine if your problem is inside their domain (hierarchic non-namespaced data), but sucky otherwise.
Donal Fellows
+1, both for a thorough answer and for the nod to YAML.
Daniel Pryden
@Donal Yes, when XML was introduced and folks said "XML is simple!" there was an unspoken "...compared to SGML" which isn't a glowing endorsement. I've never had to work with ASN.1, and from the looks of it I hope I never do, but I guess the same applies.
Schwern
+1 for a confusing yet elegant statement: "is naturally constrained by its limitations"
el chief
+1 for YAML! It's far superior to JSON and XML; if only it were better supported across languages.
crgwbr
@Schwern: Thankfully ASN.1 is not used that much except in things relating to X.500; LDAP and SSL (especially X.509 certificates) are the things which you're most likely to encounter and they hide most of the nastiness away from you. XML steamrollered ASN.1 in most other places, and thank goodness (it's a standardized *binary* serialization format; compact, but unreadable without tools).
Donal Fellows
@crgwbr YAML does a lot more than JSON -- sometimes that's good, and sometimes it's a major pain, because it's a lot less universal. Custom data types make it harder to use a YAML file outside of a single language (or even a single library) that produced it, and just as a personal note, the multiline array and hash formats are painful to edit manually. There's something to be said for restricting yourself to a very simple, unexpressive format, as long as it works.
hobbs
Properly handling attributes vs inner tags in XML can be a pain, but I have found that if you follow these simple rules it becomes much easier:
IF (there can be several instances of the element) OR (the data is longer than the enclosing tags)
THEN use inner tag
ELSE use attribute
Of course this doesn't solve the problem when you receive data from outside your scope of influence. But how hard would it be to implement a validator that rejects any XML that doesn't follow those rules?
SAKIROGLU Koray
Interestingly, no one has mentioned an older technology for information interchange that comes with a built-in interpreter to massage the data as needed: Lua. But maybe I shouldn't open up that can of worms - the Lisp folks will surely chime in with how S-expressions were both earlier, and more flexible....
BMeph
@SAKIROGLU Any plan that starts with "if everyone just followed policy X" is doomed to failure when the groups are all independent. Everything is easier when you can walk over and hit somebody on the head, but generic data interchange is about being able to fling data around without having the parties involved negotiate the details. @BMeph If by "built in" you're suggesting shipping data + a binary executable to manipulate it, that sounds like a recipe for incompatibility and Trojans. If you just mean use Lua data syntax; doesn't look like it buys much over JSON.
Schwern
I agree with the description, up to but not including YAML. This is because YAML seems to have fallen victim to "feeping creaturism" -- whereas JSON has slightly less functionality than it ought to (why the heck were comments removed, for example), YAML has the kitchen sink and all, total overkill. This causes issues with interoperability, performance, and an overall limited set of tools capable of (or willing to) use it -- at least for "simple" use cases.
StaxMan
@StaxMan Comments were removed from JSON because they were the only major syntax keeping JSON from being a subset of YAML (minor discrepancies have been found since) with YAML 1.2 being written to codify this. I agree that losing comments was a steep price to pay for this interoperability.
Schwern
@Schwern Ah, I did not know the actual reason. Thank you for sharing this. And yes, I agree it was a steep price.
StaxMan
+2  A: 

One advantage of XML over JSON is schema support (DTD is part of the original XML spec, there's XML Schemas, and alternatives like Relax NG and Schematron).

However, this advantage is diminishing in importance, as schemaless approaches gain in popularity, including scripting languages (Python, Ruby, PHP, Javascript itself etc) and agile development. NoSQL is another sign of this change. They offer faster development and greater flexibility.

Although schemas, being a kind of type, tend to be faster (because type information can be used to optimize processing; eg. Java is faster than Javascript) and more reliable (because of type checking), these advantages are decreasing in importance as processing power becomes ever cheaper, and in webapps, it is more plausible to fix bugs as they arise (instead of trying to get everything right in the first place.)

These factors are strongest in web development, which also naturally gets the most publicity on the web; though it seems that they would also apply to programming in general.

And of course, JSON is the most natural representation for Javascript apps, and Javascript is arguably the most popular language in the world, and growing.

13ren
JSON is very nearly a natural representation for Python too: The only differences are `true` vs. `True`, `false` vs. `False`, and `null` vs. `None`.
dan04
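A quick way to see that mapping with the standard `json` module (the keys and values here are invented for the demo):

```python
import json

# JSON text using true / false / null, plus an array and a number.
text = '{"active": true, "deleted": false, "nickname": null, "tags": ["a", "b"], "count": 3}'

data = json.loads(text)

# The parsed value is an ordinary dict holding True, False and None.
print(data)
# {'active': True, 'deleted': False, 'nickname': None, 'tags': ['a', 'b'], 'count': 3}

# Serializing it again gives back the JSON spellings.
print(json.dumps(data))
```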
+11  A: 

Not native in any language

That's incorrect. Visual Basic .NET has builtin XML literal support, as has Scala.

Dario
So does AS3. However, in our internal testing, JSON decode was still faster than whatever behind the scenes magic was going on to make XML "native".
Jay Paroline
So does Javascript with E4X
Tomas
At one point Haskell did as well, that is where VB got the idea.
Jonathan Allen
+3  A: 

XML's greatest strength is that it is well supported on every platform and that you can use namespaces to embed formats into each other.

Another strength of XML is that you can use Schemas to define the structure in detail and verify if a document is correct.

For JSON there's also JSON Schema, but I have not had time to dive into it.

JSON has the strength that it is smaller and has a less verbose structure. Also, it's more usable when direct communication with JavaScript in a web environment is required.

Finally in most current platforms there are viable solutions to automatically (de)serialize data structures to both formats.

Johannes Wachter
+26  A: 

The problem with XML is that people forget that it stands for "Extensible Markup Language" and not "Extensible Data Serialization Language".

For something like XHTML, XML does a reasonable job. The data is mostly text, so it's not a problem that the only data type is text. The tag-to-text ratio is low, so it's not a problem that you have to write each tag twice. In fact, that's a huge readability advantage in cases like <script>...</script> because you might not see both tags on the screen at once. It also makes sense to have attributes, because in <div id="answer-12345678" class="answer">...</div>, the distinction between "data" (text element) and "metadata" (attributes) is clear: "Data" is displayed to the user and "metadata" is for formatting and navigation information.

But in a data serialization language, <UserID>287586</UserID> is needless verbosity: The tags are longer than the data! And the content/attribute distinction is also redundant; What's the difference between <name first="John" last="Doe" /> and <name><first>John</first><last>Doe</last></name>?

XML's lack of data types is also a problem. Sure, you could use a convention like writing <UserID type="int">287586</UserID> or having an external means (such as XML Schema) of declaring the type of each element, but it's far more complicated than having 287586 be an integer and "287586" be a string.
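A small sketch of that difference in Python, using the standard `json` and `xml.etree.ElementTree` modules. The `UserID` key simply mirrors the element name above; the final `int(...)` call stands in for whatever convention or schema tells you the field is numeric.

```python
import json
import xml.etree.ElementTree as ET

# In JSON the document itself says whether the value is a number or a string.
as_number = json.loads('{"UserID": 287586}')["UserID"]
as_string = json.loads('{"UserID": "287586"}')["UserID"]

# In XML the element content always comes back as text.
from_xml = ET.fromstring("<UserID>287586</UserID>").text

print(type(as_number).__name__, type(as_string).__name__, type(from_xml).__name__)
# int str str

# Turning it into a number relies on out-of-band knowledge about the field.
user_id = int(from_xml)
```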

The JSON type system maps so well to programming languages: Nearly every language has strings, numbers, bools, and null. And most dynamically typed languages have types to represent "array" and "object" (map of strings to objects).

Python's json module defines no data types that aren't part of the parser/encoder itself: Everything is represented by the built-in types str, int, float, bool, list, dict, and None. In contrast, xml.dom.minidom has to define the types Document, Node, Element, CDATASection, etc. to represent XML-specific concepts.
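You can check this by printing the class names of what each parser hands back. A minimal sketch; the sample documents are made up:

```python
import json
from xml.dom import minidom

parsed_json = json.loads('{"UserID": 287586, "roles": ["admin"]}')
parsed_xml = minidom.parseString("<UserID>287586</UserID>")

# json hands back plain built-in containers and scalars.
json_types = [
    type(parsed_json).__name__,
    type(parsed_json["UserID"]).__name__,
    type(parsed_json["roles"]).__name__,
]

# minidom hands back its own node classes.
xml_types = [
    type(parsed_xml).__name__,
    type(parsed_xml.documentElement).__name__,
    type(parsed_xml.documentElement.firstChild).__name__,
]

print(json_types)  # ['dict', 'int', 'list']
print(xml_types)   # ['Document', 'Element', 'Text']
```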

dan04
Technically it's accurate to say that XML itself lacks data types, but actually what it does is to defer this responsibility to schemas, which have the potential to impose much more powerful typing than what JSON provides. For example, some may find it nice to be able to say that such-and-such element is storing an ISO 8601 date, which XML Schema allows you to do with its built-in date type. In conjunction with schemas, XML can actually make a decent strongly-typed object serialization format.
alexantd
Very excellent argument, being able to maintain type is probably one of the most overlooked features of JSON when comparing it to XML. Being able to test the type of a JSON value means that "duck typing" is far more reliable than with XML.
shadowhand
+2  A: 

I don't hate XML, but I don't like it much. Dealing with XML is awkward. The DOM is the reason for a lot of "XMLUtils" classes out there.

I'm a bit less frustrated when I can use something like Python's ElementTree for parsing, and I'm a lot less frustrated when I don't have to use XML at all and instead can use something like JSON (especially in dynamic languages).
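To give a feel for the difference, here is the same lookup done with the DOM (`xml.dom.minidom`) and with ElementTree, both from the Python standard library. The `users` document is invented for the comparison.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical document used only for this comparison.
DOC = (
    "<users>"
    "<user id='1'><name>Ann</name></user>"
    "<user id='2'><name>Bob</name></user>"
    "</users>"
)

# DOM: find the elements, then dig out the text node by hand.
dom = minidom.parseString(DOC)
dom_names = []
for user in dom.getElementsByTagName("user"):
    name_node = user.getElementsByTagName("name")[0]
    dom_names.append(name_node.firstChild.data)

# ElementTree: findtext() returns the child's text directly.
et_names = [user.findtext("name") for user in ET.fromstring(DOC).iter("user")]

print(dom_names == et_names)
```

The DOM version is the sort of boilerplate that tends to end up wrapped in a helper class.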

Mattias Nilsson
+6  A: 

Some people, when confronted with a problem, think "I know, I'll use XML." Now they have two problems.

srparish
Originally a quote by Jamie Zawinski about s/XML/regular expressions/.
glebd
...and yet, it still entertains! +1 for you, sir! :)
BMeph
A: 

The benefit that I see with JSON or YAML over XML is that it maps to data types. An array looks only one way, as does a hash.

XML is designed and created by a person and another person needs to write a parser for it to map back to their objects.

So everyone has a preference for how to represent things and some of those make it easy to parse and others don't.

Pete Brumm
JSON doesn't have a date data type so I still have to write my own parser.
Jonathan Allen
True, I usually store dates in a standard format ("2010-08-29") and Time objects as ints.
Pete Brumm
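A sketch of that convention in Python, assuming dates travel as ISO 8601 strings and timestamps as integer seconds since the Unix epoch in UTC. The field names `day` and `created` are invented, and the receiving side still has to know which fields to convert back.

```python
import json
from datetime import date, datetime, timezone

# Encode: the date becomes an ISO string, the time becomes integer epoch seconds.
record = {
    "day": date(2010, 8, 29).isoformat(),
    "created": int(datetime(2010, 8, 29, 12, 0, tzinfo=timezone.utc).timestamp()),
}
text = json.dumps(record)

# Decode: JSON only gives back a str and an int, so convert them explicitly.
loaded = json.loads(text)
day = date.fromisoformat(loaded["day"])
created = datetime.fromtimestamp(loaded["created"], tz=timezone.utc)

print(loaded["day"], day, created.isoformat())
```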