What problems was XML invented to solve? From what I can tell, it seems like it specifies a uniform syntax for things that may have vastly different semantics. Unlike, for example, an HTML file, a Java source file, or a .docx document, one cannot write a program to extract any kind of high-level meaning from an XML file without lots of additional information. What is the value of having the syntax rigidly specified by some standards committee even when the semantic meaning is completely unspecified? What advantages does XML have over just rolling your own ad-hoc format that does exactly what you need and nothing more? In short, what does XML accomplish and why is it so widely used?

+10  A: 

XML forces your data to be well-structured, so that a program which does not understand the semantics of your data will still be able to understand its syntax. This allows things like XSLT, which will transform one well-formed XML document into another. It means that you can manipulate data without having to interpret it. You can see the document is well-formed and valid according to its DTD without needing to understand the contents.

This was a huge step forward for data storage, interoperability, and machine-readability in general.
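As a rough illustration of handling data without interpreting it, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document and the helper names (is_well_formed, tag_counts) are made up for this example. It checks only well-formedness; DTD validation is not covered here.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def is_well_formed(text):
    """Return True if text parses as XML; the tag meanings are never consulted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

def tag_counts(text):
    """Count how often each element name occurs anywhere in the document."""
    root = ET.fromstring(text)
    return Counter(elem.tag for elem in root.iter())

# Hypothetical document: the code has no idea what an "order" or an "item" is.
doc = "<order><item>bolt</item><item>nut</item><note>rush</note></order>"

print(is_well_formed(doc))                          # True
print(is_well_formed("<order><item>bolt</order>"))  # False: <item> is never closed
print(tag_counts(doc)["item"])                      # 2
```

Neither function would need to change if the document described invoices, recipes, or anything else.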

Borealid
@Borealid: But how much manipulating can you really do w/o any clue about semantic meaning?
dsimcha
@dsimcha: Unlimited manipulation. Read up on XSLT - an XSLT interpreter knows nothing about semantics. The basic idea of XML is that your data are just that - data. Inert.
Borealid
@dsimcha: I don't think you even know what you mean by "semantic" meaning.
Noon Silk
@dsimcha: You could say the same thing about TCP : why use the same protocol for messages with very different semantics? There are advantages to using a common syntax for lower levels.
Steven D. Majewski
+1  A: 

Ad hoc solutions work fine within the confines of your own system, but when you need to communicate with 1...N other systems, XML is a good foundation that all parties can rely on to work, at a minimum, in a certain way. Yes, the data has no semantic meaning, but you're assured that the TRANSFER and CONVERSION of data will still be successful. There are many more reasons, but that's one of the most important, I've always thought.

This is a very primitive example, but think of when systems used to communicate with flat-file data. You might have had a string that other parties had built their communication around, such as AAABBBCCCDDD. Other systems knew that they would get the AAA "data" in the first 3 characters, and so on. Now someone changes something on your side and accidentally starts sending BBB AAA CCC DDD. Boom, everything is broken.

With XML you could have both:

<xml>
  <a>AAA</a>
  <b>BBB</b>
  <c>CCC</c>
  <d>DDD</d>
</xml>

AND

<xml>
  <b>BBB</b>
  <a>AAA</a>
  <c>CCC</c>
  <d>DDD</d>
</xml>

without breaking someone else's system.
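A small Python sketch of that difference, using the standard library's xml.etree.ElementTree. The helper names (first_field, fields) are invented for this example, and the reordered flat string is written without spaces. It assumes the consumer looks elements up by name rather than by position.

```python
import xml.etree.ElementTree as ET

def first_field(flat):
    """Positional reader: assumes the 'a' value is always the first 3 characters."""
    return flat[0:3]

def fields(text):
    """Name-based reader: map each child element's name to its text."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}

# The flat-file consumer silently reads the wrong value after the reorder.
print(first_field("AAABBBCCCDDD"))  # AAA
print(first_field("BBBAAACCCDDD"))  # BBB

original = "<xml><a>AAA</a><b>BBB</b><c>CCC</c><d>DDD</d></xml>"
reordered = "<xml><b>BBB</b><a>AAA</a><c>CCC</c><d>DDD</d></xml>"

# The XML consumer gets the same fields from either document.
print(fields(original) == fields(reordered))  # True
print(fields(reordered)["a"])                 # AAA
```

A consumer that walks the children strictly in order would still break, so the protection comes from addressing data by name, which XML makes easy.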

jaywon
+6  A: 

I personally find XML to be useful because I find writing parsers to be a pain. If you invent your own data format, you wind up spending a lot of your time writing parsing code and checking for correct input in what could be a lot of user data. Then, after you get all the input and validity checking code completed for your parser, you have the joy of developing documentation for your file format for anyone else who wants to use it, plus the further joy of finding bugs in your input validation code after they start sending data your way.

With XML the parsing mechanics are well defined, and with XML Schema or DTDs you can specify the formats you are willing to accept. XML parsers are available for almost every major programming language, so the amount of code you have to write, maintain, and document is greatly reduced.

Being able to use XPath to access parts of the document is also useful.
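For a sense of what that looks like, here is a minimal Python sketch. It uses the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath; the catalog document and its element names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; no hand-written parser is involved.
catalog = ET.fromstring("""
<catalog>
  <book id="b1"><title>First</title><price>10</price></book>
  <book id="b2"><title>Second</title><price>25</price></book>
</catalog>
""")

# Path expression: every <title> directly under a <book> child of the root.
titles = [t.text for t in catalog.findall("./book/title")]

# Attribute predicate: the <title> of the <book> whose id is "b2".
second = catalog.find(".//book[@id='b2']/title").text

print(titles)  # ['First', 'Second']
print(second)  # Second
```

A full XPath implementation (for example in a third-party library) allows richer queries, but even this subset replaces a fair amount of hand-rolled lookup code.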
barrowc
+1 @barrowc. Please add XPath as a separate answer. It is one of the most compelling XML stories.
flybywire
+2  A: 

XML lets you be non-standard in a standard way :). It's ugly, it's verbose, it takes up a lot of space, and it's absolutely invaluable for interoperability. Basically, XML is nice because it gives you a standard way of describing your data so that a single type of parser can handle data from disparate sources.

To use a more concrete example, I used to work in the semiconductor tool industry in the days before XML. Each tool used a recipe to describe how to process a particular wafer. Every one of those tools used a different format for its recipes. Now, pity the poor person (me!) who had to take several of those tools and integrate them into a single processing system. I had to write a different parser for each recipe type and convert recipes from a common store into the format appropriate for a particular tool. It was just a nightmare. If XML had been available, all those recipes could have been defined in XML and any conversions or transformations handled with simple XSLT scripts. It would have saved me literally months of development effort just for that portion of the integration code.

Jim Nutt
+1  A: 

The answer is in your own question. "From what I can tell, it seems like it specifies a uniform syntax for things that may have vastly different semantics." Having a uniform syntax solves part of the problem for things that have vastly different semantics, and it's not a trivial problem in the slightest.

Similarly, text encoding is used in markup (including XML), computer programs, human-readable documents, and many more tasks with vastly different semantics. Would you like to reinvent Unicode every single time? Would you even know enough about all the issues to have a chance of doing so, or even a chance of reinventing a passable ASCII? ASCII only seems simple these days because so many of the complicated features of its control codes are no longer used; old-school uses of ASCII are often way more complicated than Unicode.

Numbers are used all over the place in computing, and we still have four different internal syntaxes in use (two endian styles, two complement styles) though the details are generally hidden these days.

As well as doing one chunk of the format creator's work for them, and ensuring that one chunk of the work for the producer or consumer is one they are already familiar with (and hence may already have tools for), it completely eliminates one chunk of the work for a producer-consumer who is reading in one format and writing in another.

Jon Hanna
A: 

I'm interested in this thread because I have been struggling with exactly the same question. I have read quite a bit about XML but nothing has yet convinced me of the point of it. If I put the question "What is the point of XML" into Google I get 108,000,000 replies which suggests that the question has some validity. Nothing that I read really answered the question.

Main issue for me: if you were using an XML document as a flat-file database I could see some point, but if you are using a MySQL database, why output data to an XML file first rather than straight to the HTML page?

Secondary issue: the tag specificity provided by XML could surely be achieved just as easily using nested HTML unordered lists and putting in classes to identify tags where necessary. If you argue that it could make an HTML page unmanageably large, then put the list in another document and make it a PHP include. That would surely mean less typing as well as the simplicity of adhering to a well understood format.

Can anyone convince me I'm wrong?

Mike