views:

804

answers:

12

OK, I've read a couple of books on XML and written programs that spit it out and whatnot. But here's the question. Both a comma-delimited file and an XML file are "human readable." In general, though, the comma-delimited file is much easier on my eyes than an XML file; the tags typically take up as much space as the data, if not more. This just seems to obscure what I'm reading, and the format can take a page to hold the same information that fits on a single line of text in a comma-delimited file. A comma-delimited file is also significantly less complex to parse. So the real question is: why XML? Just because all the cool kids are doing it?

+4  A: 

It all depends on what you need to do. If you need more complexity in your data structures than a simple "flat" row structure can give (hierarchical data, for example), then XML is a great choice.

Joel Martinez
+6  A: 

XML supports complex, structured and hierarchical representation of things. That's far from what CSV can store trivially.

Think about a complex object graph in an object oriented environment. It can be serialized as an XML document pretty easily but CSV cannot handle such a thing.

Mehrdad Afshari
OK, I'll give you hierarchical vs. CSV. But if I'm thinking about a complex object-oriented environment, a C++ or Java-like syntax for data representation is much lighter weight. I've actually thought of writing a "C-structure" style data parser because the syntax is so much cleaner.
NoMoreZealots
+2  A: 

CSV was never really a standard, just the same quick-and-dirty method that a bunch of people came up with independently. Of course, some of these people were smarter than others and realized you needed to escape characters, but others didn't. Even MSSQL exports CSVs improperly. There is a documented RIGHT way of doing XML, so if you're doing it right and someone's application isn't accepting it, you have some clout when you say "That's not my fault."

Spencer Ruport
good example: how do you deal with data containing a comma in a CSV? XML has a documented RIGHT way of dealing with cases like this.
russau
CSV is a standard: http://www.rfc-editor.org/rfc/rfc4180.txt
pmf
That's not really a reason to use XML, though.
musicfreak
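To make the comma question above concrete, here is a minimal sketch of the RFC 4180 style of quoting, using Python's standard csv module. The row contents are hypothetical and chosen only to contain a comma and a double quote; other CSV dialects may behave differently.

```python
import csv
import io

# Hypothetical row: the second field contains a comma, the third a double quote.
row = ["ryeguy", "Smith, John", 'said "hi"']

# The writer wraps such fields in double quotes and doubles any embedded quote.
buffer = io.StringIO()
csv.writer(buffer).writerow(row)
encoded = buffer.getvalue()

# Reading the line back recovers the original three fields unchanged.
decoded = next(csv.reader(io.StringIO(encoded)))

print(encoded.strip())
print(decoded)
```

The catch raised in this thread still applies: this only works when both the producer and the consumer follow the same quoting rules.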
+1  A: 

Xml can be validated against a contract (schema or DTD).

Kris Krause
+1  A: 

XML also has complementary technologies surrounding it: XML DOM, XPath, XSLT, and XSD (XML Schema).

russau
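As a rough illustration of what a tool like XPath buys you, here is a sketch using Python's xml.etree.ElementTree, which supports only a limited XPath subset. The song data, the element names, and the helper titles_by_artist are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the element names are assumptions for this sketch.
SONGS_XML = """
<songs>
  <song><title>First</title><artist>Alpha</artist><bpm>120</bpm></song>
  <song><title>Second</title><artist>Beta</artist><bpm>90</bpm></song>
  <song><title>Third</title><artist>Alpha</artist><bpm>140</bpm></song>
</songs>
"""

def titles_by_artist(xml_text, artist):
    """Return the titles of all songs whose <artist> child equals `artist`."""
    root = ET.fromstring(xml_text)
    # [artist='...'] selects <song> elements by the text of a child element.
    # The artist name is interpolated naively, so it must not contain quotes.
    matches = root.findall(f"./song[artist='{artist}']")
    return [song.findtext("title") for song in matches]

print(titles_by_artist(SONGS_XML, "Alpha"))
```

Doing the same with CSV means writing the filtering loop by hand, which is fine for a flat table but gets harder as the structure nests.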
+9  A: 

Advantages

There are a number of advantages XML has over CSV. A few of them include:

  • Hierarchical data
  • Automatic data checking (XML Schemas or DTDs)
  • Easily convert formats (using XSL)
  • Easy to identify relational structure
  • Can be used in combination with XML-RPC
  • Suitable for object persistence
  • Simplifies business-to-business communications
  • Helpful related technologies (XPath, DOM)
  • Tight integration with modern Web browsers

It completely depends on the problem domain and what you are trying to solve.

Example

The last item is something that many people miss when writing web pages. Consider the situation where you have a large data store of songs. Songs have artists, albums, beats per minute, and so forth. You could export the data to XML, write a simple stylesheet to render the XML as XHTML, then point the browser at the XML page. The browser will render the XML as a web page.

You cannot do that with CSV.

Disadvantages

Joel Spolsky has a great article on why XML is a poor choice as a complex data store: it is slow. (A database table with fixed-length rows can step to the previous or next record with a single pointer addition, whereas finding the next record in an XML document means parsing the text in between, which is much slower.) Arguably, this could be considered an optimization problem, resolved by waiting 18 months.

Related Question

See also: Why Should I Use A Human Readable File Format.

Dave Jarvis
+1 exactly, there's a whole ecosystem of tools and specifications around XML. Another one: XML digital signatures gives you a standard way to authenticate data. http://www.w3.org/Signature/
Wim Coenen
+6  A: 

These aren't the only two options; you can also use JSON or YAML, which are much lighter weight than XML.

In general, if you have simple tabular data without many special characters, CSV isn't a bad choice. For structured data, consider using one of the other three.

Dana the Sane
+1: Many people forget that there are formats besides XML that do almost the exact same thing. I've never really worked with YAML but JSON is a great "lightweight" alternative to XML (not to mention it's easier to parse in most programming languages).
musicfreak
I like this answer; it actually takes alternatives into consideration.
NoMoreZealots
Oh, geez, that's nice. I looked up some YAML and JSON, and that REALLY gives me my answer. There are definitely better non-proprietary formats than XML.
NoMoreZealots
For many cases, JSON is definitely better to work with than XML. Where XML gains traction here is when working with standardized schemas, and when integrating schemas together (namespaces are one honking great idea!). If you don't need any of that, and particularly if you're creating an ad-hoc format for your own needs, go with JSON or YAML.
jcdyer
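For a feel of the JSON option discussed in this answer, here is a small sketch with Python's standard json module. The record and its field names are hypothetical; the point is only that nested data round-trips without closing tags.

```python
import json

# Hypothetical nested record, used only for illustration.
user = {
    "username": "ryeguy",
    "regdate": "3-4-08",
    "posts": [{"id": 34}],
}

# Serialize to text, then parse it back into plain dicts and lists.
text = json.dumps(user, indent=2)
restored = json.loads(text)

print(text)
```

Unlike CSV, the nesting survives the round trip; unlike XML, there is no built-in schema or namespace mechanism, which is the trade-off mentioned in the comments above.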
+3  A: 

Well XML is human readable and human editable. You can look at an XML file and know exactly what it is. A CSV file is human readable but you don't really know what each value means at all.

For example, if we're storing user accounts, which would you prefer?

<user>
    <username>ryeguy</username>
    <password>abc123</password>
    <regdate>3-4-08</regdate>
    <email>[email protected]</email>
</user>

OR

ryeguy,abc123,3-4-08,[email protected]

Of course, this is just an example, but imagine it with 30 fields or so!

Or worse yet, what if we make subfields?

<user>
    <username>ryeguy</username>
    <password>abc123</password>
    <regdate>3-4-08</regdate>
    <email>[email protected]</email>
    <posts>
        <post>
            <id>34</id>
            ....
        </post>
    </posts>
</user>

That would be a pain in the ass to put in a CSV. Soon you'd be making your own querying language.

ryeguy
I don't know; the file format actually takes up more space than the actual DATA, i.e. the stuff you actually need to KNOW! If I'm doing it from a program instead of by hand, then "<SometagThatsWayTooLong>data</SometagThatsWayTooLong>" is just more stuff I have to clog my HD with and waste clock cycles on, and for large files it's not REALLY readable anyway.
NoMoreZealots
If you really can't remember your fields, you probably want a header row like "username,password,regdate,email" as the first line.
erjiang
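The header-row idea can be sketched with Python's csv.DictReader, which uses the first line as field names. The e-mail address below is a placeholder (the original value is not recoverable from this page), and read_users is a hypothetical helper name.

```python
import csv
import io

# Header row first, then one record. The e-mail value is a placeholder.
CSV_TEXT = (
    "username,password,regdate,email\n"
    "ryeguy,abc123,3-4-08,ryeguy@example.com\n"
)

def read_users(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

users = read_users(CSV_TEXT)
# Each value can now be looked up by name rather than by column position.
print(users[0]["regdate"])
```

This addresses the "which column is which" complaint, though it does nothing for nested data such as the posts example in the answer above.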
+3  A: 

The fact that XML is human readable does not mean that it was designed to be read (or even edited) directly by humans.

XML has a nice set of properties that make it a good choice in many cases: validation, a well-defined standard, a lot of tools, a very flexible architecture, and a natural mapping to the tree model that many programs use. That holds in particular when you have the human resources to deal with the additional burden these properties inevitably bring. Its human readability is an added value that simplifies debugging (try debugging a binary file...), inspection, and small changes in trivial cases.

CSV, on the other hand, is easy, quick, and linear, although many dialects exist and parsing it well is far from trivial (with the added problem that it looks trivial!). For most applications involving tables of data, CSV is the perfect choice.

In general, however, there are cases of data representation that you can solve with XML but not with CSV (a tree, for example). On the other hand, any data that can be represented in CSV can also be represented in XML, although the XML version is not guaranteed to be more efficient in terms of space, ease of parsing, etc. (and in practice it often is not). It's a matter of the "degrees of freedom" of your format: XML has more of them, CSV fewer. The hype behind XML is partly due to this fact.
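The claim that any CSV table can be re-expressed as XML can be sketched as follows. The function csv_to_xml and the tag names rows and row are hypothetical, and the sketch assumes every header is a valid XML element name.

```python
import csv
import io
import xml.etree.ElementTree as ET

def csv_to_xml(text, root_tag="rows", row_tag="row"):
    """Turn CSV text with a header row into a flat XML document.

    Assumes each header is usable as an XML element name.
    """
    root = ET.Element(root_tag)
    for record in csv.DictReader(io.StringIO(text)):
        row = ET.SubElement(root, row_tag)
        for name, value in record.items():
            ET.SubElement(row, name).text = value
    return ET.tostring(root, encoding="unicode")

csv_text = "first,last,uid\nBob,Fett,100\n"
xml_text = csv_to_xml(csv_text)

print(xml_text)
print(len(csv_text), len(xml_text))
```

The conversion is mechanical, but the XML output is noticeably longer than the CSV input, which matches the space concern raised elsewhere in this thread.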

Don't fall victim to the hammer syndrome: when you have a hammer (XML), everything looks like a nail (something that you have to solve with XML). Reality is much more varied and nuanced. XML is cool, but it's not the answer to every problem.

Stefano Borini
I like the hammer comment. <User><FirstName>Bob</FirstName> <lastName>Fett</lastName><UID>100</UID></User> just seems, well, dumb compared to Bob,Fett,100.
NoMoreZealots
+1  A: 

Among the reasons you may prefer XML over CSV (depending on the task at hand, of course):

  • Almost all platforms and languages have existing libraries for reading, writing, parsing, and manipulating XML.
  • XML has well-defined rules for encoding all characters. CSV has ambiguities, such as how to encode commas that are part of the data.
  • XML supports a variety of data shapes (such as hierarchical), whereas CSV is most useful when the data looks like a table (rows and columns).

C. Dragon 76
+2  A: 

XML describes the content and also has a ton of supporting libraries in a variety of languages... but it can be bloated. If the receiving end of the CSV is aware of the layout and the data is tabular, I don't see anything wrong with it.

Captain Insano
+1  A: 

I like to think of the primary distinction in this case like this: XML is TREE-based, while CSV is TABLE-based.

That is, you can nest and re-nest and omit and generally make a complex TREE structure in XML, whereas you can only make simple 2D tables in CSV.

erjiang
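One way to see the tree-versus-table difference is to flatten a small tree into rows. This is a sketch only: the document is a trimmed variant of the user/posts example from earlier in the thread, the second post id is made up, and flatten_posts is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Trimmed user/posts tree; the post with id 35 is invented for the example.
USER_XML = """
<user>
  <username>ryeguy</username>
  <posts>
    <post><id>34</id></post>
    <post><id>35</id></post>
  </posts>
</user>
"""

def flatten_posts(xml_text):
    """Flatten the tree into (username, post_id) rows, one per post."""
    user = ET.fromstring(xml_text)
    username = user.findtext("username")
    # The parent's field has to be repeated on every row of the table.
    return [(username, post.findtext("id")) for post in user.findall("./posts/post")]

rows = flatten_posts(USER_XML)
for row in rows:
    print(",".join(row))
```

The table form repeats the username on every line; with more nesting levels you end up either repeating more columns or splitting the data across several CSV files.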