I do a lot of systems programming where my apps have no chance of being used to communicate over the web or viewed through a browser. But there has been some push by management to use XML. For example, if I want to keep a time log, I could use a text file like this:

command date time project
in 2008/09/23 08:00:00 PROJ1
change 2008/09/23 09:00:00 PROJ2
out 2008/09/23 12:00:00 PROJ2
in 2008/09/23 01:00:00 PROJ3
out 2008/09/23 05:00:00 PROJ3

The XML would look something like this:

<timelog>
<timecommand cmd="in" date="2008/09/23" time="8:00:00" proj="PROJ1"/>
...
<timecommand cmd="out" date="2008/09/23" time="5:00:00" proj="PROJ3"/>
</timelog>

Some of the initial advantages of the text version that I see are that it is easily readable and parsable with regex. What are the advantages of using XML in this case?
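
For reference, here is a minimal sketch of what "parsable with regex" could look like for the text version. The pattern, the `parse_line` helper, and the set of commands (`in`, `out`, `change`) are assumptions based only on the sample lines above, not an existing parser.

```python
import re
from datetime import datetime

# Hypothetical pattern for the four whitespace-separated columns shown above.
LINE_RE = re.compile(
    r"^(?P<command>in|out|change)\s+"
    r"(?P<date>\d{4}/\d{2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<project>\S+)$"
)

def parse_line(line):
    """Return (command, timestamp, project), or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(
        match["date"] + " " + match["time"], "%Y/%m/%d %H:%M:%S"
    )
    return match["command"], stamp, match["project"]

sample = """command date time project
in 2008/09/23 08:00:00 PROJ1
change 2008/09/23 09:00:00 PROJ2
out 2008/09/23 12:00:00 PROJ2"""

# The header row does not match the pattern, so it is skipped.
entries = [e for e in map(parse_line, sample.splitlines()) if e is not None]
```

Anything that does not fit the pattern comes back as `None`, so the caller decides whether to skip or report it.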

+2  A: 

A couple of benefits come to mind:

  • It's easier to parse into other applications
  • It's easier to understand what the document holds at a glance
  • Makes it easier to pull data into a managerial dashboard
  • Makes the management happy with little pain for you

The downsides, as I see them:

  • Means changing existing code, probably unnecessarily
  • Possible slight performance degradation, depending on how you build the documents compared to how you build the current docs
  • It's XML for XML's sake, which is effin' stupid

And, to close, a quote intended as irony: "XML is like violence. If it's not solving your problems, you're not using it enough."

Danimal
Great quote. Source?
MusiGenesis
I have no idea -- I found it on a thread on SO, but it's subsequently been closed and I can't find it.
Danimal
A: 

It's easily parsable using regex, XML, and XSL.

Truth be told, there's not really an "advantage" to using XML unless you're sending the data to another system.

Stephen Wrighton
+1  A: 

XML's main feature in a case like this is that XML can be validated and controlled. In the text version, how would you be able to programmatically verify that the file is properly formatted? XML is designed to create structured, valid documents, and the resulting benefit is a format that is rigidly controlled and reliably structured. Maintaining code that reads from XML nodes is also going to be a lot easier and more logically laid out than maintaining a series of regular expressions for reading text files.
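
As a rough illustration of programmatic checking, here is a sketch using Python's standard `xml.etree.ElementTree`. Note the assumptions: the standard library only checks well-formedness, so the structural rules below are hand-written and stand in for a real schema (DTD or XSD), which would need a separate validating library. The `check_timelog` helper and the allowed command set are hypothetical.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("cmd", "date", "time", "proj")   # attribute names from the question
ALLOWED_CMDS = {"in", "out", "change"}       # assumed set of commands

def check_timelog(xml_text):
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        # The parser itself rejects documents that are not well-formed.
        return [f"not well-formed: {err}"]
    problems = []
    if root.tag != "timelog":
        problems.append(f"unexpected root element <{root.tag}>")
    for i, node in enumerate(root.findall("timecommand"), start=1):
        for name in REQUIRED:
            if name not in node.attrib:
                problems.append(f"entry {i}: missing attribute '{name}'")
        cmd = node.get("cmd")
        if cmd is not None and cmd not in ALLOWED_CMDS:
            problems.append(f"entry {i}: unknown cmd '{cmd}'")
    return problems

good = '<timelog><timecommand cmd="in" date="2008/09/23" time="08:00:00" proj="PROJ1"/></timelog>'
bad = '<timelog><timecommand cmd="in" date="2008/09/23"/></timelog>'
broken = '<timelog><timecommand cmd="in"></timelog>'
```

Here `check_timelog(good)` returns an empty list, `bad` is reported for its missing attributes, and `broken` never gets past the parser.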

Jay
A: 

XML is a meta-format, meaning it makes it easier to define a format for your data. This makes it easier for multiple programs, including ones by different companies, to read and write data in the same format. It's especially suitable as a description for complex, hierarchical data.

In the example you outline above, the data looks to be isolated records in a fixed format, with no structure or hierarchy - in which case I can see no advantage in using XML. However, the example may be unrepresentative - your other files may contain more structured data.

Marcus Downing
+1  A: 

If you use XML then, in some ways, the data would be more "portable". You'd essentially have parsers for your data available in most environments, so writing a tool to analyze the data might be easier. Also, if it's in XML then you can write an XSLT to transform it into various other formats, making it easier to read.

That said, if you switch to using XML, even for a simple format like the example you gave, your log files are going to become a lot larger.

There are some options other than XML that you could use. Jeff's Angle Bracket Tax blog post talks about this a bit.

Really, what you should do is find out how these logs are going to be used, and then determine what format would make those usages the easiest to implement.

Herms
A: 

Is that an ongoing log file?

How are you ever going to write the closing </timelog> tag to create a valid document? Or are you going to read it in, add the new entry, and write it out each time?

Log files are perfect candidates for well structured plain text lines that you simply append to.
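
To make the difference concrete, here is a small sketch of the two append paths. The file names and both helper functions are hypothetical, and the XML version assumes the whole log fits in memory.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def append_text(path, line):
    # Plain text: open in append mode and write one line; nothing is re-read.
    with open(path, "a", encoding="utf-8") as f:
        f.write(line + "\n")

def append_xml(path, **attrs):
    # XML: the closing </timelog> must stay last, so parse the whole file,
    # add the element, and write the whole file back out.
    tree = ET.parse(path)
    ET.SubElement(tree.getroot(), "timecommand", attrs)
    tree.write(path, encoding="utf-8")

workdir = tempfile.mkdtemp()
text_log = os.path.join(workdir, "time.log")   # hypothetical file names
xml_log = os.path.join(workdir, "time.xml")

# Start from an empty but well-formed XML log.
with open(xml_log, "w", encoding="utf-8") as f:
    f.write("<timelog></timelog>")

append_text(text_log, "in 2008/09/23 08:00:00 PROJ1")
append_xml(xml_log, cmd="in", date="2008/09/23", time="08:00:00", proj="PROJ1")
```

The text append costs the same no matter how large the log gets, while the XML append re-reads and rewrites everything on each entry.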

JeeBee
In this example, it could be read in so stats could be collected or modified and then written out to file again.
dr_pepper
A: 

In most cases (not always), XML makes it easier to understand the data because all of a sudden you have that metadata around your asset describing what is there in front of you (human-readable).

XML is also very accessible. What I mean by that is that, since you mentioned it, you don't want to use regular expressions on XML. There are tools like XPath (XML Path Language) which make querying XML fun. No need to whip out something no one else can read when you can traverse XML easily using something like XPath.
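
A quick sketch of that kind of query, using the limited XPath subset built into Python's `xml.etree.ElementTree`. The sample document reuses the attribute names from the question; the variable names are just for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<timelog>
  <timecommand cmd="in" date="2008/09/23" time="08:00:00" proj="PROJ1"/>
  <timecommand cmd="change" date="2008/09/23" time="09:00:00" proj="PROJ2"/>
  <timecommand cmd="out" date="2008/09/23" time="12:00:00" proj="PROJ2"/>
</timelog>"""

root = ET.fromstring(doc)

# All entries for one project, selected by attribute value.
proj2 = root.findall("./timecommand[@proj='PROJ2']")
proj2_cmds = [node.get("cmd") for node in proj2]

# The first clock-in entry, wherever it sits below the root.
first_in = root.find(".//timecommand[@cmd='in']")
```

ElementTree only covers a subset of XPath; a full XPath engine would need a third-party library.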

There are cases where XML does the opposite (in terms of readability) and sometimes XML is also overhead. It's not always the best choice when you exchange data between systems (e.g. take a look at something really light-weight like JSON). And this sort of exchange doesn't need to be on the web either.

Till
I disagree with understanding the data. If the data is simple (like the example) then I think it's easier to read/understand without the metadata, as there's less noise. If the data is complex, then yes, having the extra metadata is helpful.
Herms
Herms, did you read my entire answer? Last paragraph?
Till
A: 

Whilst using XML for data files would mean that your data can be self describing and perhaps better organised, the end result is often data files that are far larger than before.

Ask yourself, what are the files used for? Are they to be changed? If so, who's paying and who has budgeted for it?

I love XML in some cases, and in others I hate it!

Ray Hayes
A: 

In the case of systems batch programming like you are talking about, a major feature of XML is that it's supported almost everywhere. So you write a program to handle some data today using XML, and in 10 years, when you need to overhaul that program and want to use a completely different platform, your XML data will still be well supported.

Joel Coehoorn
A: 

If you're developing in .NET (especially .NET 3.5 with LINQ to XML), you'll write less code to read/write the XML than if you used just a plain text file. Plus, XML just makes it easier for any person down the line to read the file and know exactly what's in it and what it's for. And don't worry about the XML taking up a little more disk space; disk space is cheap.

Chris Pietschmann
+1  A: 

There's absolutely nothing wrong with using text-based data formatting. It has been the de-facto standard for decades. Big huge mainframe financial systems still use it today. The benefits are that it's trivial to produce, trivial to consume and incredibly lightweight. And how about log files? Do you know any production platform that doesn't generate its log file in a delimited text format (web, app, db server)?

The downside of flat text files is that if the format changes, then you have to modify both the producer and the consumer ends non-trivially to be able to support the format change. Of course if it's just a human consuming the result, then you only have to change the producer.

The beauty of XML is that the parsing of the data is independent from not only the data but the format of the data. Logically you pass it both the data and the data format, and presto! Everything works. It's not exactly that simple, but that's the premise. You can change the format of the data, and your producers and consumers only have to change trivially (if at all).
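
Here is a small sketch of that premise, under stated assumptions: the `user` field is a hypothetical format change, and both reader functions are made up for illustration. The XML consumer looks fields up by name, while the text consumer depends on column position.

```python
import xml.etree.ElementTree as ET

def read_xml(xml_text):
    # Consumer looks fields up by name and ignores anything it does not know.
    root = ET.fromstring(xml_text)
    return [(n.get("cmd"), n.get("proj")) for n in root.iter("timecommand")]

def read_text(line):
    # Consumer relies on column position: command, date, time, project.
    command, _date, _time, project = line.split()
    return command, project

old_xml = '<timelog><timecommand cmd="in" date="2008/09/23" time="08:00:00" proj="PROJ1"/></timelog>'
# Hypothetical format change: the producer adds a "user" field.
new_xml = '<timelog><timecommand cmd="in" user="alice" date="2008/09/23" time="08:00:00" proj="PROJ1"/></timelog>'

old_line = "in 2008/09/23 08:00:00 PROJ1"
new_line = "in alice 2008/09/23 08:00:00 PROJ1"

# The unchanged XML consumer returns the same result for both versions.
xml_results_match = read_xml(old_xml) == read_xml(new_xml)

try:
    read_text(new_line)
    text_reader_survived = True
except ValueError:
    # Five columns no longer unpack into four names.
    text_reader_survived = False
```

A text consumer could of course be written more defensively, but then the format knowledge lives in every consumer rather than in the data.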

The ugly of XML is that it can be a huge performance dog (SOAP, anyone?) and very heavyweight. You definitely pay a price for its extensibility. There are cases where it is absolutely the optimal technical solution for a given problem domain, and there are other cases where it's not.

So if it's a simple log that a human will read, keep it a flat file. If it's a simple app communicating with another single app and the communications will not change dramatically over time, a flat file is definitely faster and lighter to implement, but XML is not a bad choice. If multiple apps need to consume the data you're providing, or if the format of the communication is going to change often, then go with XML. The interface will be easier to maintain over time if you do.

Ed Lucas