Why do so many projects use XML as a configuration file language?
XML is a well-developed, widely adopted standard, which makes it easier to read and understand than proprietary configuration formats.
Also, it's worth understanding that XML serialization is a common tool available in most languages that makes saving object data extremely easy for developers. Why build your own way of saving a hierarchy of complex data when someone else has already done the work for you?
.NET: http://msdn.microsoft.com/en-us/library/system.xml.serialization.aspx
PHP: http://us.php.net/serialize
Python: http://docs.python.org/library/pickle.html
Java: http://java.sun.com/developer/technicalArticles/Programming/serialization/
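To make the "someone else has already done the work" point concrete, here is a minimal Python sketch that uses only the standard library's `xml.etree.ElementTree`. The helper names `dict_to_element` and `element_to_dict` and the sample settings are made up for illustration; they are not part of any of the libraries linked above.

```python
import xml.etree.ElementTree as ET


def dict_to_element(tag, data):
    """Build an Element from a nested dict; leaf values become element text."""
    element = ET.Element(tag)
    for key, value in data.items():
        if isinstance(value, dict):
            element.append(dict_to_element(key, value))
        else:
            child = ET.SubElement(element, key)
            child.text = str(value)
    return element


def element_to_dict(element):
    """Inverse of dict_to_element; every leaf value comes back as a string."""
    if len(element) == 0:
        return element.text or ""
    return {child.tag: element_to_dict(child) for child in element}


# Hypothetical settings, invented for this example.
settings = {
    "server": {"host": "localhost", "port": 8080},
    "log": {"file": "/var/log/server.log"},
}

xml_text = ET.tostring(dict_to_element("config", settings), encoding="unicode")
restored = element_to_dict(ET.fromstring(xml_text))
```

Note that the round trip loses type information (the port comes back as the string "8080"), which is one reason real serializers pair the XML with a schema or a class definition.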
Because parsing XML is relatively easy, and if your schema is clearly specified, any utility can read and write information easily into it.
Because XML sounds cool and enterprisey.
Edit: I didn't realize my answer was so vague, until a commenter requested the definition of enterprisey. Citing Wikipedia:
[...] the term "enterprisey" is intended to go beyond the concern of "overkill for smaller organizations", to imply the software is overly complex even for large organizations and simpler, proven solutions are available.
My point is that XML is a buzzword and as such is being overused. Despite other opinions, XML is not easy to parse (just look at libxml2, whose gzipped source package is currently over 3MB). Due to the amount of redundancy it is also annoying to write by hand. For example, Wikipedia lists XML configuration as one of the reasons for the decline in popularity of jabberd in favor of other implementations.
One other point, if you have an XSD (schema file) to describe your configuration file, it is trivial for your application to validate the configuration file.
- XML is easy to parse. There are several popular, lightweight, featureful, and/or free XML parsing libraries available in most languages.
- XML is easy to read. It is a very human-readable markup language, so it's easy for both humans and computers to write.
- XML is well specified. Everyone and his dog knows how to write decent XML, so there's no confusion about the syntax.
- XML is popular. Somewhere along the way, some Important People™ started pushing the idea that XML was the "future", and a lot of people bought it.
As a side note, I'm not trying to defend XML. It has its uses, and I will be using it in a project whenever I get back to that. In many cases, though, and especially configuration files, the only advantage it has is that it's a standardized format, and I think this is far outweighed by numerous disadvantages (e.g. it's too verbose). However, my personal preferences don't matter - I was merely answering why some people might choose to use XML as a configuration file format. I personally never will.
Well, XML is a general-purpose specification that can hold descriptions, nested information and data about something. And there are many APIs and software packages that can parse and read it.
So it's much easier to describe something in a formal way that is understood across platforms and applications.
It's because XML allows you to basically make your own semantic markup, which can be read by a parser built in virtually any language. An added benefit is that a configuration file written in XML can be used on projects where you are using two or more languages. If you were to make a configuration file where everything was defined as variables for a specific language, it would only work in that language, obviously.
This is an important question.
Most alternatives (JSON, YAML, INI files) are easier to parse than XML.
Also, in languages like Python -- where everything is source -- it's easier to simply put your configuration in a clearly-labeled Python module.
Yet, some people will say that XML has some advantage over JSON or Python.
What's important about XML is that the "universality" of XML syntax doesn't really apply much when writing a configuration file that's specific to an application. Since portability of a configuration file doesn't matter, some Python folks write their configuration files in Python.
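For a rough feel of the parsing effort involved, here is a small Python sketch that reads the same two settings from JSON, INI and XML using only the standard library. The sample strings and the `load_*` helper names are assumptions made up for this comparison, not anything from a real project.

```python
import configparser
import json
import xml.etree.ElementTree as ET

# Hypothetical sample configurations holding the same two settings.
JSON_TEXT = '{"server": {"host": "localhost", "port": 8080}}'
INI_TEXT = "[server]\nhost = localhost\nport = 8080\n"
XML_TEXT = "<config><server><host>localhost</host><port>8080</port></server></config>"


def load_json(text):
    """JSON keeps the number type, so no conversion is needed."""
    return json.loads(text)["server"]


def load_ini(text):
    """INI values are strings; getint() converts the port explicitly."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["server"]
    return {"host": section["host"], "port": section.getint("port")}


def load_xml(text):
    """XML text nodes are strings too, so the port is converted by hand."""
    server = ET.fromstring(text).find("server")
    return {"host": server.findtext("host"), "port": int(server.findtext("port"))}


json_config = load_json(JSON_TEXT)
ini_config = load_ini(INI_TEXT)
xml_config = load_xml(XML_TEXT)
```

With a library doing the heavy lifting, the calling code is about the same length in each case; the difference is mostly in how much the library itself has to handle and in how types are recovered.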
Edit
Security of a configuration file does not matter. The "configuring a Python program in Python is a security risk" argument seems to ignore the fact that Python is already installed and running as source. Why work up a complex hack in a configuration file when you have the source? Just hack the source.
I've heard folks say that "someone" could hack your app via the configuration file. Who's this "someone"? The sysadmin? The DBA? The developer? There aren't a lot of mysterious "someone"s with access to the configuration files.
And anyone who could hack up the Python configuration file for nefarious purposes could probably install keyloggers, fake certificates or other more serious threats.
Hi guys, thanks for your answers. This question, as naive as it may seem at first glance, was not so naive :)
Personally I don't like XML for configuration files, I think it's hard for people to read and change, and it's hard for computers to parse because it's so generic and powerful.
INI files or Java property files are fine only for the most basic applications, those that do not require nesting. Common solutions to add nesting to those formats look like:
level1.key1=value
level1.key2=value
level2.key1=value
Not a pretty sight: there is a lot of redundancy, and it's hard to move things between nodes.
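To show what the application ends up doing with such dotted keys, here is a minimal Python sketch that turns the three sample lines into a nested structure. The function names `parse_properties` and `expand_dotted_keys` are made up for illustration, and the parser only handles plain `key=value` lines and `#` comments, not the full Java properties syntax.

```python
def parse_properties(text):
    """Read simple key=value lines, skipping blank lines and # comments."""
    flat = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        flat[key.strip()] = value.strip()
    return flat


def expand_dotted_keys(flat):
    """Turn {'a.b': 'x'} into {'a': {'b': 'x'}} by splitting keys on dots."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


SAMPLE = "level1.key1=value\nlevel1.key2=value\nlevel2.key1=value\n"
tree = expand_dotted_keys(parse_properties(SAMPLE))
```

The nesting exists only by naming convention, which is why moving a subtree means renaming every key under it.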
JSON is not a bad language, but it's designed to be easy for computers to parse (it's valid JavaScript), so it's not widely used for configuration files.
JSON looks like this:
{"menu": {
  "id": "file",
  "value": "File",
  "popup": {
    "menuitem": [
      {"value": "New", "onclick": "CreateNewDoc()"},
      {"value": "Open", "onclick": "OpenDoc()"},
      {"value": "Close", "onclick": "CloseDoc()"}
    ]
  }
}}
In my opinion, it's too cluttered with commas and quotes.
YAML is good for configuration files, here is a sample:
invoice: 34843
date   : 2001-01-23
bill-to: &id001
    given  : Chris
    family : Dumars
However, I don't like its syntax too much, and I think that using whitespace to define scopes makes things a bit fragile (think of pasting a block at a different nesting level).
A few days ago I started to write my own language for configuration files; I dubbed it Swush.
Here are a few samples. As simple key-value pairs:
key:value
key:value2
key1:value3
or as a more complex, commented example:
server{
    connector{
        protocol : http // HTTP or BlahTP
        port : 8080 # server port
        host : localhost /* server host name*/
    }
    log{
        output{
            file : /var/log/server.log
            format : %t%s
        }
    }
}
Swush supports strings in the simple form above, or in quotes, which allows whitespace and even newlines inside strings. I am going to add arrays soon, something like:
name [1 2 b c "Delta force"]
There is a Java implementation, but more implementations are welcome :). Check the site for more information (I covered most of it here, but the Java API provides a few interesting features like selectors).
If you can answer why NOT to use XML for configuration files, then you probably can answer why as well.