I'm starting a new multi-language project completely without cruft. I have to choose a format for config files. These will be:

  • Written by humans (developers, not end users)
  • Read by machines
  • Maintained by humans
  • Edited in text editors (not through a dedicated tool)

I have many different uses, like:

  • List of modules to run
  • Module execution order
  • Parameters for algorithms
  • General configuration (job name, project version, etc)

Some candidates I've thought of:

  • XML
    • Hard to get right in a text editor
  • INI
    • Can only do section-key-value
  • Makefile
    • yikes!
  • JSON
    • Quotes (") have to be escaped, but the syntax is widely known
  • YAML
    • People have to learn the syntax and the error messages aren't very good, but it's nice to have references

What would you use and why? Is there a panacea or are some better suited than others?

+5  A: 

I'd suggest YAML. YAML is human friendly and readable.

If keys and values are all that you need, I'd go with INI or similar.
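For the plain key/value case, here is a minimal sketch in Python using the standard-library configparser module. The section and key names are made up for illustration; the point is how little code an INI file needs on the reading side.

```python
import configparser

# Hypothetical INI text for the "general configuration" case from the question.
CONFIG_TEXT = """
[job]
name = nightly-build
project_version = 1.4.2
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Sections behave like dictionaries; every value comes back as a string.
job_name = parser["job"]["name"]
project_version = parser["job"]["project_version"]
print(job_name, project_version)
```

Converting values to numbers or booleans is left to the program, which is the trade-off for the format's simplicity.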

The MYYN
YAML is the way to go. Simple, easy to parse, anyone can figure it out, and it covers 99% of the needs for configuration files.
Tyson
Are there good schema tools for YAML?
Paul Tarjan
YAML itself does not have XML's language-defined document schema descriptors that allow, for example, a document to self-validate. However, there are two externally defined schema descriptor languages for YAML: Kwalify (http://www.kuwata-lab.com/kwalify/) and Rx (http://rjbs.manxome.org/rx/).
The MYYN
+4  A: 

I would tend towards XML. Why?

  1. it's widely used and understood. Most technical people can understand what's going on
  2. it's hierarchical. For all but the simplest cases you'll need to be able to specify some sort of nesting
  3. there's a wide set of tools available to parse and manipulate XML (command line, APIs, etc.)
  4. it supports character encodings (important for I18N)
  5. Can be validated against a schema

XML has its shortcomings (e.g. the angle-bracket tax), but given all the above I would go for XML for all but the simplest scenarios.
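To illustrate the hierarchy point, here is a hedged sketch using Python's built-in xml.etree.ElementTree. The element and attribute names (config, module, param, and so on) are hypothetical, not any standard schema; attributes are used instead of child elements to keep the angle-bracket tax down.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: nesting expresses structure, attributes keep it compact.
CONFIG_XML = """<config job="nightly-build" version="1.4.2">
  <modules>
    <module name="fetch"/>
    <module name="transform"/>
    <module name="report"/>
  </modules>
  <algorithm name="smoothing">
    <param name="window" value="5"/>
  </algorithm>
</config>
"""

root = ET.fromstring(CONFIG_XML)
job = root.get("job")

# Document order of the <module> elements doubles as the execution order.
modules = [m.get("name") for m in root.iter("module")]

# Attribute values are strings; converting them is up to the application.
params = {p.get("name"): p.get("value") for p in root.find("algorithm").iter("param")}
print(job, modules, params)
```

Schema validation (point 5) is not covered here; the standard library does not validate against XSD, so that would need a third-party tool.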

Brian Agnew
5. Can be validated against a schema.
devstuff
@devstuff - good point
Brian Agnew
Just make sure you use attributes wherever you can - the tax will be much smaller this way and the syntax will be concise.
gooli
+1  A: 

I'd say the answer depends on who is going to be using it.

JSON or YAML are probably a little nicer to look at, but as you have pointed out, the parsing is a bit harder (but hey, you're going to use a library anyway, right?).

XML has the advantage that pretty much every developer is going to know how to use it without much fuss.

I'd probably tend towards YAML or JSON, though, because you don't go cross-eyed looking at it.
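For comparison, a small sketch of the same kind of data as JSON, read with Python's standard json module. The keys are hypothetical. Unlike INI, numbers come back typed and arrays keep their order; on the other hand, standard JSON has no comment syntax, so explanations cannot live next to the values.

```python
import json

# Hypothetical JSON config covering modules, their order, and algorithm parameters.
CONFIG_JSON = """
{
  "job": {"name": "nightly-build", "version": "1.4.2"},
  "modules": ["fetch", "transform", "report"],
  "algorithm": {"window": 5, "threshold": 0.25}
}
"""

config = json.loads(CONFIG_JSON)

# Arrays preserve order, so the list is also the execution order.
for step, module in enumerate(config["modules"], start=1):
    print(step, module)

# Numbers arrive as int/float rather than strings.
window = config["algorithm"]["window"]
```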

Jason Tholstrup
+2  A: 

For very many programs the section/key layout of an INI file is all that is needed. If you can get away with this, I suggest you do. People seem to like INI files (and it worked for Samba and PHP), but often violently dislike very structured formats such as XML - can't think why!

anon
XML is a language for parsers first, and humans second. It's a bad choice for configuration files which, like source code, should be human-readable first, and computer-readable second.
ddaa
+1  A: 

I understand the desirability of a single configuration file, but are you unnecessarily restricting yourself? Is it necessary to keep all configuration in one file? Would one format suit one subset of configuration more than another? You listed several different "categories" of configuration data, and it seems likely that they're not all similar in content or layout.

Hmm, I just answered your question with three more, but I think they're all things I would take into consideration when making the choice myself.

JMD
Sure, I'm happy to break it into many config files. Should I mix and match formats? Which should I use when?
Paul Tarjan
I think mixing formats is a terrible idea. It adds complexity to your code and confusion to your customers.
Chris Lutz
@Chris, I suggested considering it. It bears consideration, not premature dismissal. The choice, whether one or more than one, depends on context: which pieces of configuration are user-facing, which ones are static (i.e. developer only), runtime versus initialization only, how often they change, ...Ultimately, you might be right. Maybe more than one is not warranted, but at least the examination is worthwhile even if the end result is that he chooses a single format.
JMD
Point taken. I amend my statement: I think mixing formats of _user-facing_ configuration files is a terrible idea. For developer-only data, if you're willing to add the overhead of parsers to your program, knock yourself out.
Chris Lutz
+2  A: 

I'm a fan of using a simple programming language rather than a config format. Typically I'd suggest Lua.

I think variables and expressions are a fantastic way of removing redundancy and increasing the expressiveness of a configuration system (check out the examples in this post).
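If the main attraction of a scripting language is variables that remove redundancy, some plain data formats offer a limited version of that without running any code. As a sketch (a swapped-in technique in Python, not Lua): configparser's ExtendedInterpolation lets one value reference another with ${key} or ${section:key}. The section and key names are hypothetical, and unlike a real language it cannot loop or call functions.

```python
import configparser

# Hypothetical INI file that reuses values instead of repeating them.
CONFIG_TEXT = """
[paths]
root = /srv/project
data = ${root}/data

[job]
name = nightly-build
output = ${paths:data}/${name}.out
"""

# ExtendedInterpolation resolves ${key} (same section) and ${section:key} references.
parser = configparser.ConfigParser(interpolation=configparser.ExtendedInterpolation())
parser.read_string(CONFIG_TEXT)
print(parser["job"]["output"])
```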

I understand there may be compatibility issues if it's a multi language project, but I think it should be considered at least.

Laserallan
I used to be a fan of this approach, but "Computer Science in the 1960s to 80s spent a lot of effort making languages which were as powerful as possible. Nowadays we have to appreciate the reasons for picking not the most powerful solution but the least powerful." (Tim Berners-Lee.)
Jason Orendorff
That is, plain old data has advantages over code. It's more human-readable. It's more machine-readable. You can look at it and see the actual values (rather than an algorithm for computing them). It encourages everyone to keep it simple. Loading an INI file can't put you in an infinite loop. It can't delete all your files. If loading an INI file works the first time, it'll work the next 3000 times too. (Scripts can work now and break later for any number of reasons.) Plain old data formats tend to have fewer versioning issues. And ironically, *data is better for scripting than code*.
Jason Orendorff
Writing a perl script to hack on an INI file is trivial. Writing a perl script to hack on a Lua config file is not really possible in the general case. Even if you do know Lua, you can't generally write a program to load a Lua config file, examine the values, change a value, and write it back. It's not possible. INI files can be indexed and searched. But if the config file is a program, the key may be generated in an arbitrarily complicated way, so you may not find it. And on and on.
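As a concrete version of the round-trip argument, here is a sketch in Python rather than Perl: load an INI file with configparser, change one value, and write it back, without executing anything from the file. The section and key names are hypothetical. One caveat: configparser drops comments, so they do not survive the write.

```python
import configparser
import io

# Hypothetical existing config file contents.
ORIGINAL = """[job]
name = nightly-build

[algorithm]
window = 5
"""

parser = configparser.ConfigParser()
parser.read_string(ORIGINAL)

# Examine and change a value; ConfigParser requires string values when setting.
parser["algorithm"]["window"] = "7"

# Write the result back out (to a string here; a real script would use a file).
buffer = io.StringIO()
parser.write(buffer)
updated = buffer.getvalue()
print(updated)
```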
Jason Orendorff
These are all very good points. With great power comes great responsibility.
Laserallan
+1  A: 

I use and recommend Lua. It's easy to learn, easy to embed, is lightweight and has a liberal license. It was originally designed with rich, flexible configuration files in mind.

Mike Fitzpatrick
+1  A: 

In your scenario I'd go for an ini file.

Other formats such as XML provide schema validation and the like. However you still need to validate (and possibly convert) any values read into your program regardless of the config file format used.
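A minimal sketch of that validate-and-convert step in Python, assuming a hypothetical [algorithm] section and made-up range rules. The typed getters raise ValueError on malformed values, while the range checks are application rules that no file format would enforce on its own.

```python
import configparser

# Hypothetical algorithm parameters as they might appear in an ini file.
CONFIG_TEXT = """
[algorithm]
window = 5
threshold = 0.25
verbose = yes
"""


def load_algorithm_params(text: str) -> dict:
    """Read [algorithm] values, converting types and rejecting bad ranges."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # getint/getfloat raise ValueError on malformed values and
    # NoOptionError if a required key is missing.
    window = parser.getint("algorithm", "window")
    threshold = parser.getfloat("algorithm", "threshold")
    # Optional flag: accepts yes/no, true/false, on/off, 1/0.
    verbose = parser.getboolean("algorithm", "verbose", fallback=False)
    # Hypothetical application rules the file format itself cannot express.
    if window < 1:
        raise ValueError(f"window must be >= 1, got {window}")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")
    return {"window": window, "threshold": threshold, "verbose": verbose}


params = load_algorithm_params(CONFIG_TEXT)
print(params)
```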

Most importantly, the ini format is completely self-explanatory and increases the signal-to-noise ratio to the point where there is almost no noise, almost the opposite of XML.

  • A list of modules to run
  • Module execution order, as above.

These 2 uses can easily be represented in a non-hierarchical ini format. Ordering can be taken from the physical order of entries in the ini file (just add a comment to state this).
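A sketch of the ordering idea in Python, assuming a hypothetical [modules] section with one module per line. ConfigParser keeps keys in the order they appear in the file, so iterating over the section gives the execution order; the enabled/disabled values are an illustrative convention, not part of the ini format.

```python
import configparser

# Hypothetical [modules] section: one module per line, run top to bottom.
CONFIG_TEXT = """
[modules]
# Modules run in the order they are listed.
fetch = enabled
transform = enabled
report = disabled
publish = enabled
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# Keys come back in file order, so this list is the execution order.
run_order = [name for name, state in parser["modules"].items() if state == "enabled"]
print(run_order)  # ['fetch', 'transform', 'publish']
```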

  • Parameters for algorithms
  • General configuration (job name, project version, etc)

These 2 uses can be represented as key value pairs. It's what ini files do best.

You also mention that they will be created by Developers, but do not say who will be maintaining them. If maintenance is ever done by non-developers, ini files are so simple they reduce the likelihood of introducing errors.

Ash
+1  A: 

I'd suggest Lua, even if it requires some time to set up. (Its API is really well done, but an API for a scripting language is always at least a bit complex.)

Andreas Bonini