Hi there,

I've developed my own file format for configuration files (plain text and line-based: each line holds one configuration entry) for an application. The format is nothing special, and the only reason I'm doing this is to learn something! The reader and writer functions will be implemented in C (with GLib, because the file should be UTF-8 encoded).

So now I'm thinking about how to implement this format in C code, and which steps I need to take to get error messages that are as good as possible. I've heard of lexers, parsers, and so on, but have never gone deep into them; I have only a very abstract idea of them. So which steps do I need to take to get a clean reader written in C for the format, one that is also maintainable for future changes? What are the topics to learn/think about?

And yes, I know: C is a pain, there are a lot of different "sexy" formats for this purpose, and so on. I want to learn something!

Cheers, Gregor


Additional information

  • The reader/writer/parser (or whatever it's called) should depend as little as possible on third-party programs/components. The application around this config part already uses GLib, so that's why GLib is also used for UTF-8.
+1  A: 

You might want to look at the libconfig source code. It has a lightweight parser you could use as a starting point and that will probably help you in figuring out what a parser for your own format would have to look like.

Though, if you really want to learn about parsers and lexers, it would probably be better to implement a simple compiler. There's an MIT course you could follow.
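To make the idea of a lexer and a parser a bit more concrete, here is a minimal hand-written sketch in plain C. The question does not show the actual syntax, so the format here is purely an assumption: one `key = value` entry per line, `#` for comment lines, blank lines ignored. All names (`ParsedLine`, `parse_config_line`, the size limits) are hypothetical. The point is only to show how carrying the line number and column through the parsing step yields precise error messages. Keys are restricted to ASCII identifier characters; values are copied byte for byte, so UTF-8 passes through untouched (GLib's `g_utf8_validate` could be called on each line beforehand). Columns are byte offsets, not character counts.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed format (not from the question): "key = value" per line,
 * '#' starts a comment line, blank lines are ignored. */
enum { MAX_KEY = 64, MAX_VALUE = 256, MAX_ERR = 128 };

typedef enum { LINE_EMPTY, LINE_ENTRY, LINE_ERROR } LineKind;

typedef struct {
    LineKind kind;
    char key[MAX_KEY];
    char value[MAX_VALUE];
    char error[MAX_ERR];  /* filled in only when kind == LINE_ERROR */
} ParsedLine;

/* Skip spaces and tabs, but not line terminators. */
static const char *skip_blanks(const char *p)
{
    while (*p == ' ' || *p == '\t')
        p++;
    return p;
}

/* Build an error result that records where the problem was found. */
static ParsedLine fail(int line_no, int column, const char *what)
{
    ParsedLine r = { LINE_ERROR, "", "", "" };
    snprintf(r.error, sizeof r.error, "line %d, column %d: %s",
             line_no, column, what);
    return r;
}

/* Parse a single line; line_no is used only for error messages. */
ParsedLine parse_config_line(const char *line, int line_no)
{
    assert(line != NULL);

    ParsedLine r = { LINE_EMPTY, "", "", "" };
    const char *p = skip_blanks(line);

    /* Blank lines and comment lines carry no entry. */
    if (*p == '\0' || *p == '\n' || *p == '\r' || *p == '#')
        return r;

    /* Lexing: read the key token (ASCII letters, digits, underscore). */
    size_t n = 0;
    while (isalnum((unsigned char)*p) || *p == '_') {
        if (n + 1 >= MAX_KEY)
            return fail(line_no, (int)(p - line) + 1, "key too long");
        r.key[n++] = *p++;
    }
    r.key[n] = '\0';
    if (n == 0)
        return fail(line_no, (int)(p - line) + 1, "expected a key");

    /* Parsing: the grammar demands '=' next, then a non-empty value. */
    p = skip_blanks(p);
    if (*p != '=')
        return fail(line_no, (int)(p - line) + 1, "expected '=' after key");
    p = skip_blanks(p + 1);

    /* The value is the rest of the line without trailing blanks. */
    size_t len = strcspn(p, "\r\n");
    while (len > 0 && (p[len - 1] == ' ' || p[len - 1] == '\t'))
        len--;
    if (len == 0)
        return fail(line_no, (int)(p - line) + 1, "expected a value after '='");
    if (len >= MAX_VALUE)
        return fail(line_no, (int)(p - line) + 1, "value too long");

    memcpy(r.value, p, len);
    r.value[len] = '\0';
    r.kind = LINE_ENTRY;
    return r;
}
```

A reader built on this would loop over the file with `fgets`, increment a line counter, call `parse_config_line` for each line, and print `error` whenever the result is `LINE_ERROR`. Keeping the per-line function free of I/O makes it easy to unit-test and to extend when the format grows.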

drby
Sorry, I somehow overlooked your answer. Libconfig is nearly 1:1 my configuration file format :-). Great! This will make my coworkers happy, and I'll have more time to explore all the links/ideas in the answers and libconfig's internals.
Gregor
+1  A: 

Depending on how deep you'd like to dive into the matter, you should think about not writing your parser by hand. You can do so, of course, but it will be a great deal more complicated, and every new feature you add to your language will force you to adapt both the lexer and the parser code.

The good thing is, there are lots of tools out there that enable you to generate this stuff from a high-level description of your input and its structure. Standard *nix tools to do so are Lex and Yacc (or their descendants Flex and Bison), but I'd like to point you to ANTLR (http://www.antlr.org) instead. One of its nice features is that it provides backends for many different languages (C/C++ as well as Java, Python, Ruby, C#, ...), so learning how to work with it will also help you if you want to switch languages at a later point.

BjoernD
ANTLR looks very interesting. So I could easily provide tools for the configuration file in other languages... I'll have a look at it!
Gregor
+3  A: 

One cool way of creating a config format is to embed a scripting language.

This gives you the parser for free and gives you the possibility to generate data on the fly or define variables that are being reused:

Consider these examples of XML vs. an ugly pseudo scripting language:

<InputPoints>
  <Point>
    <x>1.0</x>
    <y>1.0</y>
  </Point>
  <Point>
    <x>1.0</x>
    <y>2.0</y>
  </Point>
  <Point>
    <x>1.0</x>
    <y>3.0</y>
  </Point>
  <Point>
    <x>1.0</x>
    <y>4.0</y>
  </Point>
</InputPoints>

vs:

for(i = 1; i <= 4; ++i) {
  InputPoint(1, i);
}

or perhaps

<Username>allanballan</Username>
<Accountname>allanballan</Accountname>
<HomeDirectory>/home/allanballan</HomeDirectory>

vs

user = "allanballan";
Username = user;
Accountname = user;
HomeDirectory = "/home/"+user;

The first example compresses a list of points into a few statements; the second example shows how to remove lots of redundant data using a temporary variable.

A popular language for this kind of situation is Lua. Exactly how to map a scripting language to configuration is up to the integrator, but it's really powerful and it comes with parsing and type checking for free.

Laserallan
(+1) That's a very good point about configuration files based on scripting languages. I've already used a few of this kind and they're great. However, for my purpose this is too much. It should depend on other programs as little as possible.
Gregor
Tcl is also a good choice. The API is really simple, and the syntax is often suitable for configuration files.
roe