I want to implement a minimal templating language like Template Toolkit, but much simpler. I don't want to use an existing implementation or library; I want to start from scratch so that I learn something from it and understand it completely, which will let me adapt it to my needs. The end product should be in C, but I will probably make a prototype in Perl first. To begin with, I only want it to handle including other files, substituting variables and, now comes the hard part, arbitrarily nestable if/elseif/else/endif constructs, which require some more advanced parsing.

Here is an example illustrating its intended usage:

<h1>[% substitute title %]</h1>
<p>
[% if foo %]
foo is true
[% elseif bar %]
[% if baz %]
bar and baz are true
[% endif %]
bar is true
[% else %]
<em>none</em> is true
[% endif %]
</p>

I have decent C and some Perl skills but absolutely no knowledge of parsing, so I don't even know what exactly I am looking for. I would be interested in

  • which algorithms can handle parsing like this
  • reading recommendations on such algorithms, minimal introductions to parsing relevant here, or tutorials
  • minimal, well documented/commented examples (I could not make much sense of the TT source)

TIA.

A: 

I wrote a general answer to a similar question some time ago. Hopefully it can help you find a starting point.

Etan
+1  A: 

If you are using C, try (f)lex and yacc/bison. They are not that hard to use.

Besides, there are several questions on the basics of compilers on SO.

Just the basics:

The first step is to translate the character stream to a token stream.

For example, [% and %] are two tokens, and an identifier such as foo is also a token.
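To make the tokenizing step concrete, here is a minimal sketch in C. It makes a coarser split than the one described above: a whole [% ... %] tag becomes one "directive" token, and everything between tags becomes a "text" token; splitting the directive further into keyword and identifier would be a second, similar pass. The names tok_kind, token and next_token are made up for this illustration, and an unterminated tag is simply treated as text, which is only one possible policy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds: literal text, one [% ... %] tag, end of input. */
enum tok_kind { TOK_TEXT, TOK_DIRECTIVE, TOK_END };

struct token {
    enum tok_kind kind;
    const char *start;   /* points into the input; NOT NUL-terminated */
    size_t len;          /* number of characters in the token */
};

/* Scan one token starting at *pos and advance *pos past it. */
struct token next_token(const char **pos)
{
    struct token t = { TOK_END, *pos, 0 };
    const char *p = *pos;
    const char *open;

    assert(p != NULL);
    if (*p == '\0')
        return t;                       /* nothing left */

    open = strstr(p, "[%");
    if (open == p) {
        const char *close = strstr(p + 2, "%]");
        if (close != NULL) {
            const char *s = p + 2, *e = close;
            while (s < e && *s == ' ')  /* trim spaces inside the tag */
                s++;
            while (e > s && e[-1] == ' ')
                e--;
            t.kind = TOK_DIRECTIVE;
            t.start = s;
            t.len = (size_t)(e - s);
            *pos = close + 2;           /* skip past "%]" */
            return t;
        }
        open = NULL;                    /* unterminated tag: rest is text */
    }

    /* Plain text runs up to the next "[%" or to the end of the input. */
    t.kind = TOK_TEXT;
    t.len = open ? (size_t)(open - p) : strlen(p);
    *pos = p + t.len;
    return t;
}
```

Calling next_token repeatedly on <h1>[% substitute title %]</h1> should give a text token for <h1>, a directive token for substitute title, a text token for </h1>, and then TOK_END.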

The next step is to recognize the grammar in the token stream and execute it. You can do this by building a syntax tree:

              [if]
             /  | \
            /   |  \
            |  Exp  |
            |   |   |
            |  foo  |
            |       |
      "foo is.."    elseif
                   / | \
                  /   |  \
                  |  Exp  |
                  |   |   |
                  |  bar  |
                  |       |
                  if      "none is true"
                /  | \
               /   |  \
               |  Exp  |
               |   |   |
               |  baz  |
               |       |
      "bar and..."    empty

Then execute the tree: for each (else)if node, evaluate the expression, then execute the true branch if it is true and the false branch if it is false.
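Executing such a tree can be sketched in a few lines of C. This is only an illustration under some assumptions: the node layout, the names node, var_true and render, and the idea of passing the variables as a NULL-terminated list of "true" names are all invented here. As in the diagram, an elseif is represented as an if node sitting in the false branch of its parent.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum node_kind { NODE_TEXT, NODE_IF };

/* Hypothetical tree node. Siblings in the same block are chained via next. */
struct node {
    enum node_kind kind;
    const char *text;                /* NODE_TEXT: literal; NODE_IF: variable name */
    const struct node *then_branch;  /* NODE_IF only: executed if the variable is true */
    const struct node *else_branch;  /* NODE_IF only: may be NULL (empty) */
    const struct node *next;         /* next node in this block, or NULL */
};

/* Assumed variable model: a name is true if it appears in the list. */
static int var_true(const char *name, const char *const *true_vars)
{
    for (; *true_vars != NULL; true_vars++)
        if (strcmp(name, *true_vars) == 0)
            return 1;
    return 0;
}

/* Walk a block of nodes and append the selected text to out.
 * out must already hold a NUL-terminated string; cap is its total size. */
void render(const struct node *n, const char *const *true_vars,
            char *out, size_t cap)
{
    assert(cap > 0 && strlen(out) < cap);
    for (; n != NULL; n = n->next) {
        if (n->kind == NODE_TEXT)
            strncat(out, n->text, cap - strlen(out) - 1);
        else if (var_true(n->text, true_vars))
            render(n->then_branch, true_vars, out, cap);   /* true branch */
        else
            render(n->else_branch, true_vars, out, cap);   /* false branch */
    }
}
```

The recursion in render mirrors the nesting of the tree, which is why arbitrarily deep if/elseif/else nesting needs no special handling once the tree exists.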

Gamecat
Thanks for the example. Would that be bottom-up parsing (http://en.wikipedia.org/wiki/Bottom-up_parsing)? I'm trying to get familiar with the concepts and terms so I can at least google stuff.
Note that what is shown in Gamecat's post is generally referred to as an AST (abstract syntax tree). When parsing, the token stream is typically converted into a parse tree first, *then* an AST.
Noldorin
@JG: Bottom-up parsing is a generic term for a certain type of parsing. If you want to learn the theory, it's probably best to start with top-down parsing (more specifically, recursive-descent).
Noldorin
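For a feel of what recursive descent looks like on this particular problem, here is a rough sketch in C. It assumes the input has already been turned into an array of tokens ending in an end-of-input marker, and it renders directly instead of building a tree first; both choices are simplifications for the example, and all names (tag, tok, parser, render_tokens) are invented. The grammar it follows is: a block is any number of text tokens or if statements, and an if statement is IF block, then any number of ELSEIF block, then an optional ELSE block, then ENDIF. There is one function per grammar rule, and nesting is handled by the two functions calling each other.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum tag { T_TEXT, T_IF, T_ELSEIF, T_ELSE, T_ENDIF, T_EOF };

struct tok {
    enum tag kind;
    const char *arg;   /* T_TEXT: literal text; T_IF/T_ELSEIF: variable name */
};

struct parser {
    const struct tok *cur;          /* current token */
    const char *const *true_vars;   /* assumed model: NULL-terminated list of true names */
    char *out;                      /* NUL-terminated output buffer */
    size_t cap;                     /* total size of out */
    int error;                      /* set on a syntax error */
};

static int name_true(const struct parser *p, const char *name)
{
    const char *const *v;
    for (v = p->true_vars; *v != NULL; v++)
        if (strcmp(name, *v) == 0)
            return 1;
    return 0;
}

static void parse_block(struct parser *p, int emit);

/* if_stmt := IF block { ELSEIF block } [ ELSE block ] ENDIF */
static void parse_if(struct parser *p, int emit)
{
    int taken = 0;                   /* has a branch of this chain been chosen? */
    do {
        int cond = name_true(p, p->cur->arg);
        p->cur++;                    /* consume IF or ELSEIF */
        parse_block(p, emit && !taken && cond);
        taken = taken || cond;
    } while (p->cur->kind == T_ELSEIF);
    if (p->cur->kind == T_ELSE) {
        p->cur++;
        parse_block(p, emit && !taken);
    }
    if (p->cur->kind == T_ENDIF)
        p->cur++;
    else
        p->error = 1;                /* missing endif */
}

/* block := { TEXT | if_stmt }; stops at ELSEIF, ELSE, ENDIF or EOF */
static void parse_block(struct parser *p, int emit)
{
    for (;;) {
        if (p->cur->kind == T_TEXT) {
            if (emit)
                strncat(p->out, p->cur->arg, p->cap - strlen(p->out) - 1);
            p->cur++;
        } else if (p->cur->kind == T_IF) {
            parse_if(p, emit);       /* recursion handles the nesting */
        } else {
            return;
        }
    }
}

/* Returns 0 on success, -1 on a syntax error. */
int render_tokens(const struct tok *toks, const char *const *true_vars,
                  char *out, size_t cap)
{
    struct parser p = { toks, true_vars, out, cap, 0 };
    assert(cap > 0);
    out[0] = '\0';
    parse_block(&p, 1);
    if (p.cur->kind != T_EOF)        /* stray elseif/else/endif at top level */
        p.error = 1;
    return p.error ? -1 : 0;
}
```

Note that skipped branches are still parsed (with emit set to 0), so a syntax error inside a branch that is not taken is detected as well.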
A: 

JavaCC is the Java Compiler Compiler; it's for making compilers in Java. It is quite a useful bit of software if you want to make a programming language or interpreter.

Zoidberg