Aside from getting any real work done, I have an itch: to write a view engine that closely mimics a template system from another language (Perl's Template Toolkit). This is one of those "if I had the time" / "do it to learn something new" kinds of projects.

I've spent time looking at Coco/R and ANTLR, and honestly, they make my brain hurt, although some of Coco/R is sinking in. Unfortunately, most of the examples are about creating a compiler that reads source code, and none seem to cover how to create a processor for templates.

Yes, those are the same thing, but I can't wrap my head around how to define the language for templates where most of the source is the html, rather than actual code being parsed and run.

Are there any good beginner resources out there for this kind of thing? I've taken a gander at Spark, which didn't appear to have the grammar in the repo.

Maybe that is overkill, and one could just text-replace the template syntax with C# in the file and compile it: http://msdn.microsoft.com/en-us/magazine/cc136756.aspx#S2

If you were in my shoes and weren't an expert in creating languages, where would you start?

A: 

Step 1. Use regular expressions (regexp substitution) to split your input template string into a token list. For example, split

hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!

to

write('hel<b>lo')
if('foo')
write('bar is ')
substitute('bar')
write('.')
else()
write('baz')
end()
write('world</b>!')
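
A rough sketch of Step 1 in Python (the answer doesn't fix a language, so that choice is an assumption). The tag syntax `[if name]`, `[else]`, `[end]` and `[name]` is inferred from the example above; `TAG` and `tokenize` are made-up names, and tokens are represented as `(kind, value)` tuples rather than calls.

```python
import re

# Assumed tag syntax: [if NAME], [else], [end], or [NAME] for a substitution.
TAG = re.compile(r"\[(if\s+(\w+)|else|end|(\w+))\]")

def tokenize(template):
    """Split a template into (kind, value) tokens, scanning left to right."""
    tokens = []
    pos = 0
    for m in TAG.finditer(template):
        if m.start() > pos:
            # Literal text between the previous tag and this one.
            tokens.append(("write", template[pos:m.start()]))
        if m.group(2) is not None:
            tokens.append(("if", m.group(2)))
        elif m.group(1) == "else":
            tokens.append(("else", None))
        elif m.group(1) == "end":
            tokens.append(("end", None))
        else:
            tokens.append(("substitute", m.group(3)))
        pos = m.end()
    if pos < len(template):
        # Trailing literal text after the last tag.
        tokens.append(("write", template[pos:]))
    return tokens

for token in tokenize("hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!"):
    print(token)
```

Note that whitespace in the literal text is preserved, so the third token carries the trailing space of `'bar is '`.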

Step 2. Convert your token list to a syntax tree:

* Sequence
** Write
*** ('hel<b>lo')
** If
*** ('foo')
*** Sequence
**** Write
***** ('bar is ')
**** Substitute
***** ('bar')
**** Write
***** ('.')
*** Write
**** ('baz')
** Write
*** ('world</b>!')

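// Base type for every node in the syntax tree.
class Instruction {
}
// Emits literal template text unchanged.
class Write : Instruction {
  public string text;
}
// Emits the value of a template variable.
class Substitute : Instruction {
  public string varname;
}
// Runs its child instructions in order.
class Sequence : Instruction {
  public Instruction[] items;
}
// Runs one of two branches, depending on a condition variable.
class If : Instruction {
  public string condition;
  public Instruction then;
  public Instruction otherwise;  // "else" is a reserved word in C#
}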
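
One possible way to do the token-list-to-tree conversion, sketched in Python with dataclasses that mirror the classes above. Assumptions: tokens arrive as `(kind, value)` tuples, the else branch is called `otherwise` (since `else` is a keyword), both branches are always wrapped in a `Sequence`, and `parse` is a hypothetical name. Open `[if]`s are kept on a stack so that `[else]` and `[end]` know which one they belong to.

```python
from dataclasses import dataclass, field

@dataclass
class Write:
    text: str

@dataclass
class Substitute:
    varname: str

@dataclass
class Sequence:
    items: list = field(default_factory=list)

@dataclass
class If:
    condition: str
    then: Sequence = field(default_factory=Sequence)
    otherwise: Sequence = field(default_factory=Sequence)

def parse(tokens):
    """Build a syntax tree from (kind, value) tokens such as ("if", "foo")."""
    root = Sequence()
    current = root   # the sequence currently receiving instructions
    stack = []       # (enclosing sequence, open If node) for each pending [if]
    for kind, value in tokens:
        if kind == "write":
            current.items.append(Write(value))
        elif kind == "substitute":
            current.items.append(Substitute(value))
        elif kind == "if":
            node = If(value)
            current.items.append(node)
            stack.append((current, node))
            current = node.then
        elif kind == "else":
            if not stack:
                raise SyntaxError("[else] without [if]")
            current = stack[-1][1].otherwise
        elif kind == "end":
            if not stack:
                raise SyntaxError("[end] without [if]")
            current, _ = stack.pop()
        else:
            raise ValueError(f"unknown token kind: {kind}")
    if stack:
        raise SyntaxError("missing [end]")
    return root
```

Because every `[if]` pushes and every `[end]` pops, nested conditionals fall out of the same loop without any recursion.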

Step 3. Write a recursive function (called the interpreter), which can walk your tree and execute the instructions there.
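
A minimal interpreter along those lines, again sketched in Python under assumptions of my own: variables live in a plain dict, a condition is "true" when the named variable is truthy, a missing variable renders as an empty string, and `render` / `instantiate` are hypothetical names. The node classes repeat the shape of the classes from Step 2, with `otherwise` standing in for the else branch.

```python
from dataclasses import dataclass, field

@dataclass
class Write:
    text: str

@dataclass
class Substitute:
    varname: str

@dataclass
class Sequence:
    items: list = field(default_factory=list)

@dataclass
class If:
    condition: str
    then: object
    otherwise: object = None

def render(node, variables, out):
    """Walk the tree recursively, appending output fragments to `out`."""
    if isinstance(node, Write):
        out.append(node.text)
    elif isinstance(node, Substitute):
        out.append(str(variables.get(node.varname, "")))
    elif isinstance(node, Sequence):
        for item in node.items:
            render(item, variables, out)
    elif isinstance(node, If):
        branch = node.then if variables.get(node.condition) else node.otherwise
        if branch is not None:
            render(branch, variables, out)
    else:
        raise TypeError(f"unknown instruction: {node!r}")

def instantiate(tree, variables):
    """Render a whole tree to a single string."""
    out = []
    render(tree, variables, out)
    return "".join(out)

tree = Sequence([
    Write("hel<b>lo"),
    If("foo",
       Sequence([Write("bar is "), Substitute("bar"), Write(".")]),
       Write("baz")),
    Write("world</b>!"),
])
print(instantiate(tree, {"foo": True, "bar": "quux"}))  # hel<b>lobar is quux.world</b>!
print(instantiate(tree, {}))                            # hel<b>lobazworld</b>!
```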

An alternative to steps 1-3, if your language supports eval() (as Perl, Python, and Ruby do): use a regexp substitution to convert the template into an eval()-able string in the host language, then run eval() to instantiate the template.
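
A sketch of what that could look like in Python. Assumptions: the same `[if name]` / `[else]` / `[end]` / `[name]` tag syntax as in the example, and hypothetical names `compile_template` and `load_template`. Because the generated code consists of statements, Python needs `exec()` rather than `eval()` here. Literal text is embedded with `repr()`, so template text is never interpreted as code.

```python
import re

TAG = re.compile(r"\[(if\s+(\w+)|else|end|(\w+))\]")

def compile_template(template):
    """Translate a template into Python source defining render(v)."""
    lines = ["def render(v):", "    out = []"]
    depth = 1  # current indentation level inside render()

    def emit(stmt):
        lines.append("    " * depth + stmt)

    pos = 0
    for m in TAG.finditer(template):
        if m.start() > pos:
            emit(f"out.append({template[pos:m.start()]!r})")
        if m.group(2) is not None:
            emit(f"if v.get({m.group(2)!r}):")
            depth += 1
            emit("pass")  # keeps an empty branch syntactically valid
        elif m.group(1) in ("else", "end"):
            if depth == 1:
                raise ValueError(f"[{m.group(1)}] without [if]")
            depth -= 1
            if m.group(1) == "else":
                emit("else:")
                depth += 1
                emit("pass")
        else:
            emit(f"out.append(str(v.get({m.group(3)!r}, '')))")
        pos = m.end()
    if depth != 1:
        raise ValueError("missing [end]")
    if pos < len(template):
        emit(f"out.append({template[pos:]!r})")
    emit("return ''.join(out)")
    return "\n".join(lines)

def load_template(template):
    """Compile the generated source and return the resulting render function."""
    namespace = {}
    exec(compile_template(template), namespace)
    return namespace["render"]

render = load_template("hel<b>lo[if foo]bar is [bar].[else]baz[end]world</b>!")
print(render({"foo": True, "bar": "quux"}))  # hel<b>lobar is quux.world</b>!
print(render({}))                            # hel<b>lobazworld</b>!
```

Printing the result of `compile_template(...)` is a handy way to debug a template, since the generated source is ordinary, readable Python.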

pts
Well, the first 3 steps are essentially ANTLR/Coco/R, right? It's a matter of choosing to roll your own parser or rely on a grammar. The alternative is something I can wrap my head around (translate the template to C# output). What's the right vs. easy vs. best approach to take when starting? In the end, tests are king for this kind of thing.
claco
Step 2 is like ANTLR and Coco/R. The best approach for me is to make it as simple as possible while it remains useful. So, for example, if your `if` conditions don't have to support arithmetic expressions (such as [if 3*4>10]), then you don't need ANTLR or Coco/R: you can just scan the template from left to right and push pending ifs onto a stack, so when you see an [end], you know what to close.
pts
+2  A: 

The Spark grammar is implemented with a kind-of-fluent domain specific language.

It's declared in a few layers. The rules which recognize the html syntax are declared in MarkupGrammar.cs - those are based on grammar rules copied directly from the xml spec.

The markup rules refer to a limited subset of C# syntax rules declared in CodeGrammar.cs. They are a subset because Spark only needs to recognize enough C# to adjust single quotes around strings to double quotes, match curly braces, etc.

The individual rules are delegates of type ParseAction<TValue>, which accept a Position and return a ParseResult. The ParseResult is a simple class which contains the TValue data item parsed by the action and a new Position instance which has been advanced past the content which produced the TValue.

That isn't very useful on its own until you introduce a small number of operators, as described in Parsing expression grammar, which can combine single parse actions to build very detailed and robust expressions about the shape of different syntax constructs.
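
To make the idea concrete, here is a small Python sketch of parse actions plus a few PEG-style operators. This is not Spark's API: the names `ch`, `seq`, `choice`, `rep` and `build` are invented for illustration, a position is just an integer index, and a result is a `(value, new_position)` tuple, or `None` on failure.

```python
def ch(predicate):
    """Parse action matching one character that satisfies `predicate`."""
    def action(text, pos):
        if pos < len(text) and predicate(text[pos]):
            return text[pos], pos + 1
        return None
    return action

def seq(*actions):
    """Sequence: every action must succeed, one after another."""
    def action(text, pos):
        values = []
        for a in actions:
            result = a(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return action

def choice(*actions):
    """Ordered choice: the first action that succeeds wins."""
    def action(text, pos):
        for a in actions:
            result = a(text, pos)
            if result is not None:
                return result
        return None
    return action

def rep(a, minimum=0):
    """Repetition: apply `a` greedily; fail if fewer than `minimum` matches."""
    def action(text, pos):
        values = []
        while True:
            result = a(text, pos)
            if result is None or result[1] == pos:  # stop on failure or no progress
                break
            value, pos = result
            values.append(value)
        return (values, pos) if len(values) >= minimum else None
    return action

def build(a, fn):
    """Transform the value produced by `a` with `fn`."""
    def action(text, pos):
        result = a(text, pos)
        if result is None:
            return None
        value, newpos = result
        return fn(value), newpos
    return action

# A tiny grammar: literal text runs mixed with [name] substitutions.
name = build(rep(ch(str.isalnum), minimum=1), "".join)
substitution = build(
    seq(ch(lambda c: c == "["), name, ch(lambda c: c == "]")),
    lambda parts: ("substitute", parts[1]),
)
text_run = build(rep(ch(lambda c: c != "["), minimum=1),
                 lambda chars: ("write", "".join(chars)))
template = rep(choice(substitution, text_run))

print(template("bar is [bar].", 0))
```

Each rule is an ordinary value, so larger rules are built by passing smaller ones to the operators, which is the property the delegate-based approach relies on.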

The technique of using a delegate as a parse action came from Luke H's blog post Monadic Parser Combinators using C# 3.0. I also wrote a post about Creating a Domain Specific Language for Parsing.

It's also entirely possible, if you like, to reference the Spark.dll assembly and inherit a class from the base CharGrammar to create an entirely new grammar for a particular syntax. It's probably the quickest way to start experimenting with this technique, and an example of that can be found in CharGrammarTester.cs.

loudej
A: 

There are sooo many things to do. But it does work for one simple GET statement, plus a test. That's a start.

http://github.com/claco/tt.net/

In the end, I already had too much time invested in ANTLR to give loudej's method a go. I wanted to spend a little more time on the whole process rather than on the parser/lexer. Maybe in version 2 I can have a go at the Spark way when my brain understands things a little more.

claco
A: 

Vici Parser (formerly known as LazyParser.NET) is an open-source tokenizer/template parser/expression parser which can help you get started.

If it's not what you're looking for, then you may get some ideas by looking at the source code.

Philippe Leybaert