What is an "island grammar", and how do I use one in ANTLR3?

+1  A: 

I'm not sure exactly what you mean, but since you haven't mentioned what you've written so far, I'd start here:

http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control

chollida
+1  A: 

An island grammar is one that treats most of a language as a blob of text ("water") and uses grammar rules to parse only the parts of the language that are of interest ("islands"). For instance, you might choose to build an island grammar to pick out all the expressions found in a C# program, and ignore the variable/method/class declarations and the statement syntax (if, while, ...).
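To make the water/island split concrete, here is a rough sketch in Python rather than ANTLR3. Everything in it is an assumption made up for illustration: the island is plain integer arithmetic, and the names `ISLAND`, `parse_island` and `find_islands` are hypothetical. It only shows the idea of parsing the islands with grammar rules while skipping everything else.

```python
import re

# Hypothetical island: integer arithmetic such as `1 + 2 * 3`.
# Everything else in the input is treated as water and skipped.
ISLAND = re.compile(r"\d+(?:\s*[-+*/]\s*\d+)+")
TOKEN = re.compile(r"\d+|[-+*/]")


def parse_island(text):
    """Parse one island with two grammar rules and return a nested-tuple tree.

    expr : term (('+' | '-') term)*
    term : INT  (('*' | '/') INT)*
    """
    tokens = TOKEN.findall(text)
    pos = 0

    def term():
        nonlocal pos
        node = int(tokens[pos])
        pos += 1
        while pos < len(tokens) and tokens[pos] in "*/":
            op = tokens[pos]
            right = int(tokens[pos + 1])
            pos += 2
            node = (op, node, right)
        return node

    def expr():
        nonlocal pos
        node = term()
        while pos < len(tokens) and tokens[pos] in "+-":
            op = tokens[pos]
            pos += 1
            node = (op, node, term())
        return node

    return expr()


def find_islands(source):
    """Scan the source, parsing islands and skipping water one character at a time."""
    trees = []
    pos = 0
    while pos < len(source):
        match = ISLAND.match(source, pos)
        if match:
            trees.append(parse_island(match.group(0)))
            pos = match.end()
        else:
            pos += 1  # water: not part of any island, so ignore it
    return trees


sample = "int x = 1 + 2 * 3; if (x > 0) { y = 10 / 5; }"
print(find_islands(sample))
# [('+', 1, ('*', 2, 3)), ('/', 10, 5)]
```

The declarations, the `if` and the braces never reach the parser; only the two arithmetic islands get a tree.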

The real question is: should you use island grammars at all?

The benefit:

  • You don't have to write a full, complete grammar for the language you want to process.

The downsides:

  • It isn't always easy to pick out the part of the source of interest. For example, how do you ensure that the island grammar doesn't pick up a block of code that is commented out, unless your island grammar lexes all the comments in detail? The point of an island grammar was to avoid going into that kind of detail, and yet here you must.

  • You can only use the island grammar to focus on the problem as you understand it right now. If the problem moves, then your island grammar may have to shift, too, and that isn't always easy.

  • Most interesting problems in program manipulation require that you not only handle the syntax (i.e., parse the code and build some kind of tree to manipulate) but also determine the meaning of symbols. With an island grammar, you've effectively written off the possibility of doing that (unless you include all the syntax for blocks, declarations, etc., and oops, suddenly it isn't an island grammar but a small-continent grammar). And that really limits what you can do.
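The commented-out-code problem from the first downside is easy to reproduce. The following is a minimal Python sketch, not ANTLR3; the island pattern (integer arithmetic), the C-style comment syntax and the `skip_comments` flag are all assumptions chosen for illustration. The naive scan reports islands inside comments, and the only fix is to teach the "water" how to lex comments.

```python
import re

# Hypothetical island: integer arithmetic such as `1 + 2`.
ISLAND = re.compile(r"\d+(?:\s*[-+*/]\s*\d+)+")
# Extra "water" rules that the naive version hoped to avoid:
# C-style line comments and block comments.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def find_islands(source, skip_comments):
    """Return island text; optionally lex comments so their contents are ignored."""
    islands = []
    pos = 0
    while pos < len(source):
        comment = COMMENT.match(source, pos) if skip_comments else None
        if comment:
            pos = comment.end()  # the whole comment is water
            continue
        match = ISLAND.match(source, pos)
        if match:
            islands.append(match.group(0))
            pos = match.end()
        else:
            pos += 1  # ordinary water: skip one character
    return islands


source = "a = 1 + 2; // b = 3 * 4;\n/* c = 5 - 6; */ d = 7 / 8;"

print(find_islands(source, skip_comments=False))
# ['1 + 2', '3 * 4', '5 - 6', '7 / 8']
print(find_islands(source, skip_comments=True))
# ['1 + 2', '7 / 8']
```

Even this version is still wrong for a string literal that happens to contain `//`, so string literals would need their own rule as well. Each such fix drags in a bit more of the full language.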

Call me biased, but I've been doing this kind of stuff for a long time, and I believe that island grammars really aren't that useful. The alternative I propose is parsers for many languages, built on common foundations so that their cost is amortized; it's called the DMS Software Reengineering Toolkit.

YMMV.

Ira Baxter