views:

144

answers:

4

In my language I can use a class variable in my method when the definition appears below the method. It can also call methods below my method and etc. There are no 'headers'. Take this C# example.

class A
{
    public void callMethods() { print(); B b; b.notYetSeen();
    public void print() { Console.Write("v = {0}", v); }
    int v=9;
}

class B
{
    public void notYetSeen() { Console.Write("notYetSeen()\n"); }
}

How should I compile that? what i was thinking is:

  • pass1: convert everything to an AST
  • pass2: go through all classes and build a list of define classes/variable/etc
  • pass3: go through code and check if there's any errors such as undefined variable, wrong use etc and create my output

But it seems like for this to work I have to do pass 1 and 2 for ALL files before doing pass3. Also it feels like a lot of work to do until I find a syntax error (other than the obvious that can be done at parse time such as forgetting to close a brace or writing 0xLETTERS instead of a hex value). My gut says there is some other way.

Note: I am using bison/flex to generate my compiler.

+1  A: 

I would go through on pass one and collect all of your class/method/field names and types, ignoring the method bodies. Then in pass two check the method bodies only.

Tinister
+3  A: 

My understanding of languages that handle forward references is that they typically just use the first pass to build a list of valid names. Something along the lines of just putting an entry in a table (without filling out the definition) so you have something to point to later when you do your real pass to generate the definitions.

If you try to actually build full definitions as you go, you would end up having to rescan repatedly, each time saving any references to undefined things until the next pass. Even that would fail if there are circular references.

T.E.D.
A: 

I don't know that there can be any other way than traversing all the files in the source.

I think that you can get it down to two passes - on the first pass, build the AST and whenever you find a variable name, add it to a list that contains that blocks' symbols (it would probably be useful to add that list to the corresponding scope in the tree). Step two is to linearly traverse the tree and make sure that each symbol used references a symbol in that scope or a scope above it.

My description is oversimplified but the basic answer is -- lookahead requires at least two passes.

Kai
A: 

The usual approach is to save B as "unknown". It's probably some kind of type (because of the place where you encountered it). So you can just reserve the memory (a pointer) for it even though you have no idea what it really is.

For the method call, you can't do much. In a dynamic language, you'd just save the name of the method somewhere and check whether it exists at runtime. In a static language, you can save it in under "unknown methods" somewhere in your compiler along with the unknown type B. Since method calls eventually translate to a memory address, you can again reserve the memory.

Then, when you encounter B and the method, you can clear up your unknowns. Since you know a bit about them, you can say whether they behave like they should or if the first usage is now a syntax error.

So you don't have to read all files twice but it surely makes things more simple.

Alternatively, you can generate these header files as you encounter the sources and save them somewhere where you can find them again. This way, you can speed up the compilation (since you won't have to consider unchanged files in the next compilation run).

Lastly, if you write a new language, you shouldn't use bison and flex anymore. There are much better tools by now. ANTLR, for example, can produce a parser that can recover after an error, so you can still parse the whole file. Or check this Wikipedia article for more options.

Aaron Digulla
such as? after reading a book about bision it seems VERY interesting and not a bad tool at all.
acidzombie24
Added a couple of links to my answer.
Aaron Digulla
Bison parsers can recover after errors, too, using the special `error` nonterminal.
Edmund
@Edmund: ANTLR parsers do this automatically, the tool gives much better error messages, it handles tokens and grammar in a single file and the syntax of the grammar is a bit better. Lastly, it can produce grammars in many languages while Bison is limited to C.
Aaron Digulla