views:

136

answers:

3

Could someone please give me some advice/ideas about how to deal with the situations when it's needed to have a look at further declarations to be able to make correct semantic actions on current moment? For example, it is a well-known occurrence when someone writes an interpreter/compiler of some programming language which doesn't support "forward declarations". Let's have an example:

foo(123);//<-- our parser targets here. we estimate we have a function 
         //    invocation, but we have no idea about foo declaration/prototype,
         //     so we can't be sure that "foo" takes one integer argument.   


void foo(int i){
//...
}

It is pretty clear we have to have at least two passes. Firstly we parse all function declarations and get all the needed information such as: the amount arguments the function takes, their types and then we are able to deal with function invocations and resolving the difficulties as above. If we go this way we will must do all these passes using some AST traversing mechanisms/visitors. In this case we have to deal with AST traversing/applying visitors and we must say "good bye" to the all the beauty of phoenix constructions integrated directly in our parsers.

How would you deal with this?

+1  A: 

I think you're making unfounded assumptions. For instance, "it is pretty clear we have to have at least two passes". No it isn't. If the syntax is such that foo(123) can only be parsed as function-name "(" expression ")", then one pass is enough.

Therefore I would advise to design your syntax for unambiguous parsing. Avoid constructs that cannot be parsed in isolation, , e.g. avoid dependendencies on declarations elesewhere.

MSalters
thanks for response. yes, i do avoid ambiguous syntax. you're right, i have function-name "(" expression ")" like syntax for function invocations. my problem, as i described above, is a **semantic one**: does function "foo" gets an int argument, or string, bool, etc etc? on the moment we have just parsed "foo(123)" we cannot be sure it is semantically correct function invocation. therefore we are in need to parse ahead to perform correct semantic checks on that invocation, so there are at least two passes i've been talking about. hope now my question is clear.
varnie
in short, i'm talking about semantic issue not the parsing one.
varnie
+2  A: 

[2nd answer, on semantics] This particular example happens to be simple. What you can do is record function calls made to yet undeclared functions, and the actual argument types. When you do encounter a function declaration later, you check if there are preceding function calls that are (better) matched to this new function. You will obviously detect errors only at the end of the parse, becuase the very last line could introduce a missing function. But after that line, any function call that hasn't been matched at all is an error.

Now, the problem is that this works for simple semantics. If you look at more complex languages - e.g. with C++-like function templates - it no longer becomes possible to do such lookups in a simple table. You would need specialized tabes that structurally match your language constructs. An AST just isn't the best structure for those, let alone the partial AST during parsing.

MSalters
good point.MSalters, don't you know of any boost-spirit ways to do these "multi-passes" staying at boost? there's not so much pleasure to traverse/code these passes manually. are there any ways to say to boost-spirit to perform two parsing passes: the 1st one would parse only function-declarations, skipping all the rest, and the 2nd one deal with parsing all the rest, doing the semantic checks in-place.
varnie
Actually, my point was to use a single parse pass of the tree to build tables of everything you encounter, then extract semantics from the tables. And the point of a decent syntax is to keep those tables simple.
MSalters
+2  A: 

If you want to do two passes, instead of semantic checking at the end of the first pass, you can have te functions called by your actions know which pass they are in. So if you had some actions

[functionCall(name, args)]
[functionDef(name, args, body)]

They would be defined something like this (not proper spirit syntax, but you get the point)

functionCall(string name, vector<string> args)
{
  if (!first_pass) {
    // check args for validity
    // whatever else you need to do
  }
} 

functionDef(string name, vector<string> args, ... body)
{
  if (first_pass)
    // Add function decleration to symbol table
  else
    // define function
}
KeithB