views:

402

answers:

3

Google's new language "Go" says on its website:

the language has been designed to be easy to analyze and can be parsed without a symbol table

I'm certainly no expert on these matters, but I thought a symbol table was a basic construct common to all compilers for languages that use variables, and Go clearly uses variables. What am I not understanding?

+8  A: 

Interpretation and compilation absolutely require symbol tables or similar. This is true for nearly all languages.

In C and C++, even parsing the language requires a symbol table.

Justice
That's what I thought. Then how can their claim be true? I assume it must be but I don't understand how.
Dinah
@Dinah: Read the statement again slowly, then read this answer again. They're not contradictory.
Michael Myers
@Dinah: There is more to compilation than parsing. Parsing builds up a tree representation of the structure of the program. Semantic analysis and code generation then operate on that tree to produce the compiled output.
quark
+11  A: 

Parsing means just figuring out the program structure: separating the module into statements/declarations, breaking expressions down to sub-expressions, etc. You end up with a tree structure, known as a "parse tree", or "abstract syntax tree" (AST).

Apparently, C++ requires a symbol table to do parsing.

This page discusses some reasons why C++ requires a symbol table for parsing.

Of course, parsing is only a part of compilation, and you will need a symbol table to do a full compilation.

However, parsing itself can be useful in writing analysis tools (e.g. which module imports which modules). So, simplifying the parsing process means it's easier to write code analysis tools.

hasen j
That's an excellent reference.
Mark Bessey
Honestly, it was just the first (or second) result from Google
hasen j
Note also the replies to that post - they too contain very good reasons why it's impossible.
MSalters
+3  A: 

@Justice is right. To expand on that a little, in C the only actual tricky part is telling types apart from variables. Specifically when you see this:

T t;

You need to know that T is a type for that to be a legal parse. That's something you have to look up in a symbol table. This is relatively simple to figure out as long as types are added to the symbol table as the parse continues. You don't need to do much extra work in the compiler: either T is present in the table or it isn't.

In C++ things are much, much more complicated. There are enormous numbers of ambiguous or potentially ambiguous constructs. The most obvious is this one:

B::C (c);

Aside from the fact that it's not clear if B is a class, a typedef, or a namespace, it's also not clear if C is a type and c an object of that type, or if C is a function (or constructor) taking c as an argument (or even if C is an object with operator() overloaded). You need the symbol table to carry on parsing, although it is still possible to continue quickly enough, as the type of the symbol is in the symbol table.

Things get much, much, much worse than that when templates come into the mix. If C (c) is in a template, you might not know in the actual definition of the template, if C is a type or a function/object. That's because the template can declare C to be either a type or a variable. What this means is that you need the symbol table, but you don't have one -- and you can't have one until the template is actually declared. Even worse, it's not necessarily sufficient to have just the type of the symbol: you can come up with situations which require the full information of the type the symbol represents, including size, alignment, and other machine-specific information.

All this has several practical effects. The two most significant I would say are:

  • Compilation is much faster. I assume Go is faster to compile than C, and C++ has famously slow compilation times for situations involving a lot of templates.
  • You can write parsers that don't depend on having a full compiler. This is very useful for doing code analysis and for refactoring.
quark