views:

246

answers:

3

I am currently working on a compiler under C and I am abit lost at the part where we construct the data structure for AST, especially for the part where we construct stucture for IDs, it is called "Symbol Table Entry"

I see structures over net such as:

struct ste {
  struct id   *name;  /* pointer into hash table for assoc. id */
  struct decl *decl;  /* pointer into symbol table for its decl */
  struct ste  *prev;  /* pointer to previous entry in symbol table */
};

It looks like a linked list as it holds a pointer to the previous entry (*prev) but what is the logic behind this ?

+8  A: 

The answer to your concrete question is this: the prev link means that, when your code has a pointer to one of these node, it can follow the link to the previous link in the chain. One reason a symbol table might have a list like this is to deal with nested scope:

{
int x;
  {
   int x;
  }
}

However, there are many, many other reasons why the symbols nodes might want to be arranged in a list. Any reason why the compiler needs to visit all the nodes is a reason.

bmargulies
+1  A: 

My first thought for the use of a reverse direction linked list would be for those languages that support overriding of variable names, such as:

int main (void) {
    int x = 1;
    int y = 1;
    if (x == 1) {
        int y = 2;
        printf ("y = %d\n", y);
    }
    return 0;
}

In that case, you want access to the variable with the innermost scope (the last one defined). This can be found by walking backwards through the list (assuming that you're building the list by going forward of course).

Then, when a scope disappears, you can also just adjust the 'head' pointer to remove the most recently added variables.

Of course, you could achieve the same effect by inserting before the current head rather than adding to the end of the list (which looks like conceptually what is being done, just with the pointer called prev instead of next).

paxdiablo
+2  A: 

You're seeing the leftovers of a pernicious habit from C programmers long ago: it is assumed that symbols will be on some lists, and instead of allocating list structures separately, the list pointers are included as part of the symbol structure. This trick saves one allocation per list element, but at a cost: the set of lists that a symbol can be on is fixed, and this structure confuses programmers. If the application is compilers, there is no reason to use this trick any more. It is much clearer to have a separate list structure that is defined something like this:

struct ste_list {
    struct ste *symbol_table_entry;
    struct str_list *next;
};

You can have as many of these as you like, and nobody is the wiser. And the internal pointers you find confusing go away.

You ask

what is the logic behind this?

Part of the answer is simply that it's useful to have symbols on a distinguished list. I can't answer the question definitively without knowing more about the particular compiler. My best guess is that the prev entry is going to be used to implement nested scopes (the { ... } brackets in C), but that's a guess based on compilers I've seen or worked on. So perhaps the logic is that when a closing brace is encountered, the compiler might follow that link until it gets to an ste in an enclosing scope. People with a bit more experience than the author of the compiler you're studying will generally put this logic in a "symbol-table abstraction" which will include functions like enterscope() and exitscope(), and the details of these operations will be hidden from the internal representation of individual symbol-table entries.

Norman Ramsey