Assume the interpreter for a scripting language (anything from PHP to Ruby) is written in C. How are the variables defined by the currently executing script (or more complex data structures that hold more than just a name and a value) stored and read back?

With my rather limited knowledge of C, I would conclude that this can only be done with an array.

// Variable type definition would go here
var* variables;

The var type would contain two strings name and value.

Okay. So a script defines, say, 30 variables. Now, if one of the variables has to be read, the function getVar (or something similar) would have to walk through all 30 variables and compare each name with the name of the requested variable. Imagine that with a loop that requests
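For concreteness, the array-based lookup described above could be sketched roughly as follows. The names `var` and `get_var` and the choice to return `NULL` for an unknown variable are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical variable record: name and value both kept as strings. */
typedef struct {
    const char *name;
    const char *value;
} var;

/* Linear search: compares the requested name against every stored name.
   Returns the value, or NULL if no variable has that name.
   Cost grows with the number of variables (O(n) per lookup). */
const char *get_var(const var *variables, size_t count, const char *name)
{
    assert(variables != NULL && name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(variables[i].name, name) == 0)
            return variables[i].value;
    }
    return NULL;
}
```

Every read costs up to `count` string comparisons, which is the overhead the question is worried about.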

  • Am I getting it totally wrong? If yes, how do (modern?) scripting languages handle variables? How are they stored and read out?

  • In languages where variables are clearly marked by the syntax (PHP: $myVar), the interpreter could replace all variable names with numeric indices during parsing. Is that right, and is it actually done?

+2  A: 

They almost certainly use a more sophisticated data structure.

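 struct Var
 {
     char *name;      /* identifier as written in the script */
     int type;        /* tag saying which union member is valid */
     union {          /* illustrative members only */
         long i;
         double d;
         char *s;
     } value;         /* ... */
 };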

and then store them in a hash table or a binary tree, keyed by name, so they can be retrieved by name.
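A minimal sketch of the hash table variant, assuming a fixed number of buckets with chaining. The names `VarTable`, `var_define`, `var_lookup`, and `NBUCKETS` are hypothetical, and the union members are placeholders; a real interpreter would also resize the table and free entries.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 64            /* assumed fixed size; real tables grow */

enum var_type { VAR_INT, VAR_STRING };

struct Var {
    char *name;
    enum var_type type;
    union { long i; char *s; } value;
    struct Var *next;          /* next entry in the same bucket */
};

struct VarTable { struct Var *buckets[NBUCKETS]; };

/* djb2-style string hash, reduced to a bucket index. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Returns the entry for name, or NULL if it was never defined. */
struct Var *var_lookup(struct VarTable *t, const char *name)
{
    assert(t != NULL && name != NULL);
    for (struct Var *v = t->buckets[hash_name(name)]; v; v = v->next)
        if (strcmp(v->name, name) == 0)
            return v;
    return NULL;
}

/* Returns the existing entry or creates a zeroed one; NULL if out of memory. */
struct Var *var_define(struct VarTable *t, const char *name)
{
    struct Var *v = var_lookup(t, name);
    if (v)
        return v;
    v = calloc(1, sizeof *v);
    if (!v)
        return NULL;
    v->name = malloc(strlen(name) + 1);
    if (!v->name) {
        free(v);
        return NULL;
    }
    strcpy(v->name, name);
    unsigned long b = hash_name(name);
    v->next = t->buckets[b];   /* push onto the front of the chain */
    t->buckets[b] = v;
    return v;
}
```

With this layout a lookup hashes the name once and then compares against only the few entries in one bucket, instead of against every variable in the script.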

pm100
A hash table (rather than a tree) is probable, since accessing the variables in sorted order is not likely to be needed.
caf
+3  A: 

Hashtables, scope linked lists, references ... there is a lot to it.

What you are asking is fairly abstract, and implementations vary.

Depending on the implementation:

  1. Identifiers may be compiled into memory addresses, or relative memory addresses, or nameless locations referenced by bytecode.
  2. Identifiers may be looked up dynamically in the scope at runtime.
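The first option can be sketched like this, assuming the parser assigns each distinct name a slot number once and the runtime then works only with those numbers. `SymTab`, `Frame`, `resolve_slot`, `store_slot`, `load_slot`, and `MAX_SLOTS` are hypothetical names, and values are plain `long`s to keep the sketch short.

```c
#include <assert.h>
#include <string.h>

#define MAX_SLOTS 256          /* assumed limit for the sketch */

/* Parse-time symbol table: maps each distinct name to a slot number. */
struct SymTab {
    const char *names[MAX_SLOTS];
    int count;
};

/* Called whenever the parser sees an identifier.
   Returns the slot for name, allocating one on first sight,
   or -1 if the table is full. The caller keeps the string alive. */
int resolve_slot(struct SymTab *st, const char *name)
{
    assert(st != NULL && name != NULL);
    for (int i = 0; i < st->count; i++)
        if (strcmp(st->names[i], name) == 0)
            return i;
    if (st->count == MAX_SLOTS)
        return -1;
    st->names[st->count] = name;
    return st->count++;
}

/* Runtime storage: no names here, only slot indices. */
struct Frame { long slots[MAX_SLOTS]; };

void store_slot(struct Frame *f, int slot, long v)
{
    assert(f != NULL && slot >= 0 && slot < MAX_SLOTS);
    f->slots[slot] = v;
}

long load_slot(const struct Frame *f, int slot)
{
    assert(f != NULL && slot >= 0 && slot < MAX_SLOTS);
    return f->slots[slot];
}
```

The string comparisons happen once per identifier while parsing; a loop that reads the same variable many times then pays only for an array index on each read.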

In a basic scripting language, variable names, when encountered, would be put into a scope structure, such as a linked list in which each node holds a hashtable for looking up the identifiers of that scope.

When a variable is referenced, the runtime code looks it up in the hashtable and obtains some handle (a memory address of a struct, for example) that refers to the variable's value. Structs can be used to implement scalar variables:

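enum stype { INT, STRING, FLOAT, BOOL };
struct scalar {
    enum stype type;             /* which kind of value the blob holds */
    generic_blob_t *heap_blob;   /* generic_blob_t: placeholder type for the heap-allocated payload */
};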

Or some variation of my poor example source.
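The scope structure described above could look roughly like the following: each scope owns a small chained hashtable and points to its enclosing scope. All names here (`Scope`, `Binding`, `scope_define`, `scope_lookup`, `SCOPE_BUCKETS`) are made up for the sketch, and a `long` stands in for the pointer to a value struct.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCOPE_BUCKETS 16       /* assumed size for the sketch */

struct Binding {
    const char *name;          /* not copied: caller keeps it alive */
    long value;                /* stand-in for a pointer to a value struct */
    struct Binding *next;      /* next binding in the same bucket */
};

struct Scope {
    struct Binding *buckets[SCOPE_BUCKETS];
    struct Scope *parent;      /* enclosing scope, NULL for the global one */
};

static unsigned bucket_of(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % SCOPE_BUCKETS;
}

/* Adds a binding to this scope only; it shadows any outer binding
   (and any earlier binding of the same name in this scope).
   Returns 0 on success, -1 if allocation fails. */
int scope_define(struct Scope *sc, const char *name, long value)
{
    assert(sc != NULL && name != NULL);
    struct Binding *b = malloc(sizeof *b);
    if (!b)
        return -1;
    unsigned i = bucket_of(name);
    b->name = name;
    b->value = value;
    b->next = sc->buckets[i];
    sc->buckets[i] = b;
    return 0;
}

/* Searches the innermost scope first, then each enclosing scope. */
struct Binding *scope_lookup(struct Scope *sc, const char *name)
{
    assert(name != NULL);
    unsigned i = bucket_of(name);
    for (; sc; sc = sc->parent)
        for (struct Binding *b = sc->buckets[i]; b; b = b->next)
            if (strcmp(b->name, name) == 0)
                return b;
    return NULL;
}
```

Entering a function or block would push a new `Scope` whose `parent` is the current one; leaving it would pop and free that scope, which is how inner variables shadow outer ones without overwriting them.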

Some good books are "Modern Compiler Implementation in C" and the Dragon Book. Reading up on this topic is a good idea; I would recommend it to any programmer.

Aiden Bell
+2  A: 

Typically, scripting language implementations will use fairly complex C data structures to represent variables in the scripting language. For languages where C extensions are well-defined, the documentation is readily available:

[SO editors: feel free to add more references to the above list]

Greg Hewgill
+1 Cracking examples of C-to-type interaction for binding. I like Python's approach.
Aiden Bell