Hi,

In any programming environment, whatever data type I choose, in the end the CPU only performs arithmetic and logical operations.

How does this transition (from user-defined data types and operations to the CPU instruction set) happen, and what roles do the compiler, interpreter, assembler and linker play in this life cycle?

Also, how does OOP handle this mapping, given that in the worst case almost everything is an object (I mean in the Java language)?

+3  A: 

Java source --> native code translation actually happens in two distinct steps: the conversion from source code to bytecode at compile time (that's what javac does), and the conversion from bytecode to native CPU instructions at runtime (that's what java does).

When the source code is being "compiled", the fields and methods get condensed into entries in a symbol table. You say "System.out.println()", and javac turns it into something like "get the static field referenced by symbol #2004, and invoke the method referred to by symbol #300 on it" (where #2004 might be "System.out" and #300 might be "void java.io.PrintStream.println()"). (Note, I'm way oversimplifying -- the symbols look nothing like that, and they're split up a bit more. But they do contain that kind of info.)

At runtime, the JVM looks at those symbols, loads the classes referred to in them, and runs (or generates, if it's JITting) the native instructions necessary to find and execute the method. There's no real "linker" in Java; all the linking is done at runtime, based on the classes referenced. It's a lot like how DLLs work in Windows.

JIT is about the closest thing there is to an "assembler". It takes the bytecode and generates equivalent native code on the fly. The bytecode isn't in human-readable form, though, so I wouldn't normally count the translation as "assembling".

...

In languages like C and C++ (not C++/CLI), the story is quite different. All of the translation (and a good bit of linking) happens at compile time. Access to members of a struct gets converted into something like "give me the int 4 bytes from the beginning of this particular bunch of bytes". There's no flexibility there; if the struct's layout changes, generally the whole app has to be recompiled.
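As a rough illustration of that fixed-offset access, here is a minimal C sketch. The struct `Point` and the helper `read_second_int` are hypothetical names made up for this example; the only point is that the compiler's `p->y` and a manual "start address plus offset" read arrive at the same bytes.

```c
#include <stddef.h>  /* offsetof */
#include <string.h>  /* memcpy */

/* Hypothetical struct, for illustration only. */
struct Point {
    int x;
    int y;
};

/* Read the field y the way compiled code conceptually does:
   start at the struct's address, step forward by y's offset,
   and copy an int's worth of bytes out. */
int read_second_int(const struct Point *p)
{
    int value;
    const unsigned char *base = (const unsigned char *)p;
    memcpy(&value, base + offsetof(struct Point, y), sizeof value);
    return value;
}
```

On a typical platform with a 4-byte int, `offsetof(struct Point, y)` comes out as 4, which is the "4 bytes from the beginning" above. Insert a field before `y` and that constant changes, which is why code built against the old layout generally has to be recompiled.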

cHao
+1  A: 

Consider as a starting point a language that has only integers and floats of various sizes, plus a pointer type that lets us refer to values of those types in memory.

The correlation from this to the machine code the CPU uses would be relatively clear (though in fact we might well optimise beyond that).

Characters we can add by storing code-points in some encoding, and strings we build as arrays of such characters.
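To make that concrete, here is a small C sketch in which a string really is just an array of small integers. It assumes ASCII as the encoding and a 0 value as the end marker (C's convention, not the only possible one); `count_chars` and `greeting` are names invented for the example.

```c
#include <stddef.h>  /* size_t */

/* Count the characters before the terminating 0, treating the
   string as nothing more than an array of small integers. */
size_t count_chars(const unsigned char *s)
{
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* "Hi" spelled out as ASCII code points: 'H' is 72, 'i' is 105,
   followed by the 0 terminator. */
static const unsigned char greeting[] = { 72, 105, 0 };
```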

Now let's say we want to move this to the point where we can have something like:

class User
{
  int _id;
  char* _username;
  public User(int id, char* username)
  {
    _id = id;
    _username = username;
  }
  public virtual bool IsDefaultUser()
  {
    return _id == 0;
  }
}

The first thing we need to add to our language is some sort of struct/class construct that contains the members. Then we can have as far as:

class User
{
  int _id;
  char* _username;
}

Our compiling process knows that this means storing an integer followed by a pointer to an array of characters. It therefore knows that accessing _id means accessing the integer at the address of the start of the structure, and accessing _username means accessing the pointer to char at a given offset from the address of the start of the structure.

Given this, the constructor can exist as a function that does something like:

  User* _ctor_User(int id, char* username)
  {
    User* toMake = ObtainMemoryForUser();
    toMake->_id = id;
    toMake->_username = ObtainMemoryAndCopyString(username);
    return toMake;
  }

Obtaining memory and cleaning it up when appropriate is complicated. For one way this could be done, take a look at the sections in K&R on pointers to structures and on how malloc can be implemented.
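For a feel of what such a constructor might look like in real C, here is one possible sketch. It leans on the standard `malloc`, `strlen` and `memcpy` in place of the ObtainMemory helpers, and the names `ctor_User` and `dtor_User` are made up for the example (without the leading underscore, since C reserves such names at file scope). A real object system would do a good deal more than this.

```c
#include <stdlib.h>  /* malloc, free */
#include <string.h>  /* strlen, memcpy */

typedef struct User {
    int   _id;
    char *_username;
} User;

/* Hypothetical lowered constructor: allocate the object, copy the
   arguments into its fields, and return NULL if memory runs out. */
User *ctor_User(int id, const char *username)
{
    User *toMake = malloc(sizeof *toMake);
    if (toMake == NULL) {
        return NULL;
    }
    size_t len = strlen(username) + 1;  /* +1 for the 0 terminator */
    toMake->_username = malloc(len);
    if (toMake->_username == NULL) {
        free(toMake);
        return NULL;
    }
    memcpy(toMake->_username, username, len);
    toMake->_id = id;
    return toMake;
}

/* Matching cleanup: release the copied string, then the object. */
void dtor_User(User *u)
{
    if (u != NULL) {
        free(u->_username);
        free(u);
    }
}
```

Copying the string rather than storing the caller's pointer means the object owns its own memory, so `dtor_User` can free it without worrying about who else holds the original.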

From this point we can also implement IsDefaultUser with something like:

bool _impl_IsDefaultUser(User* this)
{
  return this->_id == 0;
}

This can't be overridden though. To allow for overriding we change User to be:

class User
{
  UserVTable* _vTable;
  int _id;
  char* _username;
}

Then _vTable points at a table of pointers to functions, which in this case contains a single entry, which is a pointer to the function above. Then calling the virtual member becomes a matter of looking at the correct offset into that table, and calling the appropriate function found. A derived class would have a different _vTable that would be the same except for having different function pointers for those methods that are overridden.
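The v-table idea can be sketched in plain C with function pointers. This is only one way a compiler might lower it: `GuestUser`, the `*_vtable` objects and `call_IsDefaultUser` are hypothetical names invented for the example, and real compilers lay things out in their own ways.

```c
#include <stdbool.h>

typedef struct User User;

/* One function pointer per virtual method. */
typedef struct UserVTable {
    bool (*IsDefaultUser)(const User *self);
} UserVTable;

struct User {
    const UserVTable *_vTable;   /* which implementations this object uses */
    int               _id;
    char             *_username;
};

/* Base implementation: the default user has id 0. */
static bool User_IsDefaultUser(const User *self)
{
    return self->_id == 0;
}

/* Override for a hypothetical derived class GuestUser:
   a guest is never the default user, whatever its id. */
static bool GuestUser_IsDefaultUser(const User *self)
{
    (void)self;  /* unused */
    return false;
}

/* Same table shape for both classes, different pointers inside. */
static const UserVTable User_vtable      = { User_IsDefaultUser };
static const UserVTable GuestUser_vtable = { GuestUser_IsDefaultUser };

/* Roughly what a call like user.IsDefaultUser() gets lowered to:
   fetch the table, fetch the slot, call through it. */
bool call_IsDefaultUser(const User *u)
{
    return u->_vTable->IsDefaultUser(u);
}
```

The caller never names a concrete function. Two objects with the same `_id` can give different answers purely because their `_vTable` fields point at different tables, which is all that overriding amounts to at this level.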

This is glossing over an awful lot, and not the only possibility in each case (e.g. v-tables are not the only way to implement overridable methods), but does show how we can build an object-oriented language which can be compiled down to more primitive operations on more primitive data types.

It also glosses over the possibility of doing something like the way C# is compiled to IL which is then in turn compiled to machine code, so that there are two steps between the OO language and the machine code that will actually be executed.

Jon Hanna