views:

680

answers:

5

Please explain what is name mangling, how it works, what problem it solves, and in which contexts and languages is used. Name mangling strategies (e.g. what name is chosen by the compiler and why) a plus.

+12  A: 

Simply put, name-mangling is a process by which compilers changes the names of identifiers in your source code in order to aid the linker in disambiguating between those identifiers.

Wikipedia has a wonderful article on this subject with several great examples.

Andrew Hare
+3  A: 

Name mangling is a means by which compilers modify the "compiled" name of an object, to make it different than what you specified in a consistent manner.

This allows a programming language the flexibility to provide the same name to multiple, compiled objects, and have a consistent way to lookup the appropriate object. For example, this allows multiple classes with the same name to exist in different namespaces (often by prepending the namespace into the class name, etc).

Operator and method overloading in many languages take this a step further - each method ends up with a "mangled" name in the compiled library in order to allow multiple methods on one type to exist with the same name.

Reed Copsey
+1  A: 

In python, name-mangling is a system by which class variables have different names inside and outside the class. The programmer "activates" it by putting two underscores at the start of the variable name.

For example, I can define a simple class with some members:

>>> class Foo(object):
...  def __init__(self):
...   self.x = 3
...   self._y = 4
...   self.__z = 5
...

In python practice, a variable name starting with an underscore is "internal" and not part of the class interface, and so programmers should not rely on it. However, it is still visible:

>>> f = Foo()
>>> f.x
3
>>> f._y
4

A variable name starting with two underscores is still public, but it is name-mangled and thus harder to access:

>>> f.__z  
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Foo' object has no attribute '__z'

If we know how the name-mangling works, however, we can get at it:

>>> f._Foo__z
5

i.e. the classname is prepended to the variable name with an extra underscore.

Python has no concept of 'private' versus 'public' members; everything is public. Name-mangling is the strongest-possible signal a programmer can send that the variable should not be accessed from outside the class.

John Fouhy
A: 

In Fortran, name mangling is needed because the language is case insensitive, meaning that Foo, FOO, fOo, foo etc.. will all resolve to the same symbol, whose name must be normalized in some way. Different compilers implement mangling differently, and this a source of great trouble when interfacing with C or binary objects compiled with a different compiler. GNU g77/g95, for example, always adds a trailing underscore to the lowercased name, unless the name already contains one or more underscores. In this case, two underscores are added.

For example, the following routine

    program test
    end program 

    subroutine foo()
    end subroutine

    subroutine b_ar()
    end subroutine
    subroutine b_a_r()
    end subroutine

Produces the following mangled symbols:

0000000000400806 g     F .text  0000000000000006              b_ar__
0000000000400800 g     F .text  0000000000000006              foo_
000000000040080c g     F .text  0000000000000006              b_a_r__

In order to call Fortran code from C, the properly mangled routine name must be invoked (obviously keeping into account possible different mangling strategies to be truly compiler independent). To call C code from fortran, a C-written interface must export properly mangled names and forward the call to the C routine. This interface can then be called from Fortran.

Stefano Borini
+5  A: 

In the programming language of your choice, if an identifier is exported from a separately compiled unit, it needs a name by which it is known at link time. Name mangling solves the problem of overloaded identifiers in programming languages. (An identifier is "overloaded" if the same name is used in more than one context or with more than one meaning.)

Some examples:

  • In C++, function or method get may be overloaded at multiple types.

  • In Ada or Modula-3, function get may appear in multiple modules.

Multiple types and multiple modules cover the usual contexts.

Typical strategies:

  • Map each type to a string and use the combined high-level identifier and "type string" as the link-time name. Common in C++ (especially easy since overloading is permitted only for functions/methods and only on argument types) and Ada (where you can overload result types as well).

  • If an identifier is used in more than one module or namespace, join the name of the module with the name of the identifier, e.g., List_get instead of List.get.

Depending on what characters are legal in link-time names, you may have to do additional mangling; for example, it may be necessary to use the underscore as an 'escape' character, so you can distinguish

  • List_my.get -> List__my_get

from

  • List.my_get -> List_my__get

(Admittedly this example is reaching, but as a compiler writer, I have to guarantee that distinct identifiers in the source code map to distinct link-time names. That's the whole reason and purpose for name mangling.)

Norman Ramsey
+1 This is one of the best answers I have ever seen on this site.
Andrew Hare