
Hi,

Pardon me if this question is too trivial! It is known that the final executable does not allocate space for the uninitialized data within the image. But I want to know: how do references to the symbols within .bss get resolved?

Does the object file contain only the addresses of these .bss variables somewhere else and NOT allocate space for them? If so, where are these resolved addresses stored?

For example, if in a C module I have the following global variables:

int x[10];
char chArray[100];

The space for the above variables may not be present in the image, but how will one reference them? Where are their addresses resolved?

Thanks in advance! /MS

+2  A: 

.bss symbols get resolved just like any other symbol the compiler (or assembler) generates. Usually this works by placing related symbols in "sections". For example, a compiler might place program code in a section called ".text" (for historical reasons ;-), initialized data in a section called ".data", and uninitialized data in a section called ".bss".

For example:

int i = 4;
int x[10];
char chArray[100];

int main(int argc, char**argv)
{
}

produces (with gcc -S):

    .file   "test.c"
.globl i
    .data
    .align 4
    .type   i, @object
    .size   i, 4
i:
    .long   4
    .text
.globl main
    .type   main, @function
main:
    leal    4(%esp), %ecx
    andl    $-16, %esp
    pushl   -4(%ecx)
    pushl   %ebp
    movl    %esp, %ebp
    pushl   %ecx
    subl    $4, %esp
    addl    $4, %esp
    popl    %ecx
    popl    %ebp
    leal    -4(%ecx), %esp
    ret
    .size   main, .-main
    .comm   x,40,32
    .comm   chArray,100,32
    .ident  "GCC: (GNU) 4.3.2 20081105 (Red Hat 4.3.2-7)"
    .section        .note.GNU-stack,"",@progbits

The .data directive tells the assembler to place i in the data section; the ".long 4" gives it its initial value. When the file is assembled, i will be defined at offset 0 in the data section.

The .text directive will place main in the .text section, again with an offset of zero.

The interesting thing about this example is that x and chArray are defined using the .comm directive, not placed in .bss directly. Both are given only a size and an alignment, not an offset (yet).

When the linker gets the object files, it links them together by combining all the sections with the same name and adjusting the symbol offsets accordingly. It also gives each section an absolute address at which it should be loaded.

The symbols defined by the .comm directive are combined (if multiple definitions with the same name exist) and placed in the .bss section. It's at this point that they are given their address.

Richard Pennington
+1 Couldn't explain it better! Although, of course, all of this is architecture dependent, and whilst I don't know of an example, it would be possible for all .bss addresses to be composed relative to a hardware register, so the actual address is only resolved at runtime.
Autopulated
@Autopulated: Great point. I actually implemented a C compiler that did exactly that: It used the X index register to point to the .bss area for position independent data.
Richard Pennington
Thanks :-). So if I am getting it, the .bss gets its symbols resolved once all the .comm directives are combined, AND these resolved addresses are patched wherever they were referenced (kind of .bss + offset of variable). And the .bss base is assigned at run time. Right?
MS
@MS: It depends on the exact environment. There could be a register that points to the .bss section at run time, or the address of the bss section could be known at link time and the actual address filled in by the linker. Many object file formats keep track of where in the object file a specific symbol is accessed (often called a relocation record) and can modify the object code to fix the symbol address during linking.
Richard Pennington