When I debug a C program with gdb, the disassemble command shows the instructions and their addresses in the code segment. Is it possible to obtain those memory addresses at runtime? I am using Ubuntu. Thank you.

[edit] To be more specific, I will demonstrate it with the following example.

#include <stdio.h>
#include <stdlib.h>   /* exit() */

void myfunction(void);   /* defined elsewhere */

int main(int argc,char *argv[]){
    myfunction();
    exit(0);
}

Now I would like to obtain the address of myfunction() in the code segment while my program is running.

+5  A: 

To get a backtrace, use execinfo.h as documented in the GNU libc manual.

For example:

#include <execinfo.h>
#include <stdio.h>
#include <unistd.h>


void trace_pom()
{   
    const int sz = 15;
    void *buf[sz];

    // get at most sz entries
    int n = backtrace(buf, sz);

    // output them right to stderr
    backtrace_symbols_fd(buf, n, fileno(stderr));

    // but if you want to output the strings yourself
    // you may use char ** backtrace_symbols (void *const *buffer, int size)
    write(fileno(stderr), "\n", 1);
}


void TransferFunds(int n);

void DepositMoney(int n)
{   
    if (n <= 0)
        trace_pom();
    else TransferFunds(n-1);
}


void TransferFunds(int n)
{   
    DepositMoney(n);
}


int main()
{   
    DepositMoney(3);

    return 0;
}

Compiled with:

gcc a.c -o a -g -Wall -Werror -rdynamic

According to the GNU libc manual:

Currently, the function name and offset can only be obtained on systems that use the ELF binary format for programs and libraries. On other systems, only the hexadecimal return address will be present. Also, you may need to pass additional flags to the linker to make the function names available to the program. (For example, on systems using GNU ld, you must pass -rdynamic.)

Output

./a(trace_pom+0xc9)[0x80487fd]
./a(DepositMoney+0x11)[0x8048862]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(TransferFunds+0x11)[0x8048885]
./a(DepositMoney+0x21)[0x8048872]
./a(main+0x1d)[0x80488a4]
/lib/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7e16775]
./a[0x80486a1]
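
The comment in trace_pom() mentions backtrace_symbols() as the way to handle the strings yourself. A minimal sketch of that variant follows, assuming glibc's execinfo.h; the helper name print_trace and the 64-frame cap are made up for illustration.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: prints up to max_frames stack frames to `out`,
   one per line, as "#index address symbol-string".
   Returns the number of frames printed, or -1 if the strings
   could not be allocated. */
int print_trace(FILE *out, int max_frames)
{
    void *frames[64];   /* arbitrary cap chosen for this sketch */
    char **symbols;
    int n, i;

    if (max_frames < 1)
        return 0;
    if (max_frames > 64)
        max_frames = 64;

    /* Collect the return addresses of the active calls. */
    n = backtrace(frames, max_frames);

    /* Translate them to strings; the result is one malloc'ed block. */
    symbols = backtrace_symbols(frames, n);
    if (symbols == NULL)
        return -1;

    for (i = 0; i < n; i++)
        fprintf(out, "#%d %p %s\n", i, frames[i], symbols[i]);

    free(symbols);      /* free only the array, not the individual strings */
    return n;
}
```

Calling print_trace(stderr, 15) from trace_pom() should produce roughly the same information as backtrace_symbols_fd(), with the raw address available separately in frames[i].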
Adrian Panasiuk
Is there a way that I can extract the function DepositMoney's address alone, meaning 0x8048872 should be the only output instead of printing out the whole backtrace?
Ah, sorry, I thought you wanted a full backtrace.
Adrian Panasiuk
+11  A: 

The answer above is vastly overcomplicated. If the function is referenced statically (by name in the source), as it is above, the address is simply the value of the symbol name in pointer context:

void* myfunction_address = myfunction;

If you are grabbing the function dynamically out of a shared library, then the value returned from dlsym() (POSIX) or GetProcAddress() (Windows) is likewise the address of the function.

Note that the above code is likely to generate a warning with some compilers, as ISO C technically forbids assignment between code and data pointers (some architectures put them in physically distinct address spaces).

And some pedants will point out that the address returned isn't really guaranteed to be the memory address of the function, it's just a unique value that can be compared for equality with other function pointers and acts, when called, to transfer control to the function whose pointer it holds. Obviously all known compilers implement this with a branch target address.

And finally, note that the "address" of a function is a little ambiguous. If the function was loaded dynamically or is an extern reference to an exported symbol, what you really get is generally a pointer to some fixup code in the "PLT" (a Unix/ELF term, though the PE/COFF mechanism on Windows is similar) that then jumps to the function.
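
One way to sidestep the code/data pointer warning mentioned above is to keep the address in a function-pointer type and convert it to an integer only for printing. The sketch below assumes a function with the signature void myfunction(void); the empty body and the two helper names are placeholders for illustration. Converting a function pointer to uintptr_t is implementation-defined, but it works on the usual Linux toolchains.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the question's myfunction(); the body is a placeholder. */
void myfunction(void)
{
}

/* Hypothetical helper: returns the function's address as an integer.
   The conversion is implementation-defined in ISO C. */
uintptr_t function_address(void (*fn)(void))
{
    return (uintptr_t)fn;
}

/* Hypothetical helper: prints the address in hex so it can be compared
   with what gdb's disassemble command shows. */
void print_function_address(const char *name, void (*fn)(void))
{
    printf("%s is at 0x%" PRIxPTR "\n", name, function_address(fn));
}
```

A call such as print_function_address("myfunction", myfunction) prints the value at runtime; with a position-independent executable the number changes from run to run.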

Andy Ross
Thanks so much for your fantastic answer. Can I ask this question in advance: How about getting the address of a specific line of code?
No luck there. Compilers are free to reorder and optimize code, so there isn't a single memory region that corresponds to any given line or expression. Debuggers can do a pretty reasonable job of reconstructing this from the symbol table and debug information in the executable, but unfortunately there you're getting into some deep voodoo that I don't know off-hand.
Andy Ross
Nice answer, well explained. You can also work around the fact that conversion between code and data pointers isn't defined, by applying this trick (as done in "man dlsym" in the reverse direction): void *p; *(void(**)()) or by using a union.
Johannes Schaub - litb
+3  A: 

If you know the function name before the program runs, simply use

void * addr = myfunction;

If the function name is only given at run-time, I once wrote a function that looks up the symbol address dynamically using the BFD library. Here is the x86_64 code; in this example you get the address via find_symbol("a.out", "myfunction").

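#include <bfd.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns the value of symname in filename's symbol table, or -1 on failure.
   For a position-independent executable this value is relative to the load
   base, not the final run-time address. */
long find_symbol(char *filename, char *symname)
{
    bfd *ibfd;
    asymbol **symtab;
    long nsize, nsyms, i;
    long addr = -1;
    symbol_info syminfo;
    char **matching;

    bfd_init();
    ibfd = bfd_openr(filename, NULL);

    if (ibfd == NULL) {
        printf("bfd_openr error\n");
        return -1;
    }

    if (!bfd_check_format_matches(ibfd, bfd_object, &matching)) {
        printf("format_matches\n");
        bfd_close(ibfd);
        return -1;
    }

    /* Size in bytes needed for the symbol pointer table. */
    nsize = bfd_get_symtab_upper_bound (ibfd);
    symtab = (nsize > 0) ? malloc(nsize) : NULL;
    if (symtab == NULL) {
        printf("no symbol table\n");
        bfd_close(ibfd);
        return -1;
    }
    nsyms = bfd_canonicalize_symtab(ibfd, symtab);

    for (i = 0; i < nsyms; i++) {
        if (strcmp(symtab[i]->name, symname) == 0) {
            bfd_symbol_info(symtab[i], &syminfo);
            addr = (long) syminfo.value;
            break;
        }
    }

    if (addr == -1)
        printf("cannot find symbol\n");

    free(symtab);
    bfd_close(ibfd);
    return addr;
}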
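
If the symbol is exported by the running program or by a library it has loaded, a lighter alternative to BFD is dlsym(), mentioned in another answer. The sketch below is a different technique from the code above, not a rewrite of it; the helper name lookup_symbol is made up. Functions defined in the executable itself are normally only visible this way when it is linked with -rdynamic, and older glibc versions need -ldl at link time.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: looks up `name` among the symbols visible to the
   running program (the executable and its loaded shared libraries).
   Returns the symbol's run-time address, or NULL if it is not found. */
void *lookup_symbol(const char *name)
{
    void *handle;
    void *addr;

    /* A NULL path asks for a handle on the main program itself. */
    handle = dlopen(NULL, RTLD_LAZY);
    if (handle == NULL)
        return NULL;

    addr = dlsym(handle, name);

    /* Releasing the handle does not unload the program. */
    dlclose(handle);
    return addr;
}
```

For example, lookup_symbol("puts") should return a non-NULL address in a dynamically linked program, while an unknown name yields NULL.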
ZelluX
+2  A: 

Regarding the comment on another answer (getting the address of an instruction), you can use this very ugly trick:

#include <setjmp.h>
#include <stdio.h>

void function() {
    printf("in function\n");
    printf("%d\n",__LINE__);
    printf("exiting function\n");

}

int main() {
    jmp_buf env;
    int i;

    printf("in main\n");
    printf("%d\n",__LINE__);
    printf("calling function\n");
    setjmp(env);
    /* Assumes jmp_buf is an array of 18 int-sized words, as on the 32-bit
       machine that produced the output below; the layout is not portable. */
    for (i=0; i < 18; ++i) {
        printf("%p\n",(void *)env[i]);
    }
    function();
    printf("in main again\n");
    printf("%d\n",__LINE__);

}

The instruction pointer (eip) should be in env[12], but be careful: the layout of jmp_buf is machine dependent, so triple-check this on your own platform. This is the output:

in main
13
calling function
0xbfff037f
0x0
0x1f80
0x1dcb
0x4
0x8fe2f50c
0x0
0x0
0xbffff2a8
0xbffff240
0x1f
0x292
0x1e09
0x17
0x8fe0001f
0x1f
0x0
0x37
in function
4
exiting function
in main again
37
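
A less layout-dependent way to get the address of one particular spot in the code, assuming GCC or Clang, is the "labels as values" extension: the unary && operator yields the address of a label. This is not ISO C, the function and label names below are made up, and the optimizer may still move instructions around the label, so treat the value as approximate.

```c
#include <stdio.h>

/* Hypothetical helper: returns the address of the code at the label
   `marked_line`. Uses the GNU C "labels as values" extension (&&label),
   so it needs GCC or Clang. */
void *address_of_marked_line(void)
{
    void *addr = &&marked_line;   /* address of the label below */

marked_line:
    puts("marked line reached");
    return addr;
}
```

Printing the result with printf("%p\n", address_of_marked_line()) gives a value that can be compared with gdb's disassemble output for the same function.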

have fun!

Stefano Borini