Hi,

I am trying to write a small debug utility, and for this I need to get the address of a function or global variable given its name. This is a built-in debug utility, meaning it runs from within the code being debugged; in plain words, I cannot parse the executable file.

Is there a well-known way to do that? My plan is to get the .debug_* sections loaded into memory, which I plan to do with a cheap trick like this in the ld script:

.data : { *(.data) __sym_start = .; *(.debug_*); __sym_end = .; }

Now I have to parse the sections to get the information I need, but I am not sure whether this is doable or whether there are issues with it; this is all just theory. It also seems like too much work :-) Is there a simpler way? Or if someone can tell me upfront why my scheme will not work, that will also be helpful.

Thanks in Advance, Alex.

A: 

If you are running under a system with dlopen(3) and dlsym(3) (like Linux) you should be able to:

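#include <dlfcn.h>   /* dlopen, dlsym, dlerror; older glibc needs -ldl */
#include <stdio.h>
#include <stdlib.h>

/* ... inside some function ... */
char thing_string[] = "thing_you_want_to_look_up";
/* a NULL file name asks for a handle to the main program itself */
void * handle = dlopen(NULL, RTLD_LAZY | RTLD_NOLOAD);
  // you could do RTLD_NOW as well.  shouldn't matter
if (!handle) {
   fprintf(stderr, "Dynamic linking on main module : %s\n", dlerror() );
   exit(1);
}

/* addr is NULL if the name is not found */
void * addr = dlsym(handle, thing_string);
fprintf(stderr, "%s is at %p\n", thing_string, addr);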

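I don't know the best way to do this for other systems, and this probably won't work for static variables and functions. Symbols defined in the executable itself are only visible to dlsym if they are in its dynamic symbol table, which with GNU ld usually means linking with -rdynamic. C++ symbol names will be mangled, if you are interested in working with them.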

To expand this to work for shared libraries you could probably get the names of the currently loaded libraries from /proc/self/maps and then pass the library file names into dlopen, though this could fail if the library has been renamed or deleted.
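The first half of that idea, collecting the library file names, might look roughly like the sketch below. It assumes a Linux-style /proc/self/maps whose lines have the layout "address perms offset dev inode pathname"; the helper name list_mapped_libraries and the two size limits are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LIBS 64        /* illustrative limits, not from any API */
#define MAX_PATH_LEN 512

/* Hypothetical helper: fills paths[] with the distinct ".so" files mapped into
 * this process and returns how many were found (0 if /proc is unavailable). */
static size_t list_mapped_libraries(char paths[][MAX_PATH_LEN], size_t max)
{
    assert(paths != NULL);
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps)
        return 0;

    char line[1024];
    size_t count = 0;
    while (count < max && fgets(line, sizeof line, maps)) {
        char path[MAX_PATH_LEN];
        /* Skip the first five columns and keep the pathname, if there is one. */
        if (sscanf(line, "%*s %*s %*s %*s %*s %511s", path) != 1)
            continue;                    /* anonymous mapping */
        if (path[0] != '/' || !strstr(path, ".so"))
            continue;                    /* [heap], [stack], non-library files */
        size_t i;
        for (i = 0; i < count; i++)      /* a library is mapped several times */
            if (strcmp(paths[i], path) == 0)
                break;
        if (i == count)
            strcpy(paths[count++], path);
    }
    fclose(maps);
    return count;
}
```

Each collected path could then be handed to dlopen with RTLD_NOLOAD and searched with dlsym, as in the snippet above. Path names containing spaces are cut short by this simple scan.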

There are probably several other much better ways to go about this.

Edit: without using dlopen

/* name_addr.h */
#ifndef NAME_ADDR_H
#define NAME_ADDR_H
struct name_addr {
     const char * sym_name;
     const void * sym_addr;
};
typedef struct name_addr name_addr_t;
void * sym_lookup(const char * name);
extern const name_addr_t name_addr_table[];
extern const unsigned name_addr_table_size;
#endif

/* name_addr_table.c */
#include "name_addr.h"

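/* Declare each symbol as an opaque object; only its address is ever used.
 * Expect compiler warnings for names it already knows (strcmp, main, ...). */
#define PREMEMBER( X ) extern const void * X
/* &X is the symbol's link-time address; plain X would try to read the object. */
#define REMEMBER( X ) { .sym_name = #X , .sym_addr = (const void *) &X }

PREMEMBER(strcmp);
PREMEMBER(printf);
PREMEMBER(main);
PREMEMBER(memcmp);
PREMEMBER(bsearch);
/* sym_lookup is already declared in name_addr.h, so no PREMEMBER for it */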
/* ... */

const name_addr_t name_addr_table[] =
{
       /* You could do a #include here that included the list, which would allow you
        * to have an empty list by default without regenerating the entire file, as
        * long as your compiler only warns about missing include targets.
        */
     REMEMBER(strcmp),
     REMEMBER(printf),
     REMEMBER(main),
     REMEMBER(memcmp),
     REMEMBER(bsearch),
     REMEMBER(sym_lookup),
     /* ... */
};
const unsigned name_addr_table_size = sizeof(name_addr_table)/sizeof(name_addr_t);

/* name_addr_code.c */
#include "name_addr.h"
#include <string.h>

void * sym_lookup(const char * name) {
    unsigned to_go = name_addr_table_size;
    const name_addr_t *na = name_addr_table;
    while(to_go) {
       if ( !strcmp(name, na->sym_name) ) {
            /* the cast drops the const that the table entries carry */
            return (void *) na->sym_addr;
       }
       na++;
       to_go--;
    }
    /* set errno here if you are using errno */
    return NULL;  /* Or some other illegal value */
}

If you do it this way the linker will take care of filling in the addresses for you after everything has been laid out. If you include header files for all of the symbols that you are listing in your table then you will not get warnings when you compile the table file, but it will be much easier just to have them all be extern void * and let the compiler warn you about all of them (which it probably will, but not necessarily).

You will also probably want to sort your symbols by name such that you can use a binary search of the list rather than iterate through it.
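A minimal sketch of that binary search, using the standard bsearch. The table here is a stand-in with made-up entries (boot_count, tick_rate, reset_counters), and it assumes whatever generates the real table emits it already sorted in strcmp order.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct name_addr {
    const char *sym_name;
    const void *sym_addr;
} name_addr_t;

/* Hypothetical symbols, only here so the table has something to point at. */
static int boot_count;
static int tick_rate;
static void reset_counters(void) { boot_count = 0; tick_rate = 0; }

/* Must stay sorted by sym_name in strcmp() order for bsearch to work. */
static const name_addr_t sorted_table[] = {
    { "boot_count",     &boot_count },
    { "reset_counters", (const void *) reset_counters },
    { "tick_rate",      &tick_rate },
};

/* bsearch passes the key first and a table element second. */
static int compare_name(const void *key, const void *elem)
{
    const name_addr_t *entry = elem;
    return strcmp((const char *) key, entry->sym_name);
}

static const void *sym_lookup_sorted(const char *name)
{
    assert(name != NULL);
    const name_addr_t *hit = bsearch(name, sorted_table,
                                     sizeof sorted_table / sizeof sorted_table[0],
                                     sizeof sorted_table[0], compare_name);
    return hit ? hit->sym_addr : NULL;
}
```

Casting a function pointer to a void pointer, as done for reset_counters, is not strictly ISO C but is accepted by gcc and clang.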

You should note that if you have members in the table which are not otherwise referenced by the program (like if you had an entry for sqrt in the table, but didn't call it), the linker will then need to link those functions into your image. This can make the image considerably larger.

Also, if you were taking advantage of global optimizations, having this table will likely make them less effective, since the compiler must assume that every function listed could be called through a pointer from this list and that it cannot see all of the call sites.

Putting static functions in this list is not straightforward. You could do this by making the table dynamic and filling it in at run time from a function in each module, or possibly by generating a new section in your object file that the table lives in. If you are using gcc:

#define SECTION_REMEMBER(X) \
   static const name_addr_t _name_addr##X \
     __attribute__((used, section("sym_lookup_table"))) = \
     {.sym_name= #X , .sym_addr = (void *) X }

And tack a list of these onto the end of each .c file with all of the symbols that you want to remember from that file. This will require linker work so that the linker will know what to do with these members, but then you can iterate over the list by looking at the begin and end of the section that it resides in (I don't know exactly how to do this, but I know it can be done and isn't TOO difficult). This will make having a sorted list more difficult, though. Also, I'm not entirely certain initializing the .sym_name to a string literal's address would not result in cramming the string into this section, but I don't think it would. If it did then this would break things.
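One way the begin/end lookup can work, at least with GNU ld on ELF targets: when a section's name is a valid C identifier, the linker defines __start_<name> and __stop_<name> symbols around it without any linker script changes. The sketch below relies on that behaviour; the registered symbols (retry_limit, clamp_retries) and the function section_sym_lookup are invented for illustration, and other linkers or object formats may need an explicit script instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct name_addr {
    const char *sym_name;
    const void *sym_addr;
} name_addr_t;

/* Place one entry in the "sym_lookup_table" section; "used" keeps the
 * otherwise unreferenced static from being discarded by the compiler. */
#define SECTION_REMEMBER(X) \
    static const name_addr_t name_addr_##X \
        __attribute__((used, section("sym_lookup_table"))) = \
        { .sym_name = #X, .sym_addr = (const void *) &X }

/* Hypothetical file-local symbols that a global table could not name. */
static int retry_limit = 3;
static int clamp_retries(int n) { return n > retry_limit ? retry_limit : n; }

SECTION_REMEMBER(retry_limit);
SECTION_REMEMBER(clamp_retries);

/* Defined by the linker (assumption: GNU ld or compatible, ELF output). */
extern const name_addr_t __start_sym_lookup_table[];
extern const name_addr_t __stop_sym_lookup_table[];

static const void *section_sym_lookup(const char *name)
{
    assert(name != NULL);
    /* Walk every entry that any object file contributed to the section. */
    for (const name_addr_t *na = __start_sym_lookup_table;
         na < __stop_sym_lookup_table; na++) {
        if (strcmp(name, na->sym_name) == 0)
            return na->sym_addr;
    }
    return NULL;
}
```

The entries arrive in link order, so the scan is linear. The string literals themselves land in the usual read-only data, not in this section; only the two pointers per entry do.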

You can still use objdump to get a list of the symbols that the object file (probably ELF) contains, filter it for the symbols you are interested in, and then regenerate the table file with the table's members listed.
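As a rough sketch of the filtering step, the function below turns one line of default nm output ("value type name") into a REMEMBER(...) table line. The function name nm_line_to_entry and the choice to keep only types T, D and B (global text, data and bss) are assumptions made for illustration; a real filter would probably also want a list of names to skip.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes "     REMEMBER(<name>),\n" to out for a global T/D/B symbol.
 * Returns 1 if an entry was written, 0 if the line was skipped. */
static int nm_line_to_entry(const char *line, char *out, size_t out_size)
{
    assert(line != NULL && out != NULL);
    char type;
    char name[256];

    /* The value column is skipped: the linker fills in the real address.
     * Undefined symbols have no value column, so the scan fails for them. */
    if (sscanf(line, "%*x %c %255s", &type, name) != 2)
        return 0;
    if (strchr("TDB", type) == NULL)
        return 0;                        /* local or otherwise unwanted symbol */

    int n = snprintf(out, out_size, "     REMEMBER(%s),\n", name);
    return n > 0 && (size_t) n < out_size;
}
```

A small driver could read nm output line by line with fgets and print each accepted entry, along with a matching PREMEMBER line, into the generated table file.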

nategoose
Hi nategoose, thanks for the answer. But I am working on a very small ukernel which does not have dlopen. I wonder how dlopen() works, though: how does it find the address? Thanks, Alex.
It looks it up in the linking symbol table. If you are using ELF object files you can use libelf and possibly custom code to locate where the sections of the file were loaded. But you're going to have to give more info about what system this is supposed to run on for anyone to give you a reasonable answer. An unreasonable answer would be to create a table with each function name and address with something like an awk or perl script run on your C code (the output of the script would be a C file containing an array of `struct { char * name; void * addr; }` values that you would add to your program).
nategoose
Hi, I actually tried out my scheme of loading debug_info + debug_abbrev + debug_pubnames into memory in the data segment and parsing them to get the information, and I was able to get it working :-) I tried this on *linux* before trying it with the real system. The real system, as you asked, is a microkernel, and the code I will be writing will be bundled with the microkernel (as it's an embedded sort of application and cannot load any program at runtime). The kernel image is in ELF format, and hence I think I can try out this scheme. But I think this is too much code for a simple thing :-(
I was thinking about the second approach you suggested, using a script to parse the symbol file and make a struct array, which is simple enough. The problem there is that the code which reads the symbols is part of the code whose symbols we need :-) So that creates a sort of chicken-and-egg problem.
Not really. If you only require the ability to look up functions, having the function that does the lookup in the list isn't a problem, since you are only creating a variable. If you want to be able to look up variables' addresses as well, then you could just make the name of this list a special case in your script. You'll probably want the bare minimum generated by the script, and have that in its own .c file.
nategoose
Many embedded systems only include the .text and .data from what the compiler and linker produced, not the symbol table or debug symbols. This is almost always the case when the _kernel_ is linked with the _application_ . You should be able to get the info from `objdump` and `nm` run on the elf file, if you don't HAVE to have it run on the embedded system. I think you are using the term "microkernel" improperly.
nategoose
It's a microkernel on an embedded system. There is no issue with creating the structure with all the function names and addresses. My doubt was how to make that structure part of the image. We can run nm or objdump on the final linked image and create the array of structs. Once the image is done, how can we add our structure to the .data section of the image?
Using nm or objdump would work well if you didn't have to do this on the embedded system. Since that does not seem to be an option you just let the linker do the work for you. See _edit_.
nategoose