Hey, recently (yay no more school) I've been teaching myself about the ELF file format. I've largely been following the documentation here: http://www.skyfree.org/linux/references/ELF_Format.pdf.

It was all going great, and I wrote this program to give me info about an ELF file's sections:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <elf.h>


void dumpShdrInfo(Elf32_Shdr elfShdr, const char *sectionName)
{
    printf("Section '%s' starts at 0x%08X and ends at 0x%08X\n",
        sectionName, elfShdr.sh_offset, elfShdr.sh_offset + elfShdr.sh_size);
}

int search(const char *name)
{
    Elf32_Ehdr elfEhdr;
    Elf32_Shdr *elfShdr;
    FILE *targetFile;
    char tempBuf[64];
    int i, ret = -1;

    targetFile = fopen(name, "r+b");

    if(targetFile)
    {
        /* read the ELF header */
        fread(&elfEhdr, sizeof(elfEhdr), 1, targetFile);

        /* Elf32_Ehdr.e_shnum specifies how many sections there are */
        elfShdr = calloc(elfEhdr.e_shnum, sizeof(*elfShdr));
        assert(elfShdr);

        /* set the file pointer to the section header offset and read it */
        fseek(targetFile, elfEhdr.e_shoff, SEEK_SET);
        fread(elfShdr, sizeof(*elfShdr), elfEhdr.e_shnum, targetFile);

        /* loop through every section */
        for(i = 0; (unsigned int)i < elfEhdr.e_shnum; i++)
        {
            /* if Elf32_Shdr.sh_addr isn't 0 the section will appear in memory */
            if(elfShdr[i].sh_addr)
            {
                /* set the file pointer to the location of the section's name and then read the name */
                fseek(targetFile, elfShdr[elfEhdr.e_shstrndx].sh_offset + elfShdr[i].sh_name, SEEK_SET);
                fgets(tempBuf, sizeof(tempBuf), targetFile);

                #if defined(DEBUG)
                dumpShdrInfo(elfShdr[i], tempBuf);
                #endif
            }
        }

        fclose(targetFile);
        free(elfShdr);
    }

    return ret;
}

int main(int argc, char *argv[])
{
    if(argc > 1)
    {
        search(argv[1]);
    }
    return 0;
}

After running it a few times on a couple of files I noticed something weird. The '.text' section always began at a very low virtual address (we're talking smaller than 1000h). After digging around with gdb for a while, I noticed that for every section, sh_addr was equal to sh_offset.

This is what I'm confused about: Elf32_Shdr.sh_addr is documented as being "the address at which the first byte should reside", while Elf32_Shdr.sh_offset is documented as being "the byte offset from the beginning of the file to the first byte in the section". If both of those are true, it doesn't make sense to me that the two values are equal. Why is this?

Now, I know there are sections that contain uninitialized data (.bss I think), so it would make sense that that data would not appear in the file but would appear in the process's memory. This would mean that for every section that comes after the aforementioned one, figuring out its virtual address would be a lot more complicated than reading a single field.

That being said, is there a way of actually determining a section's virtual address?

Any kind of help is appreciated; thanks a lot.

+1  A: 

I tried that, and Elf32_Shdr.sh_addr isn't the same as Elf32_Shdr.sh_offset in my example. It is shifted by 0x08048000, which is the virtual start address of the program in memory. Elf32_Shdr.sh_offset is 0x00000570 for the '.text' section and Elf32_Shdr.sh_addr is 0x08048570 for the same section.
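To make the arithmetic explicit, here is a minimal sketch that subtracts the two values quoted above. The helper names section_delta and print_delta are hypothetical, made up for this illustration, and the constant difference is only what I see for this one executable; other files (or sections in a different segment) may show a different delta.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: how far a section's in-memory address (sh_addr)
   is from its position in the file (sh_offset). */
uint32_t section_delta(uint32_t sh_addr, uint32_t sh_offset)
{
    return sh_addr - sh_offset;
}

/* Hypothetical helper: print that difference for one section. */
void print_delta(const char *name, uint32_t sh_addr, uint32_t sh_offset)
{
    printf("%s: sh_addr - sh_offset = 0x%08X\n",
           name, (unsigned int)section_delta(sh_addr, sh_offset));
}
```

Calling print_delta(".text", 0x08048570, 0x00000570) with the values from my file reports a difference of 0x08048000.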

As you quoted from the documentation, Elf32_Shdr.sh_offset is "the byte offset from the beginning of the file to the first byte in the section":

$> hexdump -C -s 0x00000570 -n 64 elffile
00000570  31 ed 5e 89 e1 83 e4 f0  50 54 52 68 b0 88 04 08  |1.^.....PTRh....|
00000580  68 c0 88 04 08 51 56 68  66 88 04 08 e8 3b ff ff  |h....QVhf....;..|
00000590  ff f4 90 90 90 90 90 90  90 90 90 90 90 90 90 90  |................|
000005a0  55 89 e5 83 ec 08 80 3d  44 a0 04 08 00 74 0c eb  |U......=D....t..|

and Elf32_Shdr.sh_addr is "the address at which the first byte should reside". That is the virtual address of the data in memory:

(gdb) print/x *(char[64] *) 0x08048570
$4 = {
0x31, 0xed, 0x5e, 0x89, 0xe1, 0x83, 0xe4, 0xf0, 0x50, 0x54, 0x52, 0x68, 0xb0, 0x88, 0x04, 0x08,
0x68, 0xc0, 0x88, 0x04, 0x08, 0x51, 0x56, 0x68, 0x66, 0x88, 0x04, 0x08, 0xe8, 0x3b, 0xff, 0xff,
0xff, 0xf4, 0x90 <repeats 14 times>,
0x55, 0x89, 0xe5, 0x83, 0xec, 0x08, 0x80, 0x3d, 0x44, 0xa0, 0x04, 0x08, 0x00, 0x74, 0x0c, 0xeb}
rudi-moore
Are you using the code I pasted? If not, could you point out what I'm doing wrong? The output I get is here http://pastebin.com/qTPG85sT
masseyc
err, disregard that. http://stackoverflow.com/questions/3091363/question-about-the-elf-file-format-sh-addr-always-equals-sh-offset/3098927#3098927
masseyc
A: 

Okay, after taking a look at rudi-moore's answer I thought I'd investigate with gdb one more time...

It turns out in my dumpShdrInfo I was printing sh_offset instead of sh_addr. I have vivid memories of writing that function and typing out "sh_addr", as well as debugging with gdb and seeing sh_offset being equal to sh_addr.

However, I guess I'm an idiot and my memories aren't worth that much because as soon as I changed it to sh_addr and recompiled it worked. That's what I get for programming at 5AM. :/
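For anyone who lands here later, this is roughly what the change looks like. It is a sketch rather than my exact code: formatShdrInfo is a hypothetical helper I'm adding here only so the formatted line can be checked separately from printing it, and the casts to unsigned int assume a platform where that type is 32 bits wide.

```c
#include <stdio.h>
#include <elf.h>

/* Hypothetical helper (not in the original program): writes the section's
   virtual address range into buf, using sh_addr instead of sh_offset. */
int formatShdrInfo(char *buf, size_t len, const Elf32_Shdr *shdr, const char *sectionName)
{
    return snprintf(buf, len, "Section '%s' starts at 0x%08X and ends at 0x%08X\n",
        sectionName,
        (unsigned int)shdr->sh_addr,
        (unsigned int)(shdr->sh_addr + shdr->sh_size));
}

/* Same signature as before; only the field being printed has changed. */
void dumpShdrInfo(Elf32_Shdr elfShdr, const char *sectionName)
{
    char line[128];

    formatShdrInfo(line, sizeof(line), &elfShdr, sectionName);
    fputs(line, stdout);
}
```

The rest of the program stays as it was; search() still calls dumpShdrInfo(elfShdr[i], tempBuf) when built with DEBUG defined.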

masseyc
Yes, that 5AM code, I can well understand it ;) Glad it works now.
rudi-moore