I'm writing a little PE reader, so I run dumpbin alongside my test application to confirm that the values are being read correctly. Everything is working so far, except for the export table.

The file I'm testing with is a DLL. My application reads the file in as a byte array, which gets passed to my PE reader class. The values align with those output by dumpbin, including the RVA and size of the export data directory.

        E000 [     362] RVA [size] of Export Directory

The problem is, the byte array's size is only 42,496. As you can probably imagine, when my PE reader attempts to read at E000 (57,344), I get an IndexOutOfRangeException. dumpbin, however, has no such problem and reads the export directory just fine. And yes, the entire file is indeed being read into the byte array.

How is this possible?

+4  A: 

The PE file contains "sections", and the sections have independent base addresses. The PE is not a contiguous memory image. Each section is a contiguous memory image.

First you will have to read the section headers and build a map of their layout. Then you will be able to translate addresses within a section into offsets within the file.
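As a rough sketch of reading the section information from a byte array: the field offsets below follow the standard PE/COFF layout (e_lfanew at 0x3C, a 20-byte COFF header, 40-byte section headers), but the names `Section` and `read_sections` are made up for this illustration and the code does no validation beyond the signature check.

```python
import struct
from typing import NamedTuple


class Section(NamedTuple):
    # Hypothetical record holding only the fields needed for address translation.
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int


def read_sections(data: bytes) -> list:
    """Return the section table of a PE image held in a byte string."""
    # e_lfanew, the file offset of the "PE\0\0" signature, is stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("PE signature not found")

    # The 20-byte COFF file header follows the signature.
    (_machine, number_of_sections, _timestamp, _symtab, _nsyms,
     size_of_optional_header, _characteristics) = struct.unpack_from(
        "<HHIIIHH", data, pe_offset + 4)

    # Section headers (40 bytes each) start right after the optional header.
    table = pe_offset + 4 + 20 + size_of_optional_header
    sections = []
    for i in range(number_of_sections):
        (raw_name, vsize, vaddr, raw_size, raw_ptr,
         _relocs, _lines, _nrelocs, _nlines, _flags) = struct.unpack_from(
            "<8sIIIIIIHHI", data, table + 40 * i)
        name = raw_name.rstrip(b"\0").decode("ascii", errors="replace")
        sections.append(Section(name, vsize, vaddr, raw_size, raw_ptr))
    return sections
```

Calling `read_sections(open(path, "rb").read())` on a DLL should give you the same virtual address, raw size and raw pointer values that dumpbin prints in each SECTION HEADER block.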

As an aside, consider looking at OllyDbg, which is a freeware debugger and disassembler for Windows. It may help you test your own software, and might serve the very purpose you are trying to fill by "rolling your own."

Example from dumpbin /all output:

SECTION HEADER #1
   .text name
    BC14 virtual size
    1000 virtual address (00401000 to 0040CC13)
    BE00 size of raw data
     400 file pointer to raw data (00000400 to 0000C1FF)
       0 file pointer to relocation table
       0 file pointer to line numbers
       0 number of relocations
       0 number of line numbers
60000020 flags
         Code
         Execute Read

In this case, my .text section begins at RVA 1000 and extends up to (but not including) RVA CE00. The file pointer to this section is 400. I can translate any RVA in the range 1000-CDFF to a file pointer by subtracting C00, which is 1000 - 400. (All numeric values are hexadecimal.)
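To make the arithmetic concrete, here is a small sketch using only the numbers from the .text header above: the difference between the virtual address (1000) and the file pointer (400) is C00. The helper name `text_rva_to_file_offset` is hypothetical.

```python
# Values copied from the .text section header in the dumpbin output above.
VIRTUAL_ADDRESS = 0x1000
SIZE_OF_RAW_DATA = 0xBE00
POINTER_TO_RAW_DATA = 0x400

# Constant difference between an RVA in .text and its position in the file.
DELTA = VIRTUAL_ADDRESS - POINTER_TO_RAW_DATA


def text_rva_to_file_offset(rva: int) -> int:
    """Translate an RVA inside .text to a file offset (hypothetical helper)."""
    if not VIRTUAL_ADDRESS <= rva < VIRTUAL_ADDRESS + SIZE_OF_RAW_DATA:
        raise ValueError(f"RVA {rva:#x} is not backed by .text file data")
    return rva - DELTA


print(hex(DELTA))                            # 0xc00
print(hex(text_rva_to_file_offset(0x1000)))  # 0x400
print(hex(text_rva_to_file_offset(0xCDFF)))  # 0xc1ff
```

The last two results line up with the "00000400 to 0000C1FF" file range that dumpbin reports for the section.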

Whenever you encounter an "RVA" (Relative Virtual Address), you resolve it to a file offset (or an index into your byte array), using this method:

  1. Determine to which section the RVA belongs. Each section covers the RVAs from its virtual address up to, but not including, its virtual address plus its size. Sections do not overlap.
  2. Subtract the section virtual address from the RVA -- this gives you the offset relative to the section.
  3. Add the section's PointerToRawData to the offset you obtain in step (2). This is the file offset corresponding to the RVA.
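The three steps above could be sketched roughly like this. `Section`, `rva_to_offset` and `SECTIONS` are names invented for the example; the .text entry uses the dumpbin values shown earlier, while the raw size and file pointer of the .edata entry are made up purely for illustration (the question does not show them).

```python
from typing import NamedTuple, Sequence


class Section(NamedTuple):
    # Hypothetical record with the section header fields used for translation.
    name: str
    virtual_size: int
    virtual_address: int
    size_of_raw_data: int
    pointer_to_raw_data: int


def rva_to_offset(rva: int, sections: Sequence[Section]) -> int:
    """Resolve an RVA to a file offset (an index into the byte array)."""
    for s in sections:
        # Step 1: find the section whose RVA range contains the address.
        span = max(s.virtual_size, s.size_of_raw_data)
        if s.virtual_address <= rva < s.virtual_address + span:
            # Step 2: offset of the RVA relative to the start of the section.
            offset_in_section = rva - s.virtual_address
            if offset_in_section >= s.size_of_raw_data:
                # Part of the section exists only in memory (e.g. zero fill).
                raise ValueError(f"RVA {rva:#x} has no bytes in the file")
            # Step 3: add PointerToRawData to get the file offset.
            return s.pointer_to_raw_data + offset_in_section
    raise ValueError(f"RVA {rva:#x} is not inside any section")


SECTIONS = [
    Section(".text", 0xBC14, 0x1000, 0xBE00, 0x400),   # from the dumpbin output
    Section(".edata", 0x362, 0xE000, 0x400, 0x9200),   # raw values are invented
]

print(hex(rva_to_offset(0x1000, SECTIONS)))  # 0x400
print(hex(rva_to_offset(0xE000, SECTIONS)))  # 0x9200
```

With a table like this, an export directory at RVA E000 resolves to an offset well inside a 42,496-byte file, which is why dumpbin can read it even though E000 itself is past the end of the byte array.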

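Another approach that you might use is to let the OS lay the file out for you and call MapViewOfFileEx(). As far as I know, FILE_MAP_EXECUTE in the dwDesiredAccess argument only requests an executable view; for the OS to parse the section headers from the PE file and read the contents of the sections into their locations relative to the "module base," the mapping object should be created by CreateFileMapping() with the SEC_IMAGE attribute.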

The module base is the base address at which the PE header is loaded. When a DLL is loaded with one of the LoadLibrary() functions, the module base can be obtained from the lpBaseOfDll member of the MODULEINFO structure filled in by GetModuleInformation().

When using MapViewOfFileEx(), the module base is simply the return value from MapViewOfFileEx().

When the module has been loaded in one of these ways, resolving an RVA to a normal pointer value is a matter of:

  1. Store the module base address in a char *
  2. Add the RVA to the char *
  3. Cast the char * to the actual datatype and dereference that.
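In C this is plain pointer arithmetic on a char *. Below is a sketch of the same three steps in Python via ctypes, so that it can be run anywhere; the buffer `image` is only a stand-in for a mapped module, and the value planted at RVA 1000 is arbitrary.

```python
import ctypes

# Stand-in for a mapped module: 8 KiB of zeroed memory owned by this process.
image = ctypes.create_string_buffer(0x2000)
RVA = 0x1000
# Plant a recognizable 32-bit value at the RVA so there is something to read.
ctypes.c_uint32.from_buffer(image, RVA).value = 0x12345678

# Step 1: the module base is the address where the image starts.
module_base = ctypes.addressof(image)
# Step 2: add the RVA to the base to get the virtual address.
va = module_base + RVA
# Step 3: cast to the actual data type and dereference.
value = ctypes.cast(va, ctypes.POINTER(ctypes.c_uint32)).contents.value

print(hex(value))  # 0x12345678
```

With a real mapped image, `module_base` would be the mapping's base address instead of the address of a scratch buffer; the arithmetic is the same.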

A drawback of letting the OS map the file, as in these approaches, is that if you are using this tool to investigate a suspect file and are not sure whether its developer has taken strange liberties with the section headers, you may miss valuable information by letting the OS handle this part of the parsing.

Heath Hunnicutt
Apologies, as I'm learning as I go... So far, I've added a boolean parameter that tells the PE reader if the given data is memory mapped (which I set to false when reading from a file). If it's not memory mapped, I read the export directory at the address given by PointerToRawData in the .edata section. This time, however, AddressOfFunctions points to E000. What information do I need to translate that to a file offset? Or have I missed something?
David Brown
Do you memory map it with MapViewOfFileEx() and pass FILE_MAP_EXECUTE? Or did you load it with LoadLibraryEx() and LOAD_LIBRARY_AS_IMAGE_RESOURCE?
Heath Hunnicutt
Neither. I'm simply reading bytes in from a file on the disk. I would like to support both PE images on the disk and loaded into memory, so I've added the boolean parameter to specify which type of image it is. If `MemoryMapped` is false, I know the image was read from disk and I can translate the export addresses accordingly. I just don't know what information I need to do the translation.
David Brown
I see. AddressOfFunctions is also an "RVA". See the paragraph I added to my answer.
Heath Hunnicutt
I've taken a look at MapViewOfFileEx since you mentioned it and it looks very useful in this situation. Will that particular function automatically map the export directory to the RVA given by its VirtualAddress so I don't have to translate anything at all?
David Brown
It won't map it to the RVA but to what I guess you would call the "VA". By "VA" I mean the real Virtual Address at which the entire module is loaded. For example, if your DLL loads at module base 04000000, then RVA 1000 will become 04001000. However, MapViewOfFileEx is IMO the way to go -- instead of dealing with all that section data, you only have to keep track of one value, the module base address. Keep in mind that if you want your tool to be useful in cracking malware or obfuscated binaries, relying on the OS APIs might leave you with a blind spot. In that case: parse the section headers.
Heath Hunnicutt
Cracking malware or obfuscated binaries isn't my intention at the moment, so I'll look into MapViewOfFileEx some more. Thanks for the help!
David Brown
Hey my pleasure and I'm glad our saved answer got better from the thread. :)
Heath Hunnicutt