A PE file is divided into "sections", and each section has its own base address. The file as a whole is not a contiguous memory image, but each individual section is.
First you will have to read the section information and build a memory map of the section layout. Then you will be able to match section-relative offsets to file-based offsets.
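As a rough sketch of reading the section information from a byte array, something like the following could work. The names SectionInfo, rd16, rd32 and read_sections are made up for this illustration. The field offsets follow the standard PE/COFF header layout, and the function only collects the four fields needed for address translation.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: just the fields needed to translate addresses. */
typedef struct {
    char     name[9];              /* 8-byte section name plus terminator */
    uint32_t virtual_size;
    uint32_t virtual_address;      /* RVA where the section starts */
    uint32_t size_of_raw_data;
    uint32_t pointer_to_raw_data;  /* file offset where the section starts */
} SectionInfo;

/* Little-endian readers, so the code does not depend on host byte order. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the number of sections stored in out (at most max_out),
   or -1 if the buffer does not look like a PE file. */
int read_sections(const uint8_t *file, size_t len, SectionInfo *out, int max_out)
{
    if (len < 0x40 || file[0] != 'M' || file[1] != 'Z') return -1;

    uint32_t pe = rd32(file + 0x3C);             /* e_lfanew */
    if (pe > len || len - pe < 24) return -1;
    if (memcmp(file + pe, "PE\0\0", 4) != 0) return -1;

    uint16_t count    = rd16(file + pe + 6);     /* NumberOfSections */
    uint16_t opt_size = rd16(file + pe + 20);    /* SizeOfOptionalHeader */
    size_t   table    = (size_t)pe + 24 + opt_size;

    int n = 0;
    for (uint16_t i = 0; i < count && n < max_out; i++) {
        size_t off = table + (size_t)i * 40;     /* each header is 40 bytes */
        if (off > len || len - off < 40) return -1;
        const uint8_t *h = file + off;

        memcpy(out[n].name, h, 8);
        out[n].name[8] = '\0';
        out[n].virtual_size        = rd32(h + 8);
        out[n].virtual_address     = rd32(h + 12);
        out[n].size_of_raw_data    = rd32(h + 16);
        out[n].pointer_to_raw_data = rd32(h + 20);
        n++;
    }
    return n;
}
```

The resulting array is the "memory map": for each section it pairs the RVA range with the file range that backs it.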
As an aside, consider looking at OllyDbg, which is a freeware debugger and disassembler for Windows. It may help you test your own software, and might serve the very purpose you are trying to fill by "rolling your own."
Example from dumpbin /all output:
SECTION HEADER #1
.text name
BC14 virtual size
1000 virtual address (00401000 to 0040CC13)
BE00 size of raw data
400 file pointer to raw data (00000400 to 0000C1FF)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
60000020 flags
Code
Execute Read
In this case, my .text section begins at RVA 1000 and its raw data extends up to (but not including) RVA CE00. The file pointer to this section is 400, so I can translate any RVA in the range 1000-CDFF to a file pointer by subtracting C00 (that is, 1000 - 400). (All numeric values are hexadecimal.)
Whenever you encounter an "RVA" (Relative Virtual Address), you resolve it to a file offset (or an index into your byte array), using this method:
- Determine to which section the RVA belongs. Each section covers the RVAs from its virtual address up to, but not including, its virtual address plus its size. Sections do not overlap.
- Subtract the section virtual address from the RVA -- this gives you the offset relative to the section.
- Add the section's PointerToRawData to the offset you obtained in the previous step. This is the file offset corresponding to the RVA.
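The steps above can be sketched in C roughly as follows. The Section struct and the rva_to_file_offset name are invented for this example, and the sketch assumes that only RVAs backed by raw data in the file (that is, within SizeOfRawData) should translate successfully.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of a section header. */
typedef struct {
    uint32_t virtual_address;      /* RVA where the section starts */
    uint32_t size_of_raw_data;     /* bytes of the section present in the file */
    uint32_t pointer_to_raw_data;  /* file offset where the section starts */
} Section;

/* Returns 1 and stores the file offset on success,
   or 0 if no section's raw data covers the RVA. */
int rva_to_file_offset(const Section *sections, size_t count,
                       uint32_t rva, uint32_t *file_offset)
{
    for (size_t i = 0; i < count; i++) {
        const Section *s = &sections[i];

        /* Find the section that contains the RVA. */
        if (rva < s->virtual_address) continue;

        /* Offset of the RVA relative to the start of the section. */
        uint32_t delta = rva - s->virtual_address;
        if (delta >= s->size_of_raw_data) continue;

        /* Section-relative offset plus PointerToRawData. */
        *file_offset = s->pointer_to_raw_data + delta;
        return 1;
    }
    return 0;
}
```

With the .text values from the dumpbin output (virtual address 1000, raw size BE00, file pointer 400), RVA 1000 maps to file offset 400 and RVA CDFF maps to C1FF, which agrees with the file pointer range dumpbin prints.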
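Another approach that you might use is to let the OS map the file as an image: create the file mapping with CreateFileMapping() using the SEC_IMAGE attribute, then call MapViewOfFileEx(). Note that setting FILE_MAP_EXECUTE in the dwDesiredAccess argument only affects page protection; on its own it still maps the file flat, byte for byte. With SEC_IMAGE, the OS parses the section headers from the PE file and places the contents of the sections at their locations relative to the "module base."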
The module base is the base address at which the PE header is loaded. When loading DLLs with the LoadLibrary() functions, it can be obtained from the lpBaseOfDll member of the MODULEINFO structure filled in by GetModuleInformation().
When using MapViewOfFileEx(), the module base is simply the return value from MapViewOfFileEx().
When the module has been loaded in one of these ways, resolving an RVA to a normal pointer value is a matter of:
- Store the module base address in a char *.
- Add the RVA to the char *.
- Cast the char * to the actual datatype and dereference that.
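A minimal, portable sketch of that pointer arithmetic is shown here. The function names rva_to_ptr and read_u32_at_rva are made up for illustration, and module_base stands for whatever base address you obtained above. The read uses memcpy rather than a direct cast and dereference, a conservative substitute that makes no alignment assumptions. In a properly mapped image the plain cast is the usual idiom.

```c
#include <stdint.h>
#include <string.h>

/* Treat the module base as a byte pointer and add the RVA to it. */
const void *rva_to_ptr(const void *module_base, uint32_t rva)
{
    return (const char *)module_base + rva;
}

/* Read a 32-bit value located at the given RVA. */
uint32_t read_u32_at_rva(const void *module_base, uint32_t rva)
{
    uint32_t value;
    memcpy(&value, rva_to_ptr(module_base, rva), sizeof value);
    return value;
}
```

For a structure type you would do the same thing with a cast, for example casting the result of rva_to_ptr() to a pointer to the structure you expect at that RVA.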
A drawback of letting the OS map the file, as in these approaches, is that if you are using this tool to investigate a suspect file and are not sure whether its developer has taken strange liberties with the section headers, you may miss some valuable information by letting the OS handle this part of the parsing.