I dump part of my RAM (the code segment only) in order to find where each C function has been placed. I have no map file, and I don't know exactly what the boot/init routines do.

I load my program into RAM, but when I dump the RAM it is very hard to tell which function is where. I'd like to build distinct patterns into the C source so that I can recognize them in the memory dump.

I've tried starting every function with a local variable containing the function's name, like:

char this_function_name[]="main";

but it doesn't work, because this string will be placed in the data segment.

I have a simple 16-bit RISC CPU and an experimental proprietary compiler (not GCC or any other well-known one). The system has 16Mb of RAM, shared with other applications (bootloader, downloader). It is almost impossible to find, say, a unique sequence of N NOPs or something like 0xABCD. I would like to find all functions in RAM, so I need unique function identifiers that are visible in the RAM dump.

What would be the best pattern for the code segment?

+1  A: 

Numeric constants are placed in the code segment, encoded in the function's instructions. So you could try to use magic numbers like 0xDEADBEEF and so on.

For example, here's the disassembly view of a simple C function with Visual C++:

void foo(void)
{
00411380  push        ebp  
00411381  mov         ebp,esp 
00411383  sub         esp,0CCh 
00411389  push        ebx  
0041138A  push        esi  
0041138B  push        edi  
0041138C  lea         edi,[ebp-0CCh] 
00411392  mov         ecx,33h 
00411397  mov         eax,0CCCCCCCCh 
0041139C  rep stos    dword ptr es:[edi] 
    unsigned id = 0xDEADBEEF;
0041139E  mov         dword ptr [id],0DEADBEEFh 

You can see the 0xDEADBEEF making it into the function's machine code. Note that the byte order you actually see in the dump depends on the endianness of the CPU (thanks, Richard).

This is an x86 example. But RISC CPUs (MIPS, etc.) also have instructions that move immediates into registers, and these immediates can have special recognizable values as well (although only 16-bit ones for MIPS, IIRC).
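
On a 16-bit target the same idea might look like the sketch below: each function stores a fixed tag word followed by its own ID through a volatile variable, so the compiler cannot discard the stores. The macro name, the tag value and the two example functions are made up for illustration. Whether the constants end up as immediates in the instruction stream (rather than in a literal pool or a CONST segment) depends on the compiler, so this needs checking on the real toolchain.

```c
#include <stdint.h>

/* Hypothetical marker: a fixed tag word followed by a per-function ID. */
#define FUNC_TAG 0xDEADu

/* volatile forces both stores to be emitted even though nothing reads them. */
#define MARK_FUNCTION(id)                 \
    do {                                  \
        volatile uint16_t marker_;        \
        marker_ = FUNC_TAG;               \
        marker_ = (uint16_t)(id);         \
    } while (0)

int add_one(int x)          /* example function, ID 0x0001 */
{
    MARK_FUNCTION(0x0001);
    return x + 1;
}

int times_two(int x)        /* example function, ID 0x0002 */
{
    MARK_FUNCTION(0x0002);
    return x * 2;
}
```

In the dump you would then look for the tag word and read the ID from the instruction that follows it.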


Psihodelia - it's getting harder and harder to catch your intention. Is it just a single function you want to find? Then can't you just place 5 NOPs one after another and look for them? Do you control the compiler/assembler/linker/loader? What tools are at your disposal?

Eli Bendersky
No, it doesn't work in my case. I don't have an x86 CPU, just a simple RISC CPU.
psihodelia
Make sure you look for EFBEADDE on an x86. ;-) Also, don't turn on too many optimizations.
Richard Pennington
@psihodelia: even RISC CPUs have instructions placing immediates into registers.
Eli Bendersky
@Eli: Not always. Sometimes they do a loadhi/loadlo or get the constant from a global table.
Richard Pennington
@Richard: but even with loadhi/loadlo, there are 16-bit immediates that stay whole. so 0xDEAD would have to do :-)
Eli Bendersky
@Eli: I have a 16-bit CPU
psihodelia
@psihodelia: if worst comes to worst, a few 8-bit constants in consecutive instructions should also stand out
Eli Bendersky
+6  A: 

If it were me, I'd use the symbol table, e.g. "nm a.out | grep main". Get the real address of any function you want.

If you really have no symbol table, make your own.

struct tab {
    void *addr;
    char name[100];  // For ease of searching, use an array.
} symtab[] = {
    { (void*)main, "main" },
    { (void*)otherfunc, "otherfunc" },
};

Search for the name, and the address will immediately precede it. Goto address. ;-)
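
A minimal host-side sketch of that search, assuming the dump has already been read into a byte buffer. The function name `find_name` is hypothetical. It returns the offset of the name; the address then sits in the bytes just before that offset, and its width and byte order depend on the target.

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the NUL-terminated string `name` inside `dump`,
   or -1 if it does not occur. Note that this can also match the tail
   of a longer name (searching "main" finds "domain" too). */
long find_name(const unsigned char *dump, size_t dump_len, const char *name)
{
    size_t n = strlen(name) + 1;          /* include the terminating NUL */
    size_t i;

    if (n > dump_len)
        return -1;
    for (i = 0; i + n <= dump_len; i++) {
        if (memcmp(dump + i, name, n) == 0)
            return (long)i;
    }
    return -1;
}
```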

Richard Pennington
No, I have no nm or any other useful tools, because both the compiler and the CPU are very uncommon.
psihodelia
Nice idea, that symbol table "of your own". ++ to you
Eli Bendersky
It doesn't work, because I can dump only the code segment. The binary file is also in a proprietary format, so I cannot read it.
psihodelia
Does the compiler put const stuff in the code segment? If so, you could make the symbol table const to access it. I'm not saying to look at the binary file. Search for the string "main" in the processor address space.
Richard Pennington
@Richard: no, it puts all consts into special CONST segment
psihodelia
@psihodelia: you can't dump const memory? Can only dump code - not data? No map/symbol table? You can't examine the binary, even on your workstation? Man, someone is really tying your hands on this project. I think your team needs to maybe start working on some tools or improving the existing ones. I'm not sure how one can effectively troubleshoot if there's no visibility into the system.
Michael Burr
@psihodelia: Presumably you have access to an assembler - use that to build the symbol table Richard described, either at build/link time (if possible), or by copying the one in the `CONST` data segment to a block in the code segment in RAM or by providing a pointer to such a block that a C function can copy the table to.
Michael Burr
+3  A: 

If your compiler supports inline asm, you can use it to create a pattern. Write a few NOP-like instructions whose opcodes you can easily recognize in the memory dump:

MOV r0,r0
MOV r0,r0
MOV r0,r0
MOV r0,r0
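
To locate such a run in the dump, something like the following host-side sketch could be used. It assumes the dump has been loaded as an array of 16-bit words in the target's byte order; the function name `find_word_run` is hypothetical, and the opcode value for MOV r0,r0 is not known here, so it has to be taken from the assembler's output.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first word of the first run of `run_len`
   consecutive words equal to `opcode`, or -1 if there is no such run. */
long find_word_run(const uint16_t *words, size_t count,
                   uint16_t opcode, size_t run_len)
{
    size_t i, run = 0;

    for (i = 0; i < count; i++) {
        run = (words[i] == opcode) ? run + 1 : 0;
        if (run_len > 0 && run == run_len)
            return (long)(i + 1 - run_len);
    }
    return -1;
}
```

Using a different run length per function (4 NOPs, 5 NOPs, ...) would let one scan tell the functions apart.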
Sergius
To be on the safe side, put an unconditional jump instruction at the start of the assembly block that will skip the whole thing. That way, you can put whatever you want inside it (I usually rig up opcodes that when dumped end up as ascii values that spell out something) without worrying about altering the program execution. Oh, and anytime you're doing anything like this, make sure you turn off all compiler optimizations (although some compilers turn them off automatically for functions with inline assembly).
bta
Thanks for the trick with the unconditional jump! About inline assembly: I think it is meant for manual optimization and should never be optimized by the compiler.
Sergius
+1  A: 

As you noted, this:

char this_function_name[]="main";

... will end up setting a pointer in your stack to a data segment containing the string. However, this:

char this_function_name[]= { 'm', 'a', 'i', 'n' };

... will likely put all these bytes in your stack so you will be able to recognize the string in your code (I just tried it on my platform).

Hope this helps

figurassa
No, frankly it goes into data segment in my case.
psihodelia
@psihodelia Wow... I tried in two different platforms both using GCC as well and it worked in both cases. However, I made sure I had no optimization turned on. I am not sure whether GCC would optimize such a construct. Are you building w/o any optimizations?
figurassa
+1  A: 

How about a completely different approach to your real problem, which is finding a particular block of code: Use diff.

Compile the code once with the function in question included, and once with it commented out. Produce RAM dumps of both. Then diff the two dumps to see what has changed; that will be the new code block. (You may have to process the dumps to remove memory addresses in order to get a clean diff, but the order of instructions ought to be the same in either case.)
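
A rough sketch of the comparison step, assuming both dumps are available as byte buffers on the host. The function name `changed_region` is hypothetical. It only measures the common leading and trailing bytes, so it gives a clean answer only when the code after the inserted function is byte-identical in both builds; absolute addresses embedded in later code will show up as extra differences.

```c
#include <stddef.h>

/* Compare two dumps. On return, *prefix is the number of identical
   leading bytes and *suffix the number of identical trailing bytes
   (not overlapping the prefix). Returns 0 if the dumps are identical,
   1 otherwise. The changed block in b is [*prefix, b_len - *suffix). */
int changed_region(const unsigned char *a, size_t a_len,
                   const unsigned char *b, size_t b_len,
                   size_t *prefix, size_t *suffix)
{
    size_t min = a_len < b_len ? a_len : b_len;
    size_t p = 0, s = 0;

    while (p < min && a[p] == b[p])
        p++;
    while (s < min - p && a[a_len - 1 - s] == b[b_len - 1 - s])
        s++;

    *prefix = p;
    *suffix = s;
    return !(a_len == b_len && p == a_len);
}
```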

Brooks Moses
+1  A: 

Why not get each function to dump its own address? Something like this:

#include <stdio.h>

void* fnaddr( const char* fname, void* addr )
{
    printf( "%s\t%p\n", fname, addr ) ;
    return addr ;
}


void test( void )
{
    /* C requires constant static initialisers, so make the call on first entry instead. */
    static void* fnaddr_dummy = 0 ;
    if( !fnaddr_dummy ) fnaddr_dummy = fnaddr( __FUNCTION__, (void*)test ) ;
}

int main( void )
{
    static void* fnaddr_dummy = 0 ;
    if( !fnaddr_dummy ) fnaddr_dummy = fnaddr( __FUNCTION__, (void*)main ) ;
    test() ;
    test() ;
    return 0 ;
}

Because fnaddr_dummy is static, the dump is done once per function, on its first call. Obviously you would need to adapt fnaddr() to support whatever output or logging means you have on your system. Unfortunately, you'll only get the addresses of the functions that are actually called (which may be good enough). In C++ the call can be written directly as the static's initialiser; plain C requires a constant initialiser, hence the explicit check.

Clifford
A: 

You could start each function with a call to the same dummy function like:

void identifyFunction( unsigned int identifier) { }

Each of your functions would call identifyFunction with a different parameter (1, 2, 3, ...). This will not give you a magic map file, but when you inspect the code dump you should be able to quickly find where identifyFunction is, because there will be lots of jumps to that address. Next, scan for those jumps and check the instructions just before each one to see which parameter is passed. Then you can make your own map file. With some scripting this should be fairly automatic.
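
On the C side this might look like the sketch below. The two caller functions are made-up examples. The dummy function here writes its argument to a volatile variable rather than being completely empty; that is an addition to the idea above, meant to make it less likely that an optimizer removes the calls or their arguments.

```c
/* Written through a volatile so the call and its argument are kept. */
volatile unsigned int last_identifier;

void identifyFunction(unsigned int identifier)
{
    last_identifier = identifier;
}

int first_function(int x)       /* hypothetical function, ID 1 */
{
    identifyFunction(1);
    return x + 10;
}

int second_function(int x)      /* hypothetical function, ID 2 */
{
    identifyFunction(2);
    return x * 3;
}
```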

Ron