Hello

Imagine I have the following simple C program:

int main() {

int a=5, b= 6, c;
c = a +b; 
return 0;
}

Now, I would like to know the address of the expression c=a+b, that is the program address where this addition is carried out. Is there any possibility that I could use printf? Something along the line:

int main() {

int a=5, b= 6, c;
printf("Address of printf instruction in memory: %x", current_address_pointer_or_something);
c = a +b; 
return 0;
}

I know how to find the address using gdb and then info line file.c:line. However, I would like to know whether I could also do that directly with printf.

Thanks

A: 

You would need to add a line in assembly to get the address pointed to by the instruction pointer.

{
  int addr;
  __asm
  {
     mov eax, [eip] - 4   ; use the value of EIP and subtract 4 to get
                          ; the address of execution right before
                          ; the __asm block. store this in the EAX register.
     mov [addr], eax      ; put value from EAX into program variable addr
  }
  // addr should contain the address now
}

Disclaimer: The example above is sort-of pseudo-code, and whether it will actually work really depends on your compiler of choice and operating system. The Microsoft Visual C++ compiler will differ from, say, GCC.

The idea is to read the value of the instruction pointer EIP, and put it into a variable in your program.

Miky Dinescu
That didn't compile for me in Visual Studio 2008 - are you sure you can read eip directly like that?
RichieHindle
You can't directly read from EIP in the IA-32 ISA.
Michael
I guess you guys are right. Too much embedded coding for me. I actually made an assumption and it turns out I was wrong.
Miky Dinescu
+4  A: 

Visual C++ has the _ReturnAddress intrinsic, which can be used to get some info here.

For instance:

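#include <intrin.h>
#include <stdio.h>

__declspec(noinline) void PrintCurrentAddress()
{
    // _ReturnAddress() yields the address of the instruction following the call
    printf("%p", _ReturnAddress());
}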

This will give you an address close to the expression you're looking at (the address that the call to PrintCurrentAddress returns to). In the event of some optimizations, like tail folding, this will not be reliable.

Michael
+9  A: 

In gcc, you can take the address of a label using the && operator. So you could do this:

#include <stdio.h>

int main() 
{
    int a=5, b= 6, c;

    sum:
        c = a+b;

    printf("Address of sum label in memory: %p", &&sum);
    return 0;
}

The result of &&sum is the target of the jump instruction that would be emitted if you did a goto sum. So, while it's true that there's no one-to-one address-to-line mapping in C/C++, you can still say "get me a pointer to this code."
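To make the "target of a goto" point concrete, here is a minimal sketch, assuming gcc (or a compatible compiler such as clang) with the labels-as-values extension. The function name jump_via_label and its labels are made up for illustration. It stores a label address in a void * and then jumps to it with the matching goto * extension:

```c
/* Hypothetical helper: picks one of two label addresses and jumps to it.
   Relies on the GNU C extensions &&label and goto *ptr. */
int jump_via_label(int take_first)
{
    /* &&first and &&second are code addresses inside this function */
    void *target = take_first ? &&first : &&second;

    goto *target;   /* indirect jump to whichever label was chosen */

first:
    return 1;
second:
    return 2;
}
```

Printing target with %p would show an address inside jump_via_label, which is the same kind of value that &&sum gives in the example above.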

Charlie Tangora
This would appear to be a non-standard extension
1800 INFORMATION
Whoops, you're right. It's gcc only.
Charlie Tangora
The question itself has all sorts of non-standard aspects. It is only fair to provide an answer featuring non-standard extensions.
sigjuice
That looks excellent, I was looking for something along those lines for the gcc compiler! will try it tomorrow ;)
You beat me. nice answer +1 Here is the doc: http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
Johannes Schaub - litb
Note that this may not correspond to what you think it does - for example, gcc could decide a, b, and c are effectively constants, get rid of all of them, and sum then effectively points to the printf statement.
bdonlan
Also, this may perturb the control-flow graph of the function, which may or may not be desired.
bdonlan
Charlie Tangora
+2  A: 

Tested in Visual Studio 2008:

int addr;
__asm
{
    call _here
    _here: pop eax
    ; eax now holds the PC.
    mov [addr], eax
}

printf("%x\n", addr);

Credit to this question.

RichieHindle
Doing this interferes with the CPU's return address predictor since you modified the return address not via a ret. http://blogs.msdn.com/oldnewthing/archive/2004/12/16/317157.aspx
Michael
Wouldn't it be better just to put in a local label, then load that label's address with an immediate MOV? eg: _here: mov eax, _here (newline) mov [addr], eax
bdonlan
A: 

I don't know the details, but there should be a way to make a call to a function that can then crawl the return stack for the address of the caller, and then copy and print that out.
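With gcc (and clang), one way to do this without walking the stack by hand might be the __builtin_return_address builtin. The following is only a sketch under that assumption; the helper name caller_address is made up for illustration:

```c
#include <stdio.h>

/* Hypothetical helper. noinline matters: if the function were inlined,
   level 0 would refer to the return address of the enclosing function
   instead of the call site we are interested in. */
__attribute__((noinline))
void *caller_address(void)
{
    /* GCC/Clang builtin: the address this function will return to, i.e.
       the instruction right after the call to caller_address(). */
    return __builtin_return_address(0);
}
```

Calling printf("%p\n", caller_address()); just before c = a + b; should then print an address close to that statement, subject to the same optimization caveats mentioned in the other answers.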

Will Hartung
A: 

Here's a sketch of an alternative approach:

Assume that you haven't stripped debug symbols, and in particular you have the line number to address table that a source-level symbolic debugger needs in order to implement things like single step by source line, set a break point at a source line, and so forth.

Most tool chains use reasonably well documented debug data formats, and there are often helper libraries that implement most of the details.

Given that and some help from the preprocessor macro __LINE__ which evaluates to the current line number, it should be possible to write a function which looks up the address of any source line.

Advantages are that no assembly is required, portability can be achieved by calling on platform-specific debug information libraries, and it isn't necessary to directly manipulate the stack or use tricks that break the CPU pipeline.

A big disadvantage is that it will be slower than any approach based on directly reading the program counter.

RBerteig
+1  A: 

For x86:

int test()
{
    __asm {
     mov eax, [esp]
    }
}


__declspec(noinline) int main() // or whatever noinline feature your compiler has
{
    int a = 5;
    int aftertest;

    aftertest = test()+3; // aftertest = disasms to 89 45 F8 mov dword ptr [a],eax.

    printf("%i", a+9);
    printf("%x", test());
    return 0;
}
Unknown
A: 

Using gcc on i386 or x86-64:

#include <stdio.h>

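#if defined(__x86_64__)
/* RIP-relative lea yields the address of local label 1 (also works in PIE) */
#define ADDRESS_HERE() ({ void *p; __asm__("1: lea 1b(%%rip), %0" : "=r" (p)); p; })
#else
/* i386: load the label address as an immediate (note the $); assumes non-PIC code */
#define ADDRESS_HERE() ({ void *p; __asm__("1: mov $1b, %0" : "=r" (p)); p; })
#endif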

int main(void) {
    printf("%p\n", ADDRESS_HERE());
    return 0;
}

Note that due to the presence of compiler optimizations, the apparent position of the expression might not correspond to its position in the original source.

The advantage of using this method over the &&foo label method is it doesn't change the control-flow graph of the function. It also doesn't break the return predictor unit like the approaches using call :) On the other hand, it's very much architecture-dependent... and because it doesn't perturb the CFG there's no guarantee that jumping to the address in question would make any sense at all.

bdonlan
Are you sure it doesn't change the CFG? My understanding is that an __asm__ block is a scheduler barrier, just like a label would be.
Charlie Tangora
Never mind, I was thinking of `asm volatile`.
Charlie Tangora
A: 

If the compiler is any good, this addition happens in registers and is never stored in memory, at least not in the way you are thinking. Actually, a good compiler will see that your program does nothing: manipulating values within a function but never sending those values anywhere outside the function can result in no code at all.

If you were to:

c = a+b; printf("%u\n",c);

Then a good compiler will also never store the value of c in memory; it will stay in registers, although that depends on the processor as well. If, for example, compilers for that processor use the stack to pass arguments to functions, then the value for c will be computed using registers (a good compiler will see that c is always 11 and just assign it) and the value will be put on the stack when it is passed to the printf function. Naturally, the printf function may well need temporary storage in memory due to its complexity (it can't fit everything it needs to do in registers).

Where I am heading is that there is no answer to your question. It is heavily dependent on the processor, compiler, etc. There is no generic answer. I have to wonder what the root of the question is; if you were hoping to probe with a debugger, then this is not the question to ask.

Bottom line: disassemble your program and look at it. For that compile, on that day, with those settings, you will be able to see where the compiler has placed intermediate values. Even if the compiler assigns a memory location for the variable, that doesn't mean the program will ever store the variable in that location. It depends on optimizations.

dwelch