How to get the entire code of a method in memory so I can calculate its hash at runtime?

I need to make a function like this:

type
  TProcedureOfObject = procedure of object;

function TForm1.CalculateHashValue (AMethod: TProcedureOfObject): string;
var
  MemStream: TMemoryStream;
begin
  result:='';

  MemStream:=TMemoryStream.Create;

  try
    //how to get the code of AMethod into TMemoryStream?

    result:=MD5(MemStream); //I already have the MD5 function
  finally
    MemStream.Free;
  end;
end;

I use Delphi 7.


Edit: Thank you to Marcelo Cantos and gabr for pointing out that there is no consistent way to find the procedure size because of compiler optimization, and to Ken Bourassa for reminding me of the risks. The target procedure (the one whose hash I want to compute) is my own and doesn't call any other routines, so I can guarantee that it won't change.

After reading the answers and the Delphi 7 help file about the $O directive, I have an idea.

I'll make the target procedure like this:

procedure TForm1.TargetProcedure(Sender: TObject);
begin
{$O-}

  //do things here

  asm
    nop;
    nop;
    nop;
    nop;
    nop;
  end;
{$O+}
end;

The 5 successive nops at the end of the procedure would act as a bookmark. One could estimate the end of the procedure with gabr's trick and then scan nearby for the 5 nops to find the (hopefully) correct size.

Now, while this idea sounds worth trying, I... uhm... don't know how to turn it into working Delphi code. I have no experience with low-level programming, such as getting the entry point and copying the entire code of the target procedure into a TMemoryStream while scanning for the 5 nops.

I'd be very grateful if someone could show me some practical examples.
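A minimal sketch of the marker-scan idea, assuming the compiler really emits the five NOPs ($90) contiguously after the body (the answers below explain why that is not guaranteed) and that no other run of five $90 bytes appears earlier in the routine. CopyCodeUpToNopMarker and the unit name are hypothetical, not RTL/VCL routines. The helper takes the entry point as a plain Pointer, which you can get with @TForm1.TargetProcedure or, from a procedure-of-object variable, with TMethod(AMethod).Code (TMethod is declared in SysUtils).

```pascal
unit NopMarker;  // hypothetical unit name

interface

uses
  SysUtils, Classes;

function CopyCodeUpToNopMarker(Entry: Pointer; AStream: TStream;
  MaxScan: Integer): Boolean;

implementation

// Hypothetical helper: copies the bytes from Entry up to and including the
// first run of five consecutive NOPs ($90) into AStream. Returns False if
// no such run is found within MaxScan bytes.
// Keep MaxScan <= 32768, the size of SysUtils.TByteArray.
function CopyCodeUpToNopMarker(Entry: Pointer; AStream: TStream;
  MaxScan: Integer): Boolean;
const
  NopByte = $90;
  MarkerLen = 5;
var
  Code: PByteArray;
  I, Run: Integer;
begin
  Result := False;
  Code := PByteArray(Entry);
  Run := 0;
  for I := 0 to MaxScan - 1 do
    if Code^[I] = NopByte then
    begin
      Inc(Run);
      if Run = MarkerLen then
      begin
        AStream.WriteBuffer(Code^, I + 1);  // bytes 0..I inclusive
        Result := True;
        Exit;
      end;
    end
    else
      Run := 0;  // a shorter run of NOPs is not the marker
end;

end.
```

To plug it into the question's CalculateHashValue, you would call CopyCodeUpToNopMarker(TMethod(AMethod).Code, MemStream, 4096), set MemStream.Position := 0 and pass the stream to your own MD5 function. Note that TargetProcedure(Sender: TObject) is a TNotifyEvent, not a TProcedureOfObject, so passing the raw entry point (@TForm1.TargetProcedure) avoids that type mismatch.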

+2  A: 

You might struggle with this. Functions are defined by their entry point, but I don't think that there is any consistent way to find out the size. In fact, optimisers can do screwy things like merge two similar functions into a common shared function with multiple entry points (whether or not Delphi does stuff like this, I don't know).

EDIT: The 5-nop trick isn't guaranteed to work either. In addition to Remy's caveats (see his comment below), the compiler merely has to guarantee that the nops are the last thing to execute, not that they are the last thing to appear in the function's binary image. Turning off optimisations is a rather baroque "solution" that still won't fix all the issues that others have raised.

In short, there are simply too many variables here for what you are trying to do. A better approach would be to target compilation units for checksumming (assuming it satisfies whatever overall objective you have).

Marcelo Cantos
Not that I know.
Marco van de Voort
In addition to what Marcelo said, something else to keep in mind is that 'if' statements and loops, basically anything with an inner scope (especially if they have a lot of code in them, or inner branches of their own), may be separated out into their own blocks of memory that get scattered throughout the executable image. Functions are rarely self-contained in a single memory block.
Remy Lebeau - TeamB
+2  A: 

Marcelo has correctly stated that this is not possible in general.

The usual workaround is to take the address of the method that you want to calculate the hash for and the address of the method that follows it. For the time being, the compiler lays out methods in the same order as they are defined in the source code, so this trick works.

Be aware that subtracting two method addresses may give you a slightly too large result: the first method may actually end a few bytes before the next method starts.
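A sketch of gabr's address-difference trick, assuming (as gabr says) that the compiler emits methods in the order they appear in the implementation section. AfterTargetProcedure is a hypothetical, intentionally empty method whose only job is to mark where TargetProcedure ends; the unit, class and helper names are placeholders. Because of the padding gabr mentions, the result is an upper bound rather than an exact size.

```pascal
unit TargetCode;  // hypothetical unit name

interface

uses
  Classes;

type
  TTarget = class
  public
    procedure TargetProcedure(Sender: TObject);
    procedure AfterTargetProcedure;  // hypothetical end marker
  end;

function EstimatedTargetSize: Integer;
procedure CopyTargetCode(AStream: TStream);

implementation

procedure TTarget.TargetProcedure(Sender: TObject);
begin
  // do things here
end;

// Must be implemented directly after TargetProcedure for the trick to work.
procedure TTarget.AfterTargetProcedure;
begin
end;

function EstimatedTargetSize: Integer;
begin
  // Upper bound: may include alignment padding after TargetProcedure.
  Result := Integer(@TTarget.AfterTargetProcedure) -
    Integer(@TTarget.TargetProcedure);
end;

procedure CopyTargetCode(AStream: TStream);
var
  Entry: Pointer;
  Size: Integer;
begin
  Entry := @TTarget.TargetProcedure;
  Size := EstimatedTargetSize;
  if Size > 0 then
    AStream.WriteBuffer(Entry^, Size);  // raw code bytes of TargetProcedure
end;

end.
```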

gabr
+2  A: 

The only way I can think of is turning on TD32 debug info and trying JclDebug to see if you can find the length in the debug info. Relocation shouldn't affect the length, so the length in the binary should be the same as in memory.

Another way would be to scan the code for a ret or ret n opcode. That is less safe, but it would probably cover at least part of the function, without having to mess with debug info.
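A sketch of the ret-scan idea as a naive byte search for the two near-return opcodes, $C3 (RET) and $C2 (RET imm16, followed by a 2-byte operand). This is an assumption-laden illustration, not a disassembler: those byte values can also occur inside the operands of other instructions, and an early exit or a jump-based tail call ends the scan too soon or too late. The unit and function names are hypothetical.

```pascal
unit RetScan;  // hypothetical unit name

interface

uses
  SysUtils;

function NaiveSizeToFirstRet(Entry: Pointer; MaxScan: Integer): Integer;

implementation

// Naive byte scan: returns the number of bytes from Entry up to and
// including the first byte that looks like RET ($C3) or RET imm16
// ($C2 + 2-byte operand), or -1 if none is found within MaxScan bytes.
// Keep MaxScan <= 32768, the size of SysUtils.TByteArray.
function NaiveSizeToFirstRet(Entry: Pointer; MaxScan: Integer): Integer;
var
  Code: PByteArray;
  I: Integer;
begin
  Result := -1;
  Code := PByteArray(Entry);
  for I := 0 to MaxScan - 1 do
    case Code^[I] of
      $C3:
        begin
          Result := I + 1;  // one-byte RET
          Exit;
        end;
      $C2:
        begin
          Result := I + 3;  // RET imm16: opcode + 16-bit immediate
          Exit;
        end;
    end;
end;

end.
```

The returned size can then be passed to TStream.WriteBuffer, as in the question, before hashing.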

The potential deal breaker, though, is short routines that are tail-call optimized (in other words, they jump instead of ret). But I don't know if Delphi does that.

Marco van de Voort
A: 

Even if you achieved it, there are a few things you need to be aware of...

The hash will change many times, even if the function itself didn't change.

For example, the hash will change if your function calls another function whose address changed since the last build. I think the hash might also change if your function calls itself recursively and your unit (not necessarily your function) changed since the last build.

As for how it could be achieved, gabr's suggestion seems to be the best one... But it's really prone to break over time.

Ken Bourassa
+1  A: 

I achieve this by letting Delphi generate a MAP file and sorting the symbols by start address in ascending order. The length of each procedure or method is then the next symbol's start address minus this symbol's start address. This is most likely as brittle as the other solutions suggested here, but I have this code working in production right now and it has worked fine for me so far.

My implementation, which reads the MAP file and calculates the sizes, can be found here at line 3615 (TEditorForm.RemoveUnusedCode).
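A sketch of the MAP-file approach, assuming a MAP file whose "Publics by Value" section lists symbols as lines of the form `0001:00001234       Unit1.TForm1.TargetProcedure`, with segment 0001 being code (in Delphi 7 this is controlled by the Map file setting under Project > Options > Linker). This is not Ville's actual implementation; the unit, types and functions are hypothetical. The size of the last symbol cannot be derived this way, and aliases sharing one address get size 0.

```pascal
unit MapSizes;  // hypothetical unit name

interface

uses
  SysUtils, Classes;

type
  TMapSymbol = record
    Offset: Integer;  // offset within segment 0001
    Name: string;
  end;
  TMapSymbols = array of TMapSymbol;

function LoadCodeSymbols(const MapFileName: string): TMapSymbols;
function SymbolSize(const Symbols: TMapSymbols; const Name: string): Integer;

implementation

function LoadCodeSymbols(const MapFileName: string): TMapSymbols;
var
  Lines: TStringList;
  I, J, N, Seg, Off: Integer;
  S: string;
  InSection: Boolean;
  Tmp: TMapSymbol;
begin
  SetLength(Result, 0);
  N := 0;
  InSection := False;
  Lines := TStringList.Create;
  try
    Lines.LoadFromFile(MapFileName);
    for I := 0 to Lines.Count - 1 do
    begin
      S := Trim(Lines[I]);
      if not InSection then
      begin
        InSection := Pos('Publics by Value', S) > 0;
        Continue;
      end;
      if S = '' then
        Continue;
      // Assumed shape: "0001:00001234       Unit.Symbol"
      if (Length(S) < 14) or (S[5] <> ':') then
        Break;  // first non-matching line ends the section
      Seg := StrToIntDef('$' + Copy(S, 1, 4), -1);
      Off := StrToIntDef('$' + Copy(S, 6, 8), -1);
      if (Seg <> 1) or (Off < 0) then
        Continue;  // keep only code symbols
      SetLength(Result, N + 1);
      Result[N].Offset := Off;
      Result[N].Name := Trim(Copy(S, 14, MaxInt));
      Inc(N);
    end;
  finally
    Lines.Free;
  end;
  // Insertion sort by ascending start offset.
  for I := 1 to N - 1 do
  begin
    Tmp := Result[I];
    J := I - 1;
    while (J >= 0) and (Result[J].Offset > Tmp.Offset) do
    begin
      Result[J + 1] := Result[J];
      Dec(J);
    end;
    Result[J + 1] := Tmp;
  end;
end;

// Size = next symbol's start minus this symbol's start; -1 if unknown.
function SymbolSize(const Symbols: TMapSymbols; const Name: string): Integer;
var
  I: Integer;
begin
  Result := -1;
  for I := 0 to High(Symbols) - 1 do
    if SameText(Symbols[I].Name, Name) then
    begin
      Result := Symbols[I + 1].Offset - Symbols[I].Offset;
      Exit;
    end;
end;

end.
```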

Ville Krumlinde