Hi,

I inherited an environment that has a "compiled" Perl script on Unix. Is it possible to decompile or reverse-engineer it (whatever the term is) and get the source code back from the compiled object code?

Might not be possible, but thought I'd ask rather than assume.

Thanks, -Kevin.

+4  A: 

Oh my!

If and only if it was compiled into executable byte code via perlcc -B, you could then uncompile it the same way B::Deparse does. You'd get back all of the source that wasn't optimized away that way. It might look a bit funny, but it would be an equivalent program.

However, if it was fully compiled into C code and thence to assembler and machine language and run through ld for a proper a.out file, you aren't going to be able to do anything like that. It'd be like trying to disassemble /bin/cat.

So ok, you could disassemble it, but there's no joy to be had there. Even if you could get out the original, generated C code — which you cannot — it would be virtually unusable.

I suppose you might try running strings(1) on it to see whether anything useful got left lying around somewhere permanent, but I wouldn't count on it.

Sorry.

tchrist
+7  A: 

Leaving aside the bytecode backend tchrist already covered and talking only about the C backend: all perlcc does is translate the optree of your compiled Perl program into a C program, which it then compiles. When run, that C program reconstructs the optree in memory and then executes it basically the way perl usually would. The point of this is really just to speed up the compile phase of regular Perl code.
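To get a feel for what such an optree looks like, the core B::Concise module can dump the ops of a sub. This is a minimal sketch, assuming a stock perl with B::Concise available; the sub `answer` is a hypothetical stand-in for your own code, and the exact op listing varies between perl versions and builds (threaded perls, for example, format constants slightly differently).

```perl
use strict;
use warnings;
use B::Concise ();

# Hypothetical example sub whose optree we want to look at.
sub answer { print 42, "\n" }

# Send B::Concise output into a string instead of STDOUT.
my $buf = '';
B::Concise::walk_output(\$buf);

# compile() returns a closure; calling it walks the sub's ops
# in execution order ('-exec') and writes one line per op.
B::Concise::compile('-exec', \&answer)->();

print $buf;
```

The same ops are what perlcc serializes into C and what B::Deparse later walks to produce source text.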

Once reconstructed, your program's optree is available in the PL_main_root global variable. There is already a module, B::Deparse, that can consume optrees and turn them into source code roughly equivalent to the original code the optree was compiled from. It happens to have a compile method that returns a coderef which, when executed, prints the deparsed form of PL_main_root.
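A quick way to watch B::Deparse turn an optree back into source, without any gdb involved, is its public coderef2text method. This is only a sketch; `$code` is an arbitrary example sub, and the exact text returned (including any `use strict`/`use warnings` lines it adds) depends on your perl version.

```perl
use strict;
use warnings;
use B::Deparse;

# An example sub; perl has already compiled its body into an optree.
my $code = sub { print 42, "\n" };

# Walk that optree and regenerate equivalent Perl source.
my $deparse = B::Deparse->new;
my $text    = $deparse->coderef2text($code);

print "sub $text\n";
```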

Also there's the C function Perl_eval_pv, which you can use to evaluate Perl snippets from C space.

$ echo 'print 42, "\n"' > foo.pl
$ perl foo.pl
42
$ perlcc foo.pl
$ ./a.out
42
$ gdb a.out
...
(gdb) b perl_run
Breakpoint 1 at 0x4570e5: file perl.c, line 2213.
(gdb) r
...
Breakpoint 1, perl_run (my_perl=0xa11010) at perl.c:2213
(gdb) p Perl_eval_pv (my_perl, "use B::Deparse; B::Deparse->compile->()", 1)
print 42, "\n";
$1 = (SV *) 0xe47b10

Of course the usual B::Deparse caveats apply, but this will certainly be handy for reverse-engineering. Actually reconstructing the original source code won't be possible in most cases, even though it worked for the example above.

The exact gdb magic you'll have to do to get B::Deparse to give you something sensible also depends largely on your perl. I'm using a perl with ithreads, and therefore multiplicity. That's why I'm passing around the my_perl variable. Other perls might not need that. Also, if anyone stripped the binary compiled by perlcc, things will get a bit harder, but the same technique will still work.

You can also use this to deparse any optree you can somehow get hold of at any point during program execution. Have a look at B::Deparse's compile sub and do something similar, except pass it a B object for whichever optree you want dumped instead of B::main_root.
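A minimal sketch of that runtime idea, using the public coderef2text method rather than copying the internals of compile: look a sub up by its fully qualified name, wrap it with B::svref_2object to get its B::CV object, skip XSUBs (which are C code with no Perl optree to deparse), and deparse the rest. `greet` and `deparse_by_name` are hypothetical names used only for illustration.

```perl
use strict;
use warnings;
use B ();
use B::Deparse ();

# Hypothetical sub standing in for code the running program has loaded.
sub greet { my ($name) = @_; return "hello, $name\n" }

# Deparse a sub located by fully qualified name at runtime.
# Returns undef if there is no Perl-level optree (undefined sub or XSUB).
sub deparse_by_name {
    my ($fqname) = @_;
    my $code = do {
        no strict 'refs';
        defined &{$fqname} ? \&{$fqname} : undef;
    };
    return undef unless $code;

    my $cv = B::svref_2object($code);   # B::CV object wrapping the sub
    return undef if $cv->XSUB;          # implemented in C: nothing to deparse

    return B::Deparse->new->coderef2text($code);
}

my $text = deparse_by_name('main::greet');
print defined $text ? "sub greet $text\n" : "no optree for main::greet\n";
```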

The same thing applies to the mentioned bytecode backend of perlcc. I'm not entirely sure about the optimized C backend called CC.

rafl
I was only thinking about the B and CC backends; I'd forgotten about the C backend.
tchrist
Actually, after having a brief look at the CC backend, it seems to be much the same as the C backend. It's similar enough for the above approach to work with it anyway.
rafl