ansaurus

Question

Reverse Engineering a Perl script based on a core dump

Answer 1

A:

Well, undump will turn that core dump back into a binary executable (if you can find a working version). You should then be able to load that into perl and -MO=Deparse it.

Pedro Silva 2010-08-26 16:08:28

Err, I think there's a flaw there; how do you load a binary executable into perl?

ysth 2010-08-26 16:48:51

I was under the impression I'd just done `perl a.out` one of these days following a PAR::Packer tutorial, but I tried it now and it didn't work.

Pedro Silva 2010-08-26 17:19:38

Answer 2

+2 A:

I doubt there's a tool out there that does this out of the box, so...

1. Find the source code to the version of perl you were running. This should help you understand the memory layout of the perl interpreter. It will also help you figure out if there's a way to take a shortcut here (e.g. if bytecode is preceded by an easy-to-find header in memory or something).
2. Load up the binary + core dump in a debugger, probably gdb.
3. Use the information in the perl source code to guide you in convincing the debugger to spit out the bytecode you're interested in.

Once you have the bytecode, B::Deparse should be able to get you to something more readable.

blucz 2010-08-26 16:14:05

Yes, that sounds reasonable. There might be a catch, though: the perl documentation refers to the bytecode as a parse tree. This suggests that the bytecode is not an array of opcodes that is independent of its address in memory. Instead, this sounds like some sort of tree with pointers, perhaps even pointers to the code implementing primitives. I'm thus not sure whether B::Deparse can cope with a parse tree that was generated by a different perl instance.

otmar 2010-08-27 14:59:10

Looking at the Deparse code shows another problem: Deparse operates on the perl objects representing the program, not on the memory/byte stream that stores that code.

otmar 2010-08-27 15:02:52

Answer 3

+4 A:

ysth asked me on IRC to comment on your question. I've done a whole pile of stuff "disassembling" compiled perl and stuff (just see my CPAN page [http://search.cpan.org/~jjore]).

Perl compiles your source to a tree of OP* structs which occasionally have C pointers to SV* which are perl values. Your core dump now has a bunch of those OP* and SV* stashed.
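
To make the OP* tree a little more concrete, here is a minimal sketch that lists the ops of a compiled sub in execution order using the B module in a live process. The sub add_one and the helper op_names are hypothetical names made up for this illustration; the idiom of following ->next until the B::NULL object is reached is the usual one, but treat this as a sketch rather than a definitive tool.

```perl
use strict;
use warnings;
use B ();

# Hypothetical example sub; any compiled sub would do.
sub add_one { my $n = shift; return $n + 1 }

# Hypothetical helper: return the op names of a sub in execution
# order by following the op_next pointers from the sub's first op.
sub op_names {
    my ($code_ref) = @_;
    my $cv = B::svref_2object($code_ref);    # B::CV wrapping the CV*
    my @names;

    # $$op is the address of the underlying OP*. It is 0 for the
    # B::NULL object that marks the end of the chain.
    for ( my $op = $cv->START; $$op; $op = $op->next ) {
        push @names, $op->name;
    }
    return @names;
}

print join( ' ', op_names( \&add_one ) ), "\n";
```

The exact list of op names depends on the perl version, since the optimizer rewrites the tree differently across releases. In a core dump the same next pointers exist, but you would have to follow them by hand in the debugger.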

The best possible world would be to have a perl module like B::Deparse do the information-understanding work for you. It works by using a light interface to perl memory in the B::OP and B::SV classes (documented in B, perlguts, and perlhack). This is unrealistic for you because a B::* object is just a pointer into memory with accessors to decode the struct for our use. Consider:

require Data::Dumper;
require Scalar::Util;
require B;

my $value = 'this is a string';

my $sv      = B::svref_2object( \ $value );
my $address = Scalar::Util::refaddr( \ $value );

local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Purity   = 1;
print Data::Dumper::Dumper(
  {
    address => $address,
    value   => \ $value,
    sv      => $sv,
    sv_attr => {
      CUR           => $sv->CUR,
      LEN           => $sv->LEN,
      PV            => $sv->PV,
      PVBM          => $sv->PVBM,
      PVX           => $sv->PVX,
      as_string     => $sv->as_string,
      FLAGS         => $sv->FLAGS,
      MAGICAL       => $sv->MAGICAL,
      POK           => $sv->POK,
      REFCNT        => $sv->REFCNT,
      ROK           => $sv->ROK,
      SvTYPE        => $sv->SvTYPE,
      object_2svref => $sv->object_2svref,
    },
  }
);

When run, this showed that the B::PV object (which ISA B::SV) is truly just an interface to the in-memory representation of the compiled string 'this is a string':

$VAR1 = {
          'address' => 438506984,
          'sv' => bless( do{\(my $o = 438506984)}, 'B::PV' ),
          'sv_attr' => {
                         'CUR' => 16,
                         'FLAGS' => 279557,
                         'LEN' => 24,
                         'MAGICAL' => 0,
                         'POK' => 1024,
                         'PV' => 'this is a string',
                         'PVBM' => 'this is a string',
                         'PVX' => 'this is a string',
                         'REFCNT' => 2,
                         'ROK' => 0,
                         'SvTYPE' => 5,
                         'as_string' => 'this is a string',
                         'object_2svref' => \'this is a string'
                       },
          'value' => do{my $o}
        };
$VAR1->{'value'} = $VAR1->{'sv_attr'}{'object_2svref'};

This however implies that any B::* using code must actually operate on live memory. Tye McQueen thought he remembered a C debugger which could fully revive a working process given a core dump. My gdb can't. gdb can allow you to dump the contents of your OP* and SV* structs. You would most likely just read the dumped structs to interpret your program's structure. You could, if you wished, use gdb to dump the structs, then synthetically create B::* objects which behaved in interface as if they were ordinary and use B::Deparse on that. At root, our deparser and other debug dumping tools are mostly object oriented so you could just "fool" them by creating a pile of fake B::* classes and objects.
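
As a rough illustration of the "fake B::* objects" idea, the sketch below builds a stand-in that answers a few of the same accessor names as B::PV, fed from values copied by hand rather than from live memory. The class name Fake::B::PV and the choice of fields are assumptions made for this example, and a stand-in this small is nowhere near enough for B::Deparse, which would need the whole graph of OP and SV objects faked consistently.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a B::PV object. A real B::PV is a blessed
# pointer into live memory; this one answers the same accessor names
# from a hash of values copied out of a debugger by hand.
package Fake::B::PV;

sub new {
    my ( $class, %fields ) = @_;
    return bless {%fields}, $class;
}

# Generate one read-only accessor per struct field of interest.
for my $field (qw( PV CUR LEN FLAGS REFCNT )) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub { $_[0]{$field} };
}

package main;

# Field values taken from the Data::Dumper output above; in practice
# they would come from inspecting the SV structs in the core dump.
my $fake = Fake::B::PV->new(
    PV     => 'this is a string',
    CUR    => 16,
    LEN    => 24,
    FLAGS  => 279557,
    REFCNT => 2,
);

printf "%s (CUR=%d, LEN=%d)\n", $fake->PV, $fake->CUR, $fake->LEN;
```

Code that only calls these accessors cannot tell the difference, which is the property the trick relies on.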

You may find reading the B::Deparse class's coderef2text method instructive. It accepts a function reference, casts it to a B::CV object, and uses that for input to the deparse_sub method:

require B;
require B::Deparse;
sub your_function { ... }

my $cv = B::svref_2object( \ &your_function );
my $deparser = B::Deparse->new;
print $deparser->deparse_sub( $cv );

For gentler introductions to OP* and related ideas, see the updated PerlGuts Illustrated and Optree guts.

Josh Jore 2010-08-30 21:01:34

Josh, thanks for the detailed answer. This pretty much jibes with what I expected. Looks like a project for long winter nights.

otmar 2010-09-02 14:28:33
