A friend's server (yes, really, not mine) was broken into, and we discovered a perl binary running some bot code. We could not find the script itself (it was probably eval'ed as received over the network), but we managed to create a core dump of the perl process.

Running strings on the core gave us some hints (hostnames, usernames / passwords), but not the source code of the script.

We'd like to know what the script was capable of doing, so we'd like to reverse-engineer the perl code that was running inside that perl interpreter.

Searching around, the closest thing to a Perl decompiler I found is the B::Deparse module, which seems well suited to converting the bytecode of the parse tree back into readable code.

Now, how do I get B::Deparse to operate on a core dump? Or, alternatively, how could I restart the program from the core, load B::Deparse and execute it?

Any ideas are welcome.

A: 

Well, undump will turn that core dump back into a binary executable (if you can find a working version). You should then be able to load that into perl and -MO=Deparse it.

Pedro Silva
Err, I think there's a flaw there; how do you load a binary executable into perl?
ysth
I was under the impression I'd done just that with `perl a.out` the other day while following a PAR::Packer tutorial, but I tried it now and it didn't work.
Pedro Silva
+2  A: 

I doubt there's a tool out there that does this out of the box, so...

  1. Find the source code to the version of perl you were running. This should help you understand the memory layout of the perl interpreter. It will also help you figure out if there's a way to take a shortcut here (e.g. if bytecode is preceded by an easy to find header in memory or something).

  2. Load up the binary + core dump in a debugger, probably gdb

  3. Use the information in the perl source code to guide you in convincing the debugger to spit out the bytecode you're interested in.

Once you have the bytecode, B::Deparse should be able to get you to something more readable.

blucz
Yes, that sounds reasonable. There might be a catch, though: the perl documentation refers to the bytecode as a parse tree. This suggests that the bytecode is not an array of opcodes that is independent of its address in memory. Instead, this sounds like some sort of tree with pointers, perhaps even pointers to the code implementing primitives. I'm thus not sure whether B::Deparse can cope with a parse tree that was generated by a different perl instance.
otmar
Looking at the Deparse code shows another problem: Deparse operates on the perl objects representing the program, not on the memory/byte stream that stores that code.
otmar
+4  A: 

ysth asked me on IRC to comment on your question. I've done a whole pile of work "disassembling" compiled perl (see my CPAN page [http://search.cpan.org/~jjore]).

Perl compiles your source to a tree of OP* structs which occasionally have C pointers to SV* which are perl values. Your core dump now has a bunch of those OP* and SV* stashed.
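On a live interpreter, those OP* structs can be inspected through the B module. The following is a minimal sketch, not something that works on a core dump: it assumes a throwaway sub named `demo` and a helper named `op_names` (both hypothetical), and simply follows the `op_next` pointers from the sub's first op.

```perl
use strict;
use warnings;
use B ();

# Hypothetical sub, present only so there is an op tree to inspect.
sub demo { my $x = shift; return $x + 1 }

# Collect the op names of a sub by following the op_next chain.
sub op_names {
    my ($coderef) = @_;
    my $cv = B::svref_2object($coderef);    # B::CV wrapper around the CV*
    my @names;
    # START is the first op executed; a B::NULL object (address 0) ends the chain.
    for ( my $op = $cv->START; $$op; $op = $op->next ) {
        push @names, $op->name;
    }
    return @names;
}

print join( ' ', op_names( \&demo ) ), "\n";
```

Each B::OP object here is only a blessed wrapper around the address of a struct in the running process, which is why the same calls have nothing to hold on to once the process is gone.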

The best possible world would be to have a perl module like B::Deparse do the information-understanding work for you. It works by using a light interface to perl memory in the B::OP and B::SV classes (documented in B, perlguts, and perlhack). This is unrealistic for you because a B::* object is just a pointer into memory with accessors to decode the struct for our use. Consider:

require Data::Dumper;
require Scalar::Util;
require B;

my $value = 'this is a string';

# Wrap the scalar in a B object and record the address of the underlying SV.
my $sv      = B::svref_2object( \ $value );
my $address = Scalar::Util::refaddr( \ $value );

local $Data::Dumper::Sortkeys = 1;
local $Data::Dumper::Purity   = 1;
print Data::Dumper::Dumper(
  {
    address => $address,
    value   => \ $value,
    sv      => $sv,
    sv_attr => {
      CUR           => $sv->CUR,
      LEN           => $sv->LEN,
      PV            => $sv->PV,
      PVBM          => $sv->PVBM,
      PVX           => $sv->PVX,
      as_string     => $sv->as_string,
      FLAGS         => $sv->FLAGS,
      MAGICAL       => $sv->MAGICAL,
      POK           => $sv->POK,
      REFCNT        => $sv->REFCNT,
      ROK           => $sv->ROK,
      SvTYPE        => $sv->SvTYPE,
      object_2svref => $sv->object_2svref,
    },
  }
);

which, when run, showed that the B::PV object (it ISA B::SV) is truly just an interface to the memory representation of the compiled string 'this is a string'.

$VAR1 = {
          'address' => 438506984,
          'sv' => bless( do{\(my $o = 438506984)}, 'B::PV' ),
          'sv_attr' => {
                         'CUR' => 16,
                         'FLAGS' => 279557,
                         'LEN' => 24,
                         'MAGICAL' => 0,
                         'POK' => 1024,
                         'PV' => 'this is a string',
                         'PVBM' => 'this is a string',
                         'PVX' => 'this is a string',
                         'REFCNT' => 2,
                         'ROK' => 0,
                         'SvTYPE' => 5,
                         'as_string' => 'this is a string',
                         'object_2svref' => \'this is a string'
                       },
          'value' => do{my $o}
        };
$VAR1->{'value'} = $VAR1->{'sv_attr'}{'object_2svref'};

This, however, implies that any code using B::* must operate on live memory. Tye McQueen thought he remembered a C debugger that could fully revive a working process from a core dump. My gdb can't, but gdb can dump the contents of your OP* and SV* structs. You would most likely just read the dumped structs to interpret your program's structure. If you wished, you could instead use gdb to dump the structs, then synthetically create B::* objects that present the same interface as ordinary ones, and run B::Deparse on those. At root, our deparser and other debug dumping tools are mostly object oriented, so you could "fool" them by creating a pile of fake B::* classes and objects.
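To make the "fake object" idea concrete, here is a minimal sketch of a stand-in that answers a few of the B::PV accessors from values you would have read out of the dump by hand. The class name `FakePV` and the choice of fields are assumptions for illustration; the numbers are the ones from the Data::Dumper output above.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a B::PV object, backed by a plain hash
# instead of a pointer into live interpreter memory.
package FakePV;

sub new {
    my ( $class, %fields ) = @_;
    return bless {%fields}, $class;
}

# Accessors mirroring a few of the B::PV / B::SV methods.
sub PV     { $_[0]{PV} }
sub CUR    { $_[0]{CUR} }
sub LEN    { $_[0]{LEN} }
sub REFCNT { $_[0]{REFCNT} }
sub FLAGS  { $_[0]{FLAGS} }

package main;

# Values copied from a dumped SV (here, the example output above).
my $fake = FakePV->new(
    PV     => 'this is a string',
    CUR    => 16,
    LEN    => 24,
    REFCNT => 2,
    FLAGS  => 279557,
);

printf "%s (%d bytes)\n", $fake->PV, $fake->CUR;
```

This is only the shape of the trick. As far as I can tell, B::Deparse also looks at class names and dereferences its objects to get addresses, so stand-ins good enough to fool it would likely need to live under B::* names and imitate a good deal more than these five methods.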

You may find reading the B::Deparse class's coderef2text method instructive. It accepts a function reference, casts it to a B::CV object, and uses that for input to the deparse_sub method:

require B;
require B::Deparse;
sub your_function { ... }

# Cast the coderef to a B::CV object and deparse its body.
my $cv = B::svref_2object( \ &your_function );
my $deparser = B::Deparse->new;
print $deparser->deparse_sub( $cv );
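For comparison, the public coderef2text method can be called directly on a live code reference. A short sketch, assuming an anonymous sub chosen purely for illustration:

```perl
use strict;
use warnings;
use B::Deparse ();

# Hypothetical sub standing in for the code you want to recover.
my $code = sub { my ($n) = @_; return $n * 2 };

my $deparser = B::Deparse->new;

# coderef2text returns the body of the sub as source text, braces included.
my $text = $deparser->coderef2text($code);

print "sub $text\n";
```

The text you get back is regenerated from the op tree, so it will usually differ in layout from the original source, and it may include the pragmas that were in effect when the sub was compiled.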

For gentler introductions to OP* and related ideas, see the updated PerlGuts Illustrated and Optree guts.

Josh Jore
Josh, thanks for the detailed answer. This pretty much jibes with what I expected. Looks like a project for long winter nights.
otmar