I'd like to inspect and manipulate the code of arbitrary Perl subroutines (obtained as coderefs) from within Perl. Is there a tool, module, or library for that? Something similar to B::Concise, except that B::Concise prints the code, whereas I'd like to inspect it programmatically.

I'd like to use it like this. Given a coderef F that is called with, e.g., 10 arguments:

$ret = &$F(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10);

I'd like to create a function F1 such that

&$F(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10) == 
  &$F1(x1, x2, x3, x4, x5, x6, x7, x8, x9, x10)*
  &$C(x2, x3, x4, x5, x6, x7, x8, x9, x10)

that is, to "factor" it into two parts, where the second doesn't depend on x1 and the first is as simple as possible (I assume F is constructed as a huge product).

The application I want this for is optimization of the Metropolis sampling algorithm. Suppose I'm sampling the distribution p(x1 | x2 = X2, x3 = X3, ...) = f(x1, x2, x3, ...). The algorithm itself is invariant with respect to constant multiplicative factors, and the other variables do not change while the algorithm runs, so the part that does not depend on x1 (i.e. $C from above) need not be evaluated at all.

The joint probability might have, e.g., the following form:

  p(x1, x2, x3, x4, x5) = g1(x1, x2)*g2(x2, x3)*g3(x3, x4)*g4(x4, x5)*g5(x4, x1)*g6(x5, x1)

I am also considering constructing p as an object consisting of the factors, each annotated with the variables it depends on. Even this would benefit from code introspection (determining the variables automatically).

A: 

Perl 5 does not let you manipulate the bytecode on the fly like that, but you can create anonymous functions. If I understand your example correctly, and I doubt I do, you already have two functions that are being referenced by $f1 and $c, and you want to create a new reference $f that holds the results of the first two multiplied by each other. This is simple:

my $f = sub { $f1->(@_) * $c->(@_[1 .. 9]) };

$f->(1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

Note the use of the arrow operator rather than the & to dereference the coderefs. This style is much more common (and in my opinion more readable).

Chas. Owens
Actually, I want the reverse - given `f`, determine what `f1` is.
jpalecek
+7  A: 

For introspection of optrees the B family of modules is usually used.

Given a code reference $cv, first create a B object for it:

use B;
my $b_cv = B::svref_2object($cv);

Now you can call the methods documented in B on that object to retrieve information from the optree.
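As a minimal sketch of what that can look like: the snippet below follows the execution-order chain of ops in a sub and collects their names. The helper name op_names and the sample sub are made up for illustration; the B calls used (svref_2object, START, next, name) are the ones documented in B. The exact list of ops can differ between perl versions, so treat the printed output as informational only.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return the names of the ops a sub executes,
# in execution order.
sub op_names {
    my ($cv) = @_;
    my $b_cv = B::svref_2object($cv);
    my @names;
    # START is the first op to run; ->next follows execution order.
    # The chain ends at a B::NULL object, whose underlying address is 0,
    # so $$op becomes false there.
    for (my $op = $b_cv->START; $$op; $op = $op->next) {
        push @names, $op->name;
    }
    return @names;
}

my $f = sub { $_[0] * $_[1] };
my @names = op_names($f);
print join(' ', @names), "\n";
```

A product shows up as a "multiply" op, so even this flat walk is enough to notice that a sub multiplies things. Finding out what the operands of each multiplication are requires walking the tree structure (first, sibling) rather than the execution chain.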

Using only optree introspection you can already achieve amazing things. See DBIx::Perlish for a pretty advanced example of this.

There also happens to be a B::Generate module that allows building new optrees that do whatever you want, or manipulating existing optrees. However, B::Generate isn't as mature as one would hope, and there are a lot of missing features and quite a few bugs.

Actual optree creation and manipulation is usually best done using perl's C API, as documented in perlapi, perlguts, and perlhack, among others. You'll probably have to learn some XS as well, to expose the optree manipulation functions you wrote back to perl space, but that's the easy part really.

Building optrees (not necessarily based on other existing optrees that are being introspected) seems to have become somewhat popular recently, especially since Syntax Plugins have been added to the core in perl 5.12.0. You can find various examples like Scope::Escape::Sugar on cpan.

However, dealing with perl's optrees is still somewhat fiddly and not exactly beginner-friendly. It shouldn't be necessary for anything but the most arcane things. Something like calling B::Deparse->new->coderef2text($cv), tweaking the resulting source text very slightly, and evaluating it again is really as far as I would want to go with optree introspection from pure-perl space.
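A rough sketch of that B::Deparse round trip, assuming a sub that does not close over any lexical variables (deparsed text does not carry closed-over values with it, so a closure would not survive the eval). The sample sub is invented for illustration:

```perl
use strict;
use warnings;
use B::Deparse ();

my $f = sub { my ($x, $y) = @_; return $x * $y + 1 };

# coderef2text returns the body of the sub, including the surrounding braces.
my $text = B::Deparse->new->coderef2text($f);
print "sub $text\n";

# The text could be edited here before compiling it again.
my $copy = eval "sub $text" or die $@;
print $copy->(3, 4), "\n";
```

Any editing of $text is plain string manipulation, so it is only as robust as the patterns used to do it.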

You might want to step back a bit and explain the actual problem you're trying to solve. Maybe there's a much simpler solution that doesn't involve messing with optrees at all.

rafl
See edit for motivation.
jpalecek
+1 nice answer, and `Scope::Escape::*` looks very interesting. Any other good ones you recommend?
Eric Strom
Thank you. Though, unfortunately, that didn't help me much in understanding your real problem, and that's entirely my fault - your clarifications seem good for someone with the right background. So, lacking any suggestions on how to better approach your issue, I'd be happy to help you introspect whatever code you're faced with. But for that, you'd have to show actual code.
rafl
I can't think of any other syntax plugin users on CPAN right now. However, optree munging in general is relatively common. You might find Parse::Perl, and many of the `B::Hooks::OP::Check` dependants interesting. A prior attempt to do what syntax plugins now provide is Devel::Declare. You'll also find a lot of interesting modules providing mostly new syntax, but also new semantics, based on that.
rafl
@Eric Strom: Another example module might be [`Text::Xslate`](http://search.cpan.org/dist/Text-Xslate/). I believe this compiles a template straight down to opcode.
draegtun
+1  A: 

Given your restated question -- I think what you should do here, instead of trying to munge coderefs, is to delay having a coderef as long as possible.

  1. Create an object representing an instance of your computation.
  2. Write the methods on this object needed to evaluate the value of the computation. No codegen, just do it the long slow way. This is just to give you a baseline of code for the next steps that's easily tested and hopefully easily understood.
  3. Write tests to ensure the correctness of what you did in Step 2. (Swap this before Step 2 if you're that kind of person.)
  4. Implement what you're asking about in this question, by writing methods to transform a computation object into a new one that represents a more-optimized form of the same computation. Use your tests to ensure that computations still give the right result after optimization.
  5. Write code that takes a computation object, and generates a sub (whether by string eval or using B) that carries out that computation. Use your tests to ensure that computations still give the right result after they've been compiled.

Optional step to insert anywhere between 2 and 5:

  • Write some syntactic sugar (probably using overload, but other tools are possible too) to let you construct "computation objects" using nice-looking expressions that resemble the computation itself, instead of lots and lots of object constructors.
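
A very rough sketch of steps 1, 2, 4 and 5 for the product-of-factors case from the question. Everything here is hypothetical: the class name Product, the method names, the vars/code layout of a factor, and the three sample factors are all invented for illustration. The "compile" step just builds a closure rather than using string eval or B.

```perl
use strict;
use warnings;

package Product;    # hypothetical class name

# Step 1: a computation is a list of factors; each factor records the
# variables it depends on and the code that computes it.
sub new {
    my ($class, @factors) = @_;
    return bless { factors => [@factors] }, $class;
}

# Step 2: evaluate the slow, obvious way.
sub evaluate {
    my ($self, $values) = @_;
    my $result = 1;
    for my $factor (@{ $self->{factors} }) {
        # Hash slice: pass the values of this factor's variables, in order.
        $result *= $factor->{code}->(@{$values}{ @{ $factor->{vars} } });
    }
    return $result;
}

# Step 4: a transformation that keeps only the factors mentioning $var.
sub depending_on {
    my ($self, $var) = @_;
    my @kept = grep {
        my $factor = $_;
        grep { $_ eq $var } @{ $factor->{vars} };
    } @{ $self->{factors} };
    return ref($self)->new(@kept);
}

# Step 5: turn the object into a plain coderef.
sub compile {
    my ($self) = @_;
    my @factors = @{ $self->{factors} };
    return sub {
        my ($values) = @_;
        my $result = 1;
        $result *= $_->{code}->(@{$values}{ @{ $_->{vars} } }) for @factors;
        return $result;
    };
}

package main;

my $p = Product->new(
    { vars => [qw(x1 x2)], code => sub { $_[0] + $_[1] } },
    { vars => [qw(x2 x3)], code => sub { $_[0] * $_[1] + 1 } },
    { vars => [qw(x3 x4)], code => sub { $_[0] - $_[1] + 10 } },
);

my %values  = (x1 => 2, x2 => 3, x3 => 4, x4 => 5);
my $full    = $p->evaluate(\%values);       # all three factors
my $f1      = $p->depending_on('x1')->compile;
my $partial = $f1->(\%values);              # only the factor involving x1
print "$full $partial\n";
```

With the dependencies stored as data, no introspection of the coderefs is needed to decide which factors can be dropped when only x1 changes.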
hobbs