views:

135

answers:

5

I am wondering if there is any way to get some meta information about the interpretation of a python statement during execution.

Let's assume this is a complex statement of some single statements joined with or (A, B, ... are boolean functions)

if A or B and ((C or D and E) or F) or G and H:

and I want to know which part of the statement is causing the statement to evaluate to True so I can do something with this knowledge. In the example, there would be 3 possible candidates:

A
B and ((C or D and E) or F)
G and H

And in the second case, I would like to know if it was (C or D and E) or F that evaluated to True and so on...

Is there any way without parsing the statement? Can I hook up to the interpreter in some way or utilize the inspect module in a way that I haven't found yet? I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.

Edit - further information: The type of application that I want to use this in is a categorizing algorithm that inputs an object and outputs a certain category for this object, based on its attributes. I need to know which attributes were decisive for the category.
As you might guess, the complex statement from above comes from the categorization algorithm. The code for this algorithm is generated from a formal pseudo-code and contains about 3,000 nested if-elif-statements that determine the category in a hierarchical way like

if obj.attr1 < 23 and (is_something(obj.attr10) or eats_spam_for_breakfast(obj)):
    return 'Category1'
elif obj.attr3 == 'Welcome Home' or count_something(obj) >= 2:
    return 'Category2a'
elif ...

So aside from the category itself, I need to flag the attributes that were decisive for that category, so if I'd delete all other attributes, the object would still be assigned to the same category (due to the ors within the statements). The statements can be really long, up to 1,000 chars, and deeply nested. Every object can have up to 200 attributes.

Thanks a lot for your help!

Edit 2: Haven't found time in the last two weeks. Thanks for providing this solution, it works!

+1  A: 

As far as I remember, Python does not return True or False per se:

Important exception: the Boolean operations or and and always return one of their operands.

The Python Standard Library - Truth Value Testing
Therefore, following is valid:

A = 1
B = 0
result = B or A # result == 1
artdanil
A: 

"""I do not want to debug, it's really about knowing which part of this or-chain triggered the statement at runtime.""": you might need to explain what is the difference between "debug" and "knowing which part".

Do you mean that you the observer need to be told at runtime what is going on (why??) so that you can do something different, or do you mean that the code needs to "know" so that it can do something different?

In any case, assuming that your A, B, C etc don't have side effects, why can't you simply split up your or-chain and test the components:

part1 = A
part2 = B and ((C or D and E) or F)
part3 = G and H
whodunit = "1" if part1 else "2" if part2 else "3" if part3 else "nobody"
print "Perp is", whodunit
if part1 or part2 or part3:
    do_something()

??

Update:

"""The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime)"""

So you are saying that given the condition "A or B", that if A is True and B is True, A gets all the glory (or all the blame)? I'm finding it very hard to believe that categorisation software such as you describe is based on "or" having a short-circuit evaluation. Are you sure that there's an intent behind the code being "A or B" and not "B or A"? Could the order be random, or influenced by the order that the variables where originally input?

In any case, generating Python code automatically and then reverse-engineering it appears to be a long way around the problem. Why not just generate code with the part1 = yadda; part2 = blah; etc nature?

John Machin
This is probably the right answer -- breaking that long if-condition into bite-sized pieces makes your code nicer and makes the individual parts available later, as you need them. It defeats the short-circuiting behavior of `or`, but that's often ok.
Jason Orendorff
The difference between debug and 'knowing which part' is that I need to assign a flag for the variables that were used in the statement that first evaluated to True (at runtime).Right now I consider parsing the statement and breaking it up to evaluate part by part as you did it (if I find no other way) - thanks.
Lucas
Hi John, unfortunately thats the way it is. The software specification explictly says that the order plays a role and that the first to-true-evaluating statement gets the glory. Since the or-statements can be nested and expression can get very long, up to 1,000 chars, and I have about 3,000 of them, I am not too keen to generate the code into the partX form. Sorry I can not reveal more details about it, this really is a crazy project, makes me cry once every day.
Lucas
A: 

The Python interpreter doesn't give you a way to introspect the evaluation of an expression at runtime. The sys.settrace() function lets you register a callback that is invoked for every line of source code, but that's too coarse-grained for what you want to do.

That said, I've experimented with a crazy hack to have the function invoked for every bytecode executed: Python bytecode tracing.

But even then, I don't know how to find the execution state, for example, the values on the interpreter stack.

I think the only way to get at what you want is to modify the code algorithmically. You could either transform your source (though you said you didn't want to parse the code), or you could transform the compiled bytecode. Neither is a simple undertaking, and I'm sure there are a dozen difficult hurdles to overcome if you try it.

Sorry to be discouraging...

BTW: What application do you have for this sort of technology?

Ned Batchelder
+1  A: 

Could you recode your original code:

if A or B and ((C or D and E) or F) or G and H:

as, say:

e = Evaluator()
if e('A or B and ((C or D and E) or F) or G and H'):

...? If so, there's hope!-). The Evaluator class, upon __call__, would compile its string argument, then eval the result with (an empty real dict for globals, and) a pseudo-dict for locals that actually delegates the value lookups to the locals and globals of its caller (just takes a little black magic, but, not too bad;-) and also takes note of what names it's looked up. Given Python's and and or's short-circuiting behavior, you can infer from the actual set of names that were actually looked up, which one determined the truth value of the expression (or each subexpression) -- in an X or Y or Z, the first true value (if any) will be the last one looked up, and in a X and Y and Z, the first false one will.

Would this help? If yes, and if you need help with the coding, I'll be happy to expand on this, but first I'd like some confirmation that getting the code for Evaluator would indeed be solving whatever problem it is that you're trying to address!-)

Edit: so here's coding implementing Evaluator and exemplifying its use:

import inspect
import random

class TracingDict(object):

  def __init__(self, loc, glob):
    self.loc = loc
    self.glob = glob
    self.vars = []

  def __getitem__(self, name):
    try: v = self.loc[name]
    except KeyError: v = self.glob[name]
    self.vars.append((name, v))
    return v


class Evaluator(object):

  def __init__(self):
    f = inspect.currentframe()
    f = inspect.getouterframes(f)[1][0]
    self.d = TracingDict(f.f_locals, f.f_globals)

  def __call__(self, expr):
    return eval(expr, {}, self.d)


def f(A, B, C, D, E):
  e = Evaluator()
  res = e('A or B and ((C or D and E) or F) or G and H')
  print 'R=%r from %s' % (res, e.d.vars)

for x in range(20):
  A, B, C, D, E, F, G, H = [random.randrange(2) for x in range(8)]
  f(A, B, C, D, E)

and here's output from a sample run:

R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 0), ('G', 1), ('H', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=1 from [('A', 1)]
R=0 from [('A', 0), ('B', 0), ('G', 0)]
R=1 from [('A', 0), ('B', 1), ('C', 1)]

You can see that often (about 50% of the time) A is true, which short-circuits everything. When A is false, B evaluates -- when B is also false, then G is next, when B is true, then C.

Alex Martelli
Hi Alex, I edited the question to provide further information. That Evaluator class does really sound interesting and might do the job. At the moment, I am not sure about the subexpressions though. If you don't mind, I'd like to give your Evaluator class a try!
Lucas
+1, Simple but great solution!
Nadia Alramli
A: 

I would just put something like this before the big statement (assuming the statement is in a class):

for i in ("A","B","C","D","E","F","G","H"):
    print i,self.__dict__[i]
mizipzor