views:

127

answers:

2

I know tree is a well studied structure.

I'm writing a program that randomly generates many expression trees and then sorts and selects by a fitness atribute.

I have a class MakeTreeInOrder() that turns the tree into a string that 'eval' can evaluate.

but it gets called many many times, and should be optimized for time.

below is a funtion that builds a tree that adds successive numbers to use as a test.

I was wondering if there is an optimized way to evaluate an exprssion that is in a tree structure. I figured that

that it's used quite a bit and somebodys already done this.

import itertools
from collections import namedtuple 

#Further developing Torsten Marek's second suggestion

KS = itertools.count()
Node = namedtuple("Node", ["cargo", "args"]) 

def build_nodes (depth = 5):     
    if (depth <= 0): 
        this_node = Node((str(KS.next())), [None, None])
        return this_node 
    else:
        this_node = Node('+', []) 
        this_node.args.extend( 
          build_nodes(depth = depth - (i + 1))                             
          for i in range(2)) 

        return this_node

The following is the code that I think can be made a lot faster. And I was hoping for some ideas.

class MakeTreeInOrder(object):
    def __init__(self, node):
        object.__init__(self)
        self.node = node
        self.str = ''
    def makeit(self, nnode = ''):
        if nnode == '':
            nnode = self.node
        if nnode == None: return
        self.str +='('
        self.makeit(nnode.args[0])
        self.str += nnode.cargo
        self.makeit(nnode.args[1])
        self.str+=')'
        return self.str

def Main():
    this_tree = build_nodes()
    expression_generator = MakeTreeInOrder(this_tree)
    this_expression = expression_generator.makeit()
    print this_expression
    print eval(this_expression)

if __name__ == '__main__':
    rresult = Main()
A: 

Your example imports numpy and random but never uses them. It also has a "for i in range(2))" with no body. This is clearly not valid Python code.

You don't define what 'cargo' and the nodes are supposed to contain. It appears that 'cargo' is a number, since it comes from itertools.count().next(). But that makes no sense since you want the result to be a eval'able Python string.

If you are doing a one-time evaluation of the tree then the fastest solution would be to evaluate it directly in-place, but without an actual example of the data you're working with, I can't show an example.

If you want to make it even faster then look further upstream. Why do you generate the tree and then evaluate it? Can't you evaluate the components directly in the code which currently generates the tree structure? If you have operators like "+" and "*" then consider using operator.add and operator.mul instead, which can work directly on the data without using an intermediate step.

==update==

This builds on Paul Hankin's answer. What I've done is taken away the intermediate tree structure and just evaluate the expression directly.

def build_tree2(n):
    if n == 0:
        return random.randrange(100)
    else:
        left = build_tree2(n-1)
        right = build_tree2(n-1)
        return left+right

That clocks at about 5 times faster than Paul's solution.

It may be that you need to know the actual tree structure of the best solution, or the top k of N, where k << N. If that's the case then you can post-hoc regenerate those trees if you also keep track of the RNG state used to generate the results. For example:

def build_tree3(n, rng=random._inst):
    state = rng.getstate()
    return _build_tree3(n, rng.randrange), state

def _build_tree3(n, randrange):
    if n == 0:
        return randrange(100)
    else:
        left = _build_tree3(n-1, randrange)
        right = _build_tree3(n-1, randrange)
        return left+right

Once you've found the best values, use the key to regenerate the tree

# Build Paul's tree data structure given a specific RNG
def build_tree4(n, rng):
    if n == 0:
        return ConstantNode(rng.randrange(100))
    else:
        left = build_tree4(n-1, rng)
        right = build_tree4(n-1, rng)
        return ArithmeticOperatorNode("+", left, right)

# This is a O(n log(n)) way to get the best k.
# An O(k log(k)) time solution is possible.
rng = random.Random()
best_5 = sorted(build_tree3(8, rng) for i in range(10000))[:5]
for value, state in best_5:
    rng.setstate(state)
    tree = build_tree4(8, rng)
    print tree.eval(), "should be", value
    print "  ", str(tree)[:50] + " ..."

Here's what it looks like when I run it

10793 should be 10793
   ((((((((92 + 75) + (35 + 69)) + ((39 + 79) + (6 +  ...
10814 should be 10814
   ((((((((50 + 63) + (6 + 21)) + ((75 + 98) + (76 +  ...
10892 should be 10892
   ((((((((51 + 25) + (5 + 32)) + ((40 + 71) + (17 +  ...
11070 should be 11070
   ((((((((7 + 83) + (77 + 56)) + ((16 + 29) + (2 + 1 ...
11125 should be 11125
   ((((((((69 + 80) + (11 + 64)) + ((33 + 21) + (95 + ...
Andrew Dalke
Thanks, I edited out the numpy and random. the 'i in range(2)' runs the preceding code 'build_nodes(depth = depth - (i + 1))' twice to build the two subnodes. Evaluating it in-place sounds like what I'd like to do. I've been trying and I can't get it to work. I figured it was a technique that was part of the existing information on tree data structures. The program generates an evaluatable string. The expression is stored in the tree because it may be changed later in the program. I like the tree structure but if there is a better structure, I'm open to it.
Peter Stewart
D'oh! Generator comprehension got me there. In my code I always indent them from the previous line so I don't get confused from between the two forms.
Andrew Dalke
That's a great idea. The program typically evaluates 200 individuals and only remembers the best 8 for crossbreeding and mutation and to carry on to the next round of generation. Thanks
Peter Stewart
+3  A: 

Adding a touch of object orientation here makes things simpler. Have subclasses of Node for each thing in your tree, and use an 'eval' method to evaluate them.

import random

class ArithmeticOperatorNode(object):
    def __init__(self, operator, *args):
        self.operator = operator
        self.children = args
    def eval(self):
        if self.operator == '+':
            return sum(x.eval() for x in self.children)
        assert False, 'Unknown arithmetic operator ' + self.operator
    def __str__(self):
        return '(%s)' % (' ' + self.operator + ' ').join(str(x) for x in self.children)

class ConstantNode(object):
    def __init__(self, constant):
        self.constant = constant
    def eval(self):
        return self.constant
    def __str__(self):
        return str(self.constant)

def build_tree(n):
    if n == 0:
        return ConstantNode(random.randrange(100))
    else:
        left = build_tree(n - 1)
        right = build_tree(n - 1)
        return ArithmeticOperatorNode('+', left, right)

node = build_tree(5)
print node
print node.eval()

To evaluate the tree, just call .eval() on the top level node.

node = build_tree(5)
print node.eval()

I also added a __str__ method to convert the tree to a string so you can see how this generalizes to other tree functions. It just does '+' at the moment, but hopefully it's clear how to extend this to the full range of arithmetic operations.

Paul Hankin
When run for 10,000 iterations: this code was a little faster when the depth of the tree was set to 3. .63 seconds vs .79 seconds. However, when the depth was set to 8, it was much slower. 22.56 seconds vs 9.74 seconds. Using operators that evaluate themselves seems like the way to go if I want to evaluate in place. Thanks very much.
Peter Stewart
Thanks for clarifying the intent of the code. I based my followup on what you've done.
Andrew Dalke
What exactly did you compare? Did you include the time building the trees? Did you use random.randrange(100) in your code too? Did you include the time building the string when timing your code? Did you use the timeit module?
Paul Hankin
Well, the details are in my response. My point was show that not building the intermediate tree structure might give overall faster performance than building the tree then eval-ing it. Or, if the tree is sometimes important, then it's possible to do a fast eval then a slower constitution of the tree once that's needed. As for timings, time.time() of the entire process. My timing tests weren't in the sub-second times where timeit becomes useful.
Andrew Dalke
@Paul. I was using cprofile for timing, mistakenly. I reran it using time.time and did not include the time building the trees, and your code was fastest 4.02 vs 5.94 using a depth of 8. I learned a lot from your code. I would not have thought to do it that way on my own. Thanks
Peter Stewart
If I try to generalize the operators using:import operators as opdef eval(self):.... elif self.cargo == '-': return op.sub(x.eval() for x in self.children) then (x.eval() for x in self.children) doesn't resolve into two arguements. Any thoughts?
Peter Stewart
Use: reduce(operator.div,[x.eval()for x in self.children])
Peter Stewart