ansaurus

Question

Python: Optimizing, or at least getting fresh ideas for a tree generator.

Answer 1

+2 A:

You can omit a lot of the parentheses in your code; that's one of Python's benefits. For example, instead of wrapping every condition in parentheses, like

if (depth <= 0) or ((r > Ratio) and (not (entry))):

just write

if depth <= 0 or (r > Ratio and not entry):
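If you want to convince yourself that the two forms agree, here is a small sketch that compares them over a handful of inputs. The value of Ratio and the function names are made up for illustration; only the two conditions come from the code above. It also shows that the remaining inner parentheses are optional, since comparisons bind tighter than not, which binds tighter than and, which binds tighter than or.

```python
from itertools import product

Ratio = 0.5  # hypothetical threshold, not taken from the question


def verbose(depth, r, entry):
    # Fully parenthesized form from the original code.
    return (depth <= 0) or ((r > Ratio) and (not (entry)))


def terse(depth, r, entry):
    # Same test, relying on operator precedence alone.
    return depth <= 0 or r > Ratio and not entry


# Try every combination of a few representative values.
cases = list(product([-1, 0, 1], [0.1, 0.9], [0, 1]))
same = all(verbose(*c) == terse(*c) for c in cases)
```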

And I think there are a couple of redundant calls, e.g.

this_atom = str(this_atom)

(this_atom is already a string, so the call only adds function-call overhead; just omit this line)

or the call to the object constructor

object.__init__(self)

which isn't necessary, either.

As for the Node.__init__ method being the "bottleneck": I guess spending most of your time there cannot be avoided, since when constructing trees like this there's not much else you'll be doing but creating new Nodes.

jellybean 2010-01-24 16:51:19

@jellybean. Noted. I'll correct those. Thanks

Peter Stewart 2010-01-24 17:10:01

Answer 2

+2 A:

You can replace the KeySeq generator with itertools.count which does exactly the same thing but is implemented in C.
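As a rough sketch of the swap: the key_seq generator below is only a guess at what KeySeq looks like (a hand-written counter), since the original isn't reproduced here. next(KS) works on Python 2.6+ and on Python 3.

```python
from itertools import count


def key_seq():
    # Hypothetical stand-in for the question's KeySeq generator.
    n = 0
    while True:
        yield n
        n += 1


KS_py = key_seq()   # pure-Python counter
KS_c = count()      # C-implemented equivalent, starts at 0 by default

first_py = [next(KS_py) for _ in range(5)]
first_c = [next(KS_c) for _ in range(5)]
```

Both produce the same sequence, so the rest of the code does not need to change.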

I don't see any way of speeding up the Node constructor. You could optimise the call to random.choice by inlining it: cut and paste its body from the source of the random module. This eliminates a function call, and function calls are relatively expensive in Python.
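A sketch of what the inlined version could look like, assuming Atoms is a plain list (the contents here are invented). The expression is the one Python 2's random.choice uses internally; Python 3 implements choice differently, so treat this as an illustration of the idea rather than a drop-in for every version.

```python
import random

Atoms = ['a', 'b', 'c', 'd']   # hypothetical atom list

_random = random.random        # bind once to avoid repeated attribute lookups
n_atoms = len(Atoms)           # the list never changes, so measure it once

# Inlined equivalent of random.choice(Atoms), repeated 1000 times.
picks = [Atoms[int(_random() * n_atoms)] for _ in range(1000)]
```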

You could speed it up by running under Psyco, which is a kind of JIT optimiser. However, this only works for 32-bit Intel builds of Python. Alternatively you could use Cython, which converts Python(ish) code into C that can be compiled into a Python C extension module. I say Pythonish since there are some things that cannot be converted, and you can add C data type annotations to make the generated code more efficient.

Dave Kirby 2010-01-24 17:24:34

Using KS = itertools.count() improved it by 6%. Putting random.choice() inline improved it 4%. I'll read up on cython. Thank you!

Peter Stewart 2010-01-24 21:34:45

Answer 3

+4 A:

Below, I've summarized some of the more obvious optimization efforts, without really touching the algorithm much. All timings are done with Python 2.6.4 on a Linux x86-64 system.

Initial time: 8.3s

Low-Hanging Fruit

jellybean already pointed some out. Just fixing those already improves the runtime a little bit. Replacing the repeated calls to Operators.keys() by using the same list again and again also saves some time.
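For the keys() part, something along these lines should do. The contents of Operators are invented for the example; OpKeys is the name used in the code further down.

```python
import random

# Hypothetical operator table: symbol -> number of operands.
Operators = {'+': 2, '-': 2, '*': 2, '/': 2}

# Build the key list once at module level instead of on every call.
OpKeys = list(Operators.keys())


def pick_operator():
    # Reuses the cached list; no new list is created per call.
    return random.choice(OpKeys)
```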

Time: 6.6s

Using itertools.count

Pointed out by Dave Kirby, simply using itertools.count also saves you some time:

from itertools import count
KS = count()

Time: 6.2s

Improving the Constructor

Since you're not setting all attributes of Node in the constructor, you can just move the attribute declarations into the class body:

class Node(object):
    isRoot = False
    left  = None
    right = None
    parent = None
    branch = None
    seq = 0

    def __init__(self, cargo):
        self.cargo = cargo

This does not change the semantics of the class as far as you're concerned, since all values used in the class body are immutable (False, None, 0). If you need other values, read this answer on class attributes first.
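To illustrate why immutability matters here, a small sketch with two made-up classes (they are not part of your code): a mutable default in the class body is one object shared by all instances, whereas an immutable default that gets rebound per instance is safe.

```python
class SharedArgs(object):
    args = []            # one list object, shared by every instance


class OwnArgs(object):
    args = None          # immutable default is safe at class level

    def __init__(self):
        self.args = []   # each instance gets its own list


a, b = SharedArgs(), SharedArgs()
a.args.append('x')       # mutates the class-level list
shared_leak = b.args     # b sees the change made through a

c, d = OwnArgs(), OwnArgs()
c.args.append('x')
own_ok = d.args          # d is unaffected
```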

Time: 5.2s

Using namedtuple

In your code, you're not changing the expression tree once it is built, so you might as well use an immutable object. Node also has no behavior, so a namedtuple is a good option. There is one implication, though: the parent member had to be dropped for now, because the children must exist before the immutable parent tuple can be created, so they cannot refer to it. Since you might introduce operators with more than two arguments, you would have to replace left/right with a list of children anyway; a list is mutable, which would allow creating the parent node before its children.

from collections import namedtuple
Node = namedtuple("Node", ["cargo", "left", "right", "branch", "seq", "isRoot"])
# ...
    def build_nodes(self, depth=Depth, entry=1, pparent=None, bbranch=None):
        r = random.random()

        if depth <= 0 or (r > Ratio and not entry):
            # Leaf: a random atom with no children.
            this_node = Node(
                random.choice(Atoms), None, None, bbranch, next(KS), False)
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            this_operator = random.choice(OpKeys)

            # Both children are built before the parent tuple exists,
            # which is why there is no parent field in this version.
            this_node = Node(
                this_operator,
                self.build_nodes(entry=0, depth=depth - 1,
                                 pparent=None, bbranch='left'),
                self.build_nodes(entry=0, depth=depth - 2,
                                 pparent=None, bbranch='right'),
                bbranch,
                next(KS),
                bool(entry))

            self.thedict[this_node.seq] = this_node
            return this_node

I've kept the original behavior of the operand loop, which decrements the depth at each iteration (hence depth - 1 for the left operand and depth - 2 for the right). I'm not sure this is the intended behavior, but changing it increases the runtime and would make the timings incomparable.
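On the immutability point: a short sketch of what "immutable" means in practice for this Node. The field values are made up. Assigning to a field fails, and _replace does not modify the node in place; it returns a new tuple, so anything still holding the old node keeps seeing the old values.

```python
from collections import namedtuple

Node = namedtuple("Node", ["cargo", "left", "right", "branch", "seq", "isRoot"])

leaf = Node('x', None, None, 'left', 0, False)   # illustrative values

try:
    leaf.cargo = 'y'          # namedtuple fields are read-only
    assigned = True
except AttributeError:
    assigned = False

# _replace builds a new Node; the original is left untouched.
renamed = leaf._replace(cargo='y')
```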

Final time: 4.1s

Where to go from here

If you want support for operators with more than two operands and/or for the parent attribute, use something along the lines of the following code:

from collections import namedtuple
Node = namedtuple("Node", ["cargo", "args", "parent", "branch", "seq", "isRoot"])

    def build_nodes(self, depth=Depth, entry=1, pparent=None, bbranch=None):
        r = random.random()

        if depth <= 0 or (r > Ratio and not entry):
            # Leaf: a random atom with no arguments.
            this_node = Node(
                random.choice(Atoms), None, pparent, bbranch, next(KS), False)
            self.thedict[this_node.seq] = this_node
            return this_node

        else:
            this_operator = random.choice(OpKeys)

            # Create the parent first with an empty args list, so that the
            # children can refer to it while the list is filled in.
            this_node = Node(
                this_operator, [], pparent, bbranch, next(KS), bool(entry))
            # Operators[this_operator] is used as the operand count here.
            this_node.args.extend(
                self.build_nodes(entry=0, depth=depth - (i + 1),
                                 pparent=this_node, bbranch=i)
                for i in range(Operators[this_operator]))

            self.thedict[this_node.seq] = this_node
            return this_node

This code also decreases the depth with the operand position.
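Walking a tree built this way could look roughly like the sketch below. The walk function and the tiny hand-built tree are made up for illustration; the only thing taken from the code above is that leaves carry args=None while operators carry a list.

```python
from collections import namedtuple

Node = namedtuple("Node", ["cargo", "args", "parent", "branch", "seq", "isRoot"])


def walk(node):
    """Yield node and all of its descendants in pre-order."""
    yield node
    for child in node.args or ():      # leaves have args=None
        for descendant in walk(child):
            yield descendant


# Tiny hand-built tree: '+' with two leaf operands.
root = Node('+', [], None, None, 0, True)
root.args.append(Node('x', None, root, 0, 1, False))
root.args.append(Node('y', None, root, 1, 2, False))

order = [n.cargo for n in walk(root)]
```

The nested loop is used instead of yield from so that the sketch also runs on Python 2.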

Torsten Marek 2010-01-24 21:19:20

What a terrific lesson for me! I'm still learning it all, so it's taken me a while to respond. While selecting the individuals (Trees) for fitness, a subset is chosen and crossbred. Two are randomly selected, and randomly selected sub-branches are exchanged. This involves changing the parent and branch attributes. I'm still reading about namedtuples as well as the link you included about class attributes.

Peter Stewart 2010-01-25 14:08:14

I'll implement your second suggestion. I'm not sure how to walk the tree when it uses 'args[]' instead of left and right. There is a class 'Chromosome' of which Tree is an attribute. I'll implement it as a namedtuple as well. Thanks

Peter Stewart 2010-01-25 20:20:56

I'm assuming I will be able to use 'somenamedtuple._replace(**kwargs)' to switch subtrees when I get to the crossbreeding routine.

Peter Stewart 2010-01-25 20:27:13

Regarding walking the tree: when building the terminal Node, before the else clause, I replaced the 'None' value assigned to 'args' with [None, None].

Peter Stewart 2010-01-26 13:31:28
