I have implemented the following data structure:

class Node(object):
    """Rules:
    A node's children must be an iterable of nodes (and nothing else)
    A leaf node must NOT have children and MUST have a word
    """
    def __init__(self, tag, children=None, word=u""):
        assert isinstance(tag, unicode) and isinstance(word, unicode)
        self.tag = tag
        self.word = word
        self.parent = None  # Set when this node is passed to a parent as a child
        # Use None as the default: a [] default would be shared by every node.
        self.children = list(children) if children is not None else []
        for child in self.children:
            child.parent = self

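    def matches(self, node):
        """Match RECURSIVELY down!"""
        # WILDCARD is assumed to be a module-level constant defined elsewhere.
        if self.tag != node.tag:
            return False
        # zip() silently truncates, so compare the child counts explicitly.
        if len(self.children) != len(node.children):
            return False
        if not all(a.matches(b) for a, b in zip(self.children, node.children)):
            return False
        if self.word != WILDCARD and node.word != WILDCARD:
            return self.word == node.word
        return True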

    def __unicode__(self):
        childrenU= u", ".join( map( unicode, self.children))
        return u"(%s, %s, %s)" % (self.tag, childrenU, self.word)

    def __str__(self):
        return unicode(self).encode('utf-8')

    def __repr__(self):
        # In Python 2, __repr__ must return a byte string, not unicode.
        return str(self)

So a tree is basically a bunch of these nodes connected together.

I am parsing S-expressions such as this one: (VP (VP (VC w1) (NP (CP (IP (NP (NN w2)) (VP (ADVP (AD w3)) (VP (VA w4)))) (DEC w5)) (NP (NN w6)))) (ADVP (AD w7)))
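For reference, here is a minimal sketch of how a string like this could be read into a tree. It is only an illustration under some assumptions: it builds plain `(tag, children, word)` tuples instead of `Node` objects so that it is self-contained, and `tokenize` and `parse_sexpr` are hypothetical helper names, not part of my actual code.

```python
import re

def tokenize(text):
    # Each parenthesis is its own token; everything else is a run of
    # characters that are neither whitespace nor parentheses.
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse_sexpr(text):
    """Parse one S-expression into nested (tag, children, word) tuples."""
    tokens = tokenize(text)
    pos = [0]  # boxed so the nested function can advance it

    def parse_node():
        if tokens[pos[0]] != "(":
            raise ValueError("expected '(' at token %d" % pos[0])
        pos[0] += 1
        tag = tokens[pos[0]]
        pos[0] += 1
        children, word = [], ""
        while tokens[pos[0]] != ")":
            if tokens[pos[0]] == "(":
                children.append(parse_node())  # nested node
            else:
                word = tokens[pos[0]]          # leaf word
                pos[0] += 1
        pos[0] += 1  # consume the closing ")"
        return (tag, children, word)

    tree = parse_node()
    if pos[0] != len(tokens):
        raise ValueError("trailing tokens after the first expression")
    return tree
```

Unbalanced input ends in an `IndexError` here; a real parser would want a clearer error message.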

I would like to write a function that matches a small subtree (a pattern) against a bigger tree. The catch is that the pattern can contain wildcards in place of words, and I would like the match to report which word each wildcard was bound to.

For example, given the subtree pattern

    (VP
      (ADVP (AD X))
      (VP (VA Y)))

matching it against the tree above should return { X: w3, Y: w4 }, because the pattern lines up with the inner (VP (ADVP (AD w3)) (VP (VA w4))) subtree.
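To make the expected behaviour concrete, here is a rough sketch of the operation I have in mind. It rests on assumptions of my own: trees are plain `(tag, children, word)` tuples rather than `Node` objects, the wildcard names are passed in explicitly as a set, and `match_at` and `find_match` are hypothetical names. It simply tries the pattern at every node in pre-order, so it is not necessarily efficient.

```python
def match_at(pattern, node, wildcards, bindings):
    """Try to match `pattern` rooted exactly at `node`.

    Wildcard bindings are recorded in `bindings` as the match proceeds.
    """
    p_tag, p_children, p_word = pattern
    n_tag, n_children, n_word = node
    if p_tag != n_tag or len(p_children) != len(n_children):
        return False
    if p_word in wildcards:
        # A wildcard that appears twice must bind to the same word each time.
        if bindings.setdefault(p_word, n_word) != n_word:
            return False
    elif p_word != n_word:
        return False
    return all(match_at(p, n, wildcards, bindings)
               for p, n in zip(p_children, n_children))

def find_match(pattern, tree, wildcards):
    """Return the bindings for the first matching subtree, or None."""
    bindings = {}  # fresh per candidate, so failed attempts leave nothing behind
    if match_at(pattern, tree, wildcards, bindings):
        return bindings
    for child in tree[1]:
        found = find_match(pattern, child, wildcards)
        if found is not None:
            return found
    return None

# Usage on the inner subtree from the example above.
pattern = ("VP", [("ADVP", [("AD", [], "X")], ""),
                  ("VP", [("VA", [], "Y")], "")], "")
subtree = ("IP", [("NP", [("NN", [], "w2")], ""),
                  ("VP", [("ADVP", [("AD", [], "w3")], ""),
                          ("VP", [("VA", [], "w4")], "")], "")], "")
result = find_match(pattern, subtree, {"X", "Y"})
```

In the worst case this visits every node of the big tree once per pattern node, which is fine for parse trees of this size but would not scale to very large inputs.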

Can anyone recommend an efficient, simple solution?