ansaurus

Question

Answer 1

+2 A:

This is a typical case of using Lisp. You would need a function that maps another function over the tree.

Here is a procedural matching example using Common Lisp. There are matchers in Lisp that work over list structures, which could be used instead. Using a list matcher would simplify the example.

The code:

(defun node-children (node)
  (rest node))

(defun node-name (node)
  (second node))

(defun node-type (node)
  (first node))


(defun treemap (tree matcher transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (if (funcall matcher tree)
             (funcall transformer tree)
           (cons (node-type tree)
                 (mapcar (lambda (child)
                           (treemap child matcher transformer))
                         (node-children tree)))))
        (t tree))))

The example:

(defvar *tree*
  '(ROOT
    (S
     (NP
      (NP (NNP Bank))
      (PP (IN of)
          (NP (NNP America))))
     (VP (VBD used)
         (S
          (VP (TO to)
              (VP (VB be)
                  (VP (VBN called)
                      (NP
                       (NP (NNP Bank))
                       (PP (IN of)
                           (NP (NNP Italy))))))))))))



(defun example ()
  (pprint
   (treemap *tree*
            (lambda (node)
              (and (= (length (node-children node)) 2)
                   (eq (node-type (first (node-children node))) 'np)
                   (some (lambda (node)
                           (eq (node-name node) 'bank))
                         (children (first (node-children node))))
                   (eq (first (second (node-children node))) 'pp)))
            (lambda (node)
              (list (node-type node)
                    (append (first (node-children node))
                            (node-children (second (node-children node)))))))))

Running the example:

CL-USER 75 > (example)

(ROOT
 (S
  (NP
   (NP (NNP BANK) (IN OF) (NP (NNP AMERICA))))
  (VP
   (VBD USED)
   (S
    (VP
     (TO TO)
     (VP
      (VB BE)
      (VP
       (VBN CALLED)
       (NP
        (NP
         (NNP BANK)
         (IN OF)
         (NP (NNP ITALY)))))))))))

Rainer Joswig 2010-09-12 07:19:14

Answer 2

+2 A:

Beauty is in the eye of the beholder. But you never say how the code of Tregex or Tsurgeon is a mess. It sounds more like you can't deal with Java or greater abstraction and so you're looking for something concrete written in Python.

There's nothing wrong with hand-writing tree matching and transformation functions. Indeed, we used to do that all the time. But after the first couple of hundred, it seemed like there had to be a better way, and hence we moved to using the domain-specific languages of Tregex and Tsurgeon. This is generally seen as a laudable programming style. See in Wikipedia. They're well-specified languages with an exact syntax specification, etc. Here is your example using them.

Tree t = Tree.valueOf("(ROOT (S (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP America)))) (VP (VBD used) (S (VP (TO to) (VP (VB be) (VP (VBN called) (NP (NP (NNP Bank)) (PP (IN of) (NP (NNP Italy)))))))))))");
TregexPattern pat = TregexPattern.compile("NP <1 (NP << Bank) <2 PP=remove");
TsurgeonPattern surgery = Tsurgeon.parseOperation("excise remove remove");
Tsurgeon.processPattern(pat, surgery, t).pennPrint();

Note that the Java code is actually shorter than the Lisp code, precisely because of use of the domain-specific language. It's hard to see how this could be simpler: specify pattern, specify operation, apply.

But if you'd prefer to be hand-writing methods that match patterns on trees and change them into other trees in Python, then you're most welcome to go off and do that.

Christopher Manning 2010-09-18 21:25:22

Is there any documentation for using SP tree regex? Or are the javadocs the only documentation so far?

gnucom 2010-09-18 22:07:20

Ah, hello Professor Manning, sorry to be criticizing your work without providing concrete reasons. I want to hack on the code but I find 100,000 lines a little daunting for just adding a little layer of abstraction. I am very much grateful for the existence of Tregex/Turgeon. I'm just curious if a DSL doing something similar can be written much more succinctly. There's still the problem of Java<->Python interactions which I've solved in a unsatisfying way, but it works (somewhat).

guidoism 2010-09-19 16:39:49

@gnucom: I'll admit that the documentation could be better/more extensive. But there are a couple of other resources. The ppt slides http://nlp.stanford.edu/software/tregex/The_Wonderful_World_of_Tregex.ppt are the best place to go for an intro to tregex patterns. There are useful help screens in the GUI application. For more, see: http://nlp.stanford.edu/software/tregex-faq.shtml#b .

Christopher Manning 2010-09-19 22:35:06

@guidoism: BTW, it wasn't my work. The DSLs and most of the code were written by Roger Levy and Galen Andrew. I'm sure a DSL could be written more succinctly in other languages - such as Scala or Clojure - which are more suited to that. Java is anything but succinct and is not particularly oriented towards writing DSLs. But I think your primary point is that if you just want to add on a few special cases and you're not that familiar with the tools and code, it's actually easier to do it with tree functions than with a DSL, where you have to change the tokenizer and parser, etc.

Christopher Manning 2010-09-19 22:42:53

Answer 3

+1 A:

Here is a second version in Common Lisp. This time I'm using a pattern matcher.

I'm using a function that matches a pattern against Lisp data. PMATCH:MATCH is an enhanced version of a pattern matcher found in the book Winston/Horn, Lisp, 3rd Edition. There are similar pattern matching functions available.

The data is as in my other answer.

The tree mapping function is changed to use the pattern matcher. The PMATCH:MATCH function returns T or an assoc list of bindings if the match is successful. It returns NIL if the match is not successful. The PMATCH:INSTANTIATE-PATTERN takes a pattern and a set of bindings. It returns a new list structure, where the pattern variables are replaced with the bindings.

(defun treemapp (tree pattern transformer)
  (cond ((null tree) nil)
        ((consp tree)
         (let ((bindings (pmatch:match pattern tree)))
           (if bindings
               (pmatch:instantiate-pattern transformer bindings)
             (cons (node-type tree)
                   (mapcar (lambda (child)
                             (treemapp child pattern transformer))
                           (node-children tree))))))
        (t tree)))

The example uses now patterns.

The pattern is a list structure. #?symbol matches a single item and creates a binding for symbol. #$symbol matches a list of items and creates a binding for symbol.

The transformer is a pattern that will be instantiated with the bindings.

(defun example1 ()
  (pprint (treemapp *tree*
                    '(NP (NP (#?type bank)) (PP #$children))
                    '(NP (NP (#?type bank) #$children)))))

Running this code returns the same result as in my other answer.

Rainer Joswig 2010-09-18 22:15:29

ansaurus

tags:

views:

answers:

How do I manipulate parse trees?

related questions