Following the NLTK book, I first define a chunk grammar and then parse the tagged sentence with it.

import nltk

# sentence is a list of (word, tag) tuples from a POS tagger
grammar = r"""
            NP: {<DT|PP\$>?<JJ>*<NN>}
                {<NNP>+}
                """
cp = nltk.RegexpParser(grammar)
chunked_sent = cp.parse(sentence)

When I print chunked_sent, I get this:

(S
  i/PRP
  use/VBP
  to/TO
  work/VB
  with/IN
  you/PRP
  at/IN
  (NP match/NN)
  ./.)

I don't want to just look at it. I want to actually pull out the "NP" noun phrases.

How can I print out "match", which is the noun phrase here? I want to get every "NP" out of chunked_sent.

for k in chunked_sent:
    print(k)

(u'i', 'PRP')
(u'use', 'VBP')
(u'to', 'TO')
(u'work', 'VB')
(u'with', 'IN')
(u'you', 'PRP')
(u'at', 'IN')
(NP match/NN)
(u'.', '.')


for k in chunked_sent:
    print(k[0])

i
use
to
work
with
you
at
(u'match', 'NN')

As you can see, for some reason I lose the "NP" label.
Also, how do I determine whether k is a plain tuple or a chunk (as in the "match" case above)?

A: 

You may have already found the answer, but I am posting it for anyone who runs into this in the future.

for subtree in chunked_sent.subtrees():
    # NLTK 2.x exposes the label as .node; in NLTK 3 use subtree.label() instead
    if subtree.node == 'NP':
        print(subtree)
Neodawn
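
For readers on a newer setup, here is a minimal sketch of the same idea, assuming NLTK 3 and Python 3 (where `.node` has been replaced by `.label()`). The tagged `sentence` list is reconstructed by hand from the output shown in the question, and the name `noun_phrases` is just an illustrative choice. It also covers the second part of the question: each direct child of the parsed sentence is either an `nltk.tree.Tree` (a chunk) or a plain `(word, tag)` tuple, so `isinstance` can tell them apart. Indexing a `Tree` with `k[0]` returns its first child, which is why the "NP" label seemed to disappear.

```python
import nltk
from nltk.tree import Tree

# Tagged sentence rebuilt from the question's output (an assumption).
sentence = [("i", "PRP"), ("use", "VBP"), ("to", "TO"), ("work", "VB"),
            ("with", "IN"), ("you", "PRP"), ("at", "IN"), ("match", "NN"),
            (".", ".")]

grammar = r"""
    NP: {<DT|PP\$>?<JJ>*<NN>}
        {<NNP>+}
"""
cp = nltk.RegexpParser(grammar)
chunked_sent = cp.parse(sentence)

# Collect the words of every subtree labelled NP.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in chunked_sent.subtrees()
    if subtree.label() == "NP"
]
print(noun_phrases)

# Direct children are Trees for chunks and tuples for unchunked tokens.
for child in chunked_sent:
    if isinstance(child, Tree):
        print("chunk:", child.label(), child.leaves())
    else:
        word, tag = child
        print("token:", word, tag)
```

`subtrees()` also yields the top-level `S` tree, which is why the label check is needed before collecting the leaves.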