tags:
views: 139
answers: 2
I am looking for a way, given an English text, to count the verb phrases in it in past, present and future tenses. For now I am using NLTK, doing POS (Part-Of-Speech) tagging, and then counting, say, 'VBD' tags to get past tenses. This is not accurate enough, though, so I guess I need to go further and use chunking, then analyze the VP-chunks for specific tense patterns. Is there anything existing that does that? Any further reading that might be helpful? The NLTK book focuses mostly on NP-chunks, and I can find very little info on VP-chunks.
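For reference, the tag-counting approach described above might look like the sketch below. The `tagged` list here is written by hand to stand in for the output of `nltk.pos_tag(nltk.word_tokenize(text))`, so the example runs without downloading tagger models; the "will walk" sentence also shows why counting single tags misses the future tense (it is MD + VB, spread over two tokens), which is what motivates chunking.

```python
# Hand-written POS-tagged tokens standing in for nltk.pos_tag() output.
tagged = [
    ("She", "PRP"), ("walked", "VBD"), ("home", "NN"), (".", "."),
    ("He", "PRP"), ("walks", "VBZ"), ("daily", "RB"), (".", "."),
    ("They", "PRP"), ("will", "MD"), ("walk", "VB"), (".", "."),
]

# Count simple single-tag tense markers.
past = sum(1 for _, tag in tagged if tag == "VBD")
present = sum(1 for _, tag in tagged if tag in ("VBZ", "VBP"))

print(past, present)  # 1 1 -- the future "will walk" is not caught at all
```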

+1  A: 

You can do this with either the Berkeley Parser or the Stanford Parser. But I don't know if there's a Python interface available for either.

ars
Thanks a lot, this might be an option - however, as I am already using NLTK heavily, it might be quite a lot of work to switch. Will take a look, though.
Michael Pliskin
+3  A: 

The exact answer depends on which chunker you intend to use, but list comprehensions will take you a long way. This gets you the number of verb phrases, using a hypothetical chunker.

len([phrase for phrase in nltk.Chunker(sentence) if phrase[1] == 'VP'])

You can take a more fine-grained approach to detect the number of each tense.
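As a dependency-free sketch of what the chunking step does, here is a minimal VP grouper over pos-tagged tokens: it collects maximal runs of verb-family tags into chunks. With NLTK itself you would normally use `nltk.RegexpParser` with a grammar such as `VP: {<MD>?<VB.*>+}` instead; this stand-in just makes the idea concrete.

```python
# Verb-family tags from the Penn Treebank tagset.
VERB_TAGS = {"MD", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ"}

def vp_chunks(tagged):
    """Group maximal runs of verb tags into VP chunks (a crude chunker)."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag in VERB_TAGS:
            current.append((word, tag))
        else:
            if current:
                chunks.append(current)
                current = []
    if current:
        chunks.append(current)
    return chunks

tagged = [("They", "PRP"), ("will", "MD"), ("walk", "VB"),
          ("home", "NN"), ("and", "CC"), ("rest", "VB"), (".", ".")]

print(len(vp_chunks(tagged)))  # 2 -- "will walk" and "rest"
```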

Tim McNamara
Thanks for the pointer, that's what I am gonna use - my next question is whether there is something existing to detect tense patterns. For each VP I'd like to know which tense it is in.
Michael Pliskin
I actually managed to solve my problem with this approach, so tagging this as accepted answer. The following article is really helpful: http://streamhacker.com/2009/02/23/chunk-extraction-with-nltk/
Michael Pliskin
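The tense-pattern step discussed in these comments can be sketched as a rule lookup over a VP chunk's POS tags. The rules below are deliberately simplified and illustrative only ("will" + base verb → future, VBD/VBN → past, VBZ/VBP → present); real text needs more patterns (perfect and progressive forms, "going to" futures, and so on).

```python
def vp_tense(chunk):
    """Classify a VP chunk (list of (word, tag) pairs) by crude tag rules."""
    tags = [tag for _, tag in chunk]
    words = [word.lower() for word, _ in chunk]
    if "MD" in tags and "will" in words:
        return "future"
    if any(t in ("VBD", "VBN") for t in tags):
        return "past"
    if any(t in ("VBZ", "VBP") for t in tags):
        return "present"
    return "unknown"

print(vp_tense([("will", "MD"), ("walk", "VB")]))  # future
print(vp_tense([("walked", "VBD")]))               # past
print(vp_tense([("walks", "VBZ")]))                # present
```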
Hi Michael, great to hear that things are working well for you!
Tim McNamara