I can't seem to find that in the documentation anywhere

+1  A: 

The Penn Treebank contains about 4.5 million words of American English annotated with part-of-speech (POS) tags, and over half of that material is also annotated with skeletal syntactic structure.

Check out page 327 of this document: http://acl.ldc.upenn.edu/J/J93/J93-2004.pdf. It is a little dated (it was published in 1993), but I can't think of any new words that English speakers have introduced since then.

gnucom
Thank you, that was really helpful!!
Lezan