Hi

I've been working on an NLP project, trying to define an intermediate POS tagging system, along with wrappers that convert the output of known POS tagging systems into mine. My question is:

What is the best POS tagging system you've seen?

Please don't recommend a system just because you like it; recommend it because it is extensible and descriptive.

For those who don't know what a POS tagging system is: POS stands for parts of speech, and a tagging system takes a corpus (a body of text) and assigns a label (noun, verb, etc.) to each word.
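To make the idea of "labels on words" concrete, here is a minimal Python sketch that reads a sentence in the common `word/TAG` notation (the tags shown are Penn Treebank tags) into (word, tag) pairs. The function name `parse_tagged` is hypothetical, and the sketch assumes whitespace-separated tokens with the tag after the last slash.

```python
def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # rpartition splits on the LAST slash, so a word that itself
        # contains a slash (e.g. "1/2/CD") keeps its inner slash.
        word, sep, tag = token.rpartition("/")
        if not sep:
            raise ValueError(f"token without a tag: {token!r}")
        pairs.append((word, tag))
    return pairs


print(parse_tagged("The/DT dog/NN barks/VBZ ./."))
# → [('The', 'DT'), ('dog', 'NN'), ('barks', 'VBZ'), ('.', '.')]
```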

I hope people find this as interesting as I do.
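The wrapper approach described in the question could look roughly like the following Python sketch: an adapter takes an existing tagger (modelled here as a plain function returning (word, tag) pairs) plus a lookup table from that tagger's tagset into the intermediate one. The class name `IntermediateTagger`, the toy backend, and the intermediate tag names such as `NOUN.SG` are all hypothetical; a real mapping table would have to be written for each source tagset.

```python
from typing import Callable

# A backend tagger: takes a list of words, returns (word, tag) pairs.
Tagger = Callable[[list[str]], list[tuple[str, str]]]


class IntermediateTagger:
    """Adapts a backend tagger to an intermediate tagset via a lookup table."""

    def __init__(self, backend: Tagger, mapping: dict[str, str]):
        self.backend = backend
        self.mapping = mapping

    def tag(self, words: list[str]) -> list[tuple[str, str]]:
        result = []
        for word, tag in self.backend(words):
            if tag not in self.mapping:
                # Fail loudly so gaps in the mapping table are noticed.
                raise KeyError(f"no intermediate tag for {tag!r}")
            result.append((word, self.mapping[tag]))
        return result


# Toy backend standing in for a real tagger (hypothetical, dictionary lookup only).
def toy_ptb_tagger(words: list[str]) -> list[tuple[str, str]]:
    lexicon = {"the": "DT", "dog": "NN", "dogs": "NNS", "barks": "VBZ"}
    return [(w, lexicon[w.lower()]) for w in words]


ptb_to_intermediate = {"DT": "DET", "NN": "NOUN.SG", "NNS": "NOUN.PL", "VBZ": "VERB.PRES.3SG"}
tagger = IntermediateTagger(toy_ptb_tagger, ptb_to_intermediate)
print(tagger.tag(["The", "dog", "barks"]))
# → [('The', 'DET'), ('dog', 'NOUN.SG'), ('barks', 'VERB.PRES.3SG')]
```

Raising on an unmapped tag, instead of silently passing it through, is a deliberate choice here: it surfaces source tags that the mapping table does not yet cover.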

A:

It is unclear from your question what exactly you mean by "POS tagging system". There are a couple of issues that seem to be mixed together:

  • which POS tagset is good for a particular language/purpose

  • how difficult it is to convert between different tagsets

  • how well a particular tagging method works with a particular tagset (or how well humans can annotate using that particular tagset)
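One reason conversion is harder in some directions than others: mapping a fine tagset onto a coarse one is a simple lookup, but the inverse is one-to-many. The Python sketch below inverts a small, illustrative mapping from a few Penn Treebank tags onto coarse labels in the style of the Universal tagset (only a hand-picked subset of tags) and reports which coarse tags cannot be converted back without extra context. The function name `ambiguous_inverses` is hypothetical.

```python
from collections import defaultdict

# Illustrative subset: fine (Penn Treebank) -> coarse (Universal-style) tags.
PTB_TO_COARSE = {
    "NN": "NOUN", "NNS": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "DT": "DET",
}


def ambiguous_inverses(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Return each coarse tag that corresponds to more than one fine tag."""
    inverse: dict[str, set[str]] = defaultdict(set)
    for fine, coarse in mapping.items():
        inverse[coarse].add(fine)
    # Sort the fine tags so the result does not depend on set ordering.
    return {c: sorted(fines) for c, fines in inverse.items() if len(fines) > 1}


print(ambiguous_inverses(PTB_TO_COARSE))
# → {'NOUN': ['NN', 'NNS'], 'VERB': ['VB', 'VBD', 'VBZ']}
```

Going from coarse to fine tags therefore needs information the coarse tag does not carry (the word form, morphology, or a tagger), which is why conversion tends to be easy in one direction only.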

An "intermediate" tagset would need to make all the distinctions made in each individual tagset in order to make converting between tagsets easy, but a large number of tags could degrade your tagger's performance. However, a well-designed large tagset could also work better than a poorly designed small tagset, both for human annotators and for taggers.
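The point that an intermediate tagset must keep every distinction can be made concrete. If the same tokens are annotated with two tagsets, the pair (tag in A, tag in B) preserves both sets of distinctions, since each original tag is recovered by taking one component; any intermediate tagset that keeps both needs at least as many tags as there are distinct pairs. The Python sketch below uses a hand-annotated toy example with Penn Treebank and Universal Dependencies (UPOS) tags; the token list and the name `joint_tagset` are illustrative only.

```python
# Toy example: each token annotated with a Penn Treebank tag and a
# Universal Dependencies (UPOS) tag, hand-assigned for illustration.
tokens = [
    ("dog", "NN", "NOUN"),
    ("dogs", "NNS", "NOUN"),   # PTB marks plural; UPOS does not
    ("barks", "VBZ", "VERB"),
    ("is", "VBZ", "AUX"),      # UPOS separates auxiliaries; PTB does not
]


def joint_tagset(annotated):
    """Distinct (tag_a, tag_b) pairs: the smallest tagset that keeps every
    distinction either source makes on this data."""
    return sorted({(a, b) for _, a, b in annotated})


ptb = {a for _, a, _ in tokens}
upos = {b for _, _, b in tokens}
joint = joint_tagset(tokens)
print(len(ptb), len(upos), len(joint))
# → 3 3 4
```

On a real corpus the joint tagset grows with every distinction either source makes, which is the trade-off described above: more tags to keep every distinction, and potentially a harder tagging task.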

You should look for research on tagset design and tagset conversion, and you might also want to look at work on supertagging. If you are working on English, you could compare CLAWS 5 and CLAWS 7 with the Penn Treebank and Brown tagsets (and search for previous work that does this!). This thesis might be a good starting point.

aab
I'll take a close look at it. Thanks a lot!
David Conde
A: 

You should definitely check out the C&C tools developed by James Curran and Stephen Clark. They include one of the fastest parsers you can find (if not the fastest), and they are open source!

William