
Hey guys,

Other than the standard arch options like left3words, left5words, bidirectional, and bi5words, what do the rest of the options mean? And what arguments do they take?

I can't seem to find the documentation anywhere!

A:

I'm afraid that the arch options are at present only documented in the source code :-(.

See the ExtractorFrames and ExtractorFramesRare classes.

A good first step is to look at the arch options used by the distributed taggers. You can find them in the *.props files in the models subdirectory.

In brief:

  • "generic" gives you a decent basic set of word and tag features: current, previous, and next word features; the previous tag and the previous two tags; and conjunctions of the previous tag with the current word, and of the current word with the previous word. It's a good place to start.
  • There are various options that turn on a whole bunch of extractors to give known good configurations for English and Chinese (bidirectional, sighan2005, naacl2003unknowns).
  • Other options, often with a parameter, turn on sets of features in sensible ways that can be mixed together. You can see this in the definitions of the distributed Chinese and Arabic taggers. E.g., suffix(6) includes as features all word-ending substrings of length up to 6.
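To make the "generic" and suffix(n) descriptions above concrete, here is a minimal, hypothetical Java sketch of the feature strings those templates would produce for one token. The class name, method names, the `<BOUNDARY>` padding symbol, and the feature-string format are all made up for illustration; this is not the tagger's actual ExtractorFrames or ExtractorFramesRare code, just an approximation of what the features described in the answer look at.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the Stanford tagger's extractor code.
class FeatureSketch {

    // Returns words[i], or a made-up boundary symbol when i is out of range.
    static String wordAt(String[] words, int i) {
        return (i >= 0 && i < words.length) ? words[i] : "<BOUNDARY>";
    }

    // Returns tags[i], or the same boundary symbol when i is out of range.
    static String tagAt(String[] tags, int i) {
        return (i >= 0 && i < tags.length) ? tags[i] : "<BOUNDARY>";
    }

    // Approximates the "generic" templates for position i, assuming the
    // tags to the left of i are already known (left-to-right tagging).
    static List<String> genericFeatures(String[] words, String[] tags, int i) {
        List<String> f = new ArrayList<>();
        f.add("w0=" + wordAt(words, i));                                  // current word
        f.add("w-1=" + wordAt(words, i - 1));                             // previous word
        f.add("w+1=" + wordAt(words, i + 1));                             // next word
        f.add("t-1=" + tagAt(tags, i - 1));                               // previous tag
        f.add("t-2,t-1=" + tagAt(tags, i - 2) + "," + tagAt(tags, i - 1)); // previous two tags
        f.add("t-1,w0=" + tagAt(tags, i - 1) + "," + wordAt(words, i));   // prev tag + current word
        f.add("w-1,w0=" + wordAt(words, i - 1) + "," + wordAt(words, i)); // prev word + current word
        return f;
    }

    // Approximates suffix(n): every word-ending substring of length 1..n,
    // capped at the word's own length.
    static List<String> suffixFeatures(String word, int n) {
        List<String> f = new ArrayList<>();
        for (int len = 1; len <= Math.min(n, word.length()); len++) {
            f.add("suf" + len + "=" + word.substring(word.length() - len));
        }
        return f;
    }

    public static void main(String[] args) {
        String[] words = {"The", "dog", "barked"};
        String[] tags = {"DT", "NN"}; // tags already assigned left of position 2
        System.out.println(genericFeatures(words, tags, 2));
        System.out.println(suffixFeatures("barked", 6));
    }
}
```

Running `main` prints the seven "generic"-style features for "barked" (e.g. `t-2,t-1=DT,NN`) followed by its six suffix features, from `suf1=d` up to `suf6=barked`. Words shorter than n simply yield fewer suffixes.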
Christopher Manning
@Christopher, thanks.
goh