I've read the papers linked to in this question. I half get it. Can someone help me figure out how to implement it?
I assume that features are generated by some heuristic. Using a POS tagger as an example: maybe looking at the training data shows that 'bird' is tagged NOUN in all cases, so feature f1(z_{n-1}, z_n, X, n) is generated as

(if x_n = 'bird' and z_n = NOUN then 1 else 0)
where X is the input (word) sequence and Z is the output (tag) sequence. During weight training, we find that this f1 is never violated, so its corresponding weight λ1 (λ for lambda) would end up positive and relatively large after training. Both generating features and training the weights seem challenging to implement, but otherwise straightforward.
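To make my understanding concrete, here is a rough sketch of such a generated feature and its weight. All names and the weight value are mine, just for illustration:

```python
# Sketch of an indicator feature generated from training data.
# z_prev is the previous tag, z_n the current tag, X the word
# sequence, n the position. Names and values are hypothetical.
def f1(z_prev, z_n, X, n):
    """Fires (returns 1) when the word at position n is 'bird' and its tag is NOUN."""
    return 1 if X[n] == "bird" and z_n == "NOUN" else 0

lambda1 = 2.5  # hypothetical positive weight learned during training
```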
I'm lost on how one applies the model to untagged data. Do I initialize the output vector with arbitrary labels, and then change labels wherever doing so increases the sum over all the λ_i * f_i?
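In other words, is decoding just maximizing a score like the one sketched below? (The feature, weight, and function names are my own placeholders, not from the papers.)

```python
# Sketch of how I imagine scoring a candidate tag sequence Z for input X:
# sum lambda_i * f_i(z_{n-1}, z_n, X, n) over every feature at every position.
def score(Z, X, features, weights):
    total = 0.0
    for n in range(len(X)):
        z_prev = Z[n - 1] if n > 0 else None  # no previous tag at position 0
        for f, w in zip(features, weights):
            total += w * f(z_prev, Z[n], X, n)
    return total

# Hypothetical single-feature example:
def f1(z_prev, z_n, X, n):
    return 1 if X[n] == "bird" and z_n == "NOUN" else 0

score(["DET", "NOUN"], ["a", "bird"], [f1], [2.5])  # higher score than tagging 'bird' as VERB
```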
Any help on this would be greatly appreciated.