A CRF is a discriminative, batch, tagging model, in the same general family as a Maximum Entropy Markov model.
A full explanation is book-length.
A short explanation is as follows:
- Humans annotate 200-500K words of text, marking the entities.
- Humans select a set of features that they hope indicate entities, such as capitalization, or whether the word was seen in the training set with a tag.
- A training procedure counts all the occurrences of the features.
- The meat of the CRF algorithm searches the space of all possible models that fit the counts to find a pretty good one.
- At runtime, a decoder (probably a Viterbi decoder) looks at a sentence and decides what tag to assign to each word.
The hard parts of this are feature selection and the search algorithm in the fourth step.
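To make the feature and decoding steps concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two-tag scheme (`O` and `ENT`), the function names `word_features` and `viterbi`, and above all the weights, which are set by hand here. In a real CRF the training procedure finds the weights, and that is the part this sketch skips entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical two-tag scheme: "O" = outside any entity, "ENT" = inside one.
TAGS = ["O", "ENT"]


def word_features(sentence: List[str], i: int) -> List[str]:
    """Hand-picked indicator features for the word at position i."""
    word = sentence[i]
    feats = ["capitalized" if word[:1].isupper() else "lowercase"]
    if i == 0:
        # Sentence-initial words are capitalized whether or not they are entities.
        feats.append("sentence_start")
    return feats


# Hand-set weights, for illustration only. Training would learn these.
EMISSION: Dict[Tuple[str, str], float] = {
    ("capitalized", "ENT"): 2.0,
    ("lowercase", "O"): 2.0,
    ("sentence_start", "O"): 3.0,
}
TRANSITION: Dict[Tuple[str, str], float] = {
    ("ENT", "ENT"): 0.75,  # entities often span several words
    ("O", "O"): 0.5,
}


def emission_score(sentence: List[str], i: int, tag: str) -> float:
    """Sum the weights of the features that fire for (word i, tag)."""
    return sum(EMISSION.get((f, tag), 0.0) for f in word_features(sentence, i))


def viterbi(sentence: List[str]) -> List[str]:
    """Return the highest-scoring tag sequence for the sentence."""
    if not sentence:
        return []
    # best[i][t] = score of the best tag path ending in tag t at word i
    best = [{t: emission_score(sentence, 0, t) for t in TAGS}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(sentence)):
        scores, pointers = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + TRANSITION.get((p, t), 0.0))
            scores[t] = (best[-1][prev] + TRANSITION.get((prev, t), 0.0)
                         + emission_score(sentence, i, t))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


print(viterbi(["Yesterday", "John", "Smith", "arrived"]))
```

With these particular weights, the `sentence_start` feature is what keeps "Yesterday" from being tagged as an entity just because it is capitalized. That is a small taste of why feature selection is one of the hard parts.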