There's a decent bit of information out there on writing classifiers. I wrote a blog article about it a while back.
As for samples/code, there's:
As for the other part of your question, whether the SDK tells you what each word is: the answer is "no", with a few "kinda" caveats. In general, the underlying language models are not exposed. You can, however, consume the classification information produced by other classifiers and hope it gives you enough to work with. Some languages, like C#, tend to provide a good deal of classification information, not all of which is visible in the IDE with the default font and color settings (check Tools->Options->Environment->Fonts and Colors to see whether what you want is already there); others, like VB, tend not to. You can also use things like DTE's CodeModel, but I've never heard of anyone having a really good experience with it.
If you want an example of consuming classification information, you can see how this CommentTextTagger.cs (part of a spellchecker extension) does it.
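Roughly, consuming classification information means asking for the classification spans over some text and keeping only the ones you care about, such as comments. The snippet below is a simplified model of that filtering step in plain Python. It does not use the Visual Studio SDK. `ClassificationSpan` and `comment_spans` are hypothetical stand-ins for illustration; in a real extension you would work in C# with the spans returned by the editor's classifiers. Matching on the substring "comment" in the classification name is also an assumption, since different language services may name their comment classifications differently.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ClassificationSpan:
    """Hypothetical stand-in for an editor classification span:
    a range of text plus the name of the classification applied to it."""
    start: int
    length: int
    classification: str

    @property
    def end(self) -> int:
        return self.start + self.length


def comment_spans(spans: Iterable[ClassificationSpan], text: str) -> List[str]:
    """Return the text of every span whose classification name mentions 'comment'.

    Assumption: matching on the name (case-insensitively) rather than on an
    exact type, because different languages may use different names.
    """
    return [
        text[span.start:span.end]
        for span in spans
        if "comment" in span.classification.lower()
    ]


# Usage: classifications some other classifier might have produced for one line.
source = "int x = 1; // counter"
spans = [
    ClassificationSpan(0, 3, "keyword"),
    ClassificationSpan(4, 1, "identifier"),
    ClassificationSpan(11, 10, "Comment"),
]
print(comment_spans(spans, source))  # ['// counter']
```

The catch mentioned above still applies: this only works as well as the classifications the language service actually provides. If a language doesn't classify what you need, there's nothing here to filter.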