I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.
- Which program is the "best", where best is some combination of easiest to use, best prior estimation, and fastest?
- How do I incorporate my intuitions about topicality? Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis?
- Any unexpected pitfalls or tips I should know before embarking?
I'd prefer it if there were R or Python front ends for whatever program I end up using, but I expect (and accept) that I'll be dealing with C.