What are some good algorithms for automatically labeling text with its city / region of origin? That is, if a blog is about New York, how can I tell programmatically? Are there packages / papers that claim to do this with any degree of certainty?
I have looked at some TF-IDF-based approaches and proper-noun intersections, but so far with no spectacular successes, and I'd appreciate ideas!
The more general question is about assigning texts to topics, given some list of topics.
Simple / naive approaches are preferred to full-on Bayesian approaches, but I'm open.
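
To make "simple / naive" concrete, here is a minimal sketch of the proper-noun intersection idea: count how many place-name terms from a small per-city list appear in the text and pick the city with the most hits. The GAZETTEER contents and the label_city function are hypothetical names made up for illustration, not taken from any package, and a real term list would need to be far larger.

```python
import re
from collections import Counter

# Hypothetical gazetteer: a few place names and nicknames per city.
# These entries are illustrative assumptions, not a real dataset.
GAZETTEER = {
    "New York": {"new york", "nyc", "manhattan", "brooklyn", "queens"},
    "Chicago": {"chicago", "wrigley field", "lake michigan"},
    "San Francisco": {"san francisco", "golden gate", "mission district"},
}


def label_city(text, gazetteer=GAZETTEER):
    """Return (best_label, scores); best_label is None when no term matches."""
    lowered = text.lower()
    scores = Counter()
    for city, terms in gazetteer.items():
        for term in terms:
            # Match whole words/phrases only, so "nyc" is not found inside other words.
            pattern = r"\b" + re.escape(term) + r"\b"
            scores[city] += len(re.findall(pattern, lowered))
    # Pick the city with the highest hit count; a count of zero means "unknown".
    best, count = max(scores.items(), key=lambda kv: kv[1])
    return (best if count > 0 else None), dict(scores)


label, scores = label_city("Took the subway from Brooklyn to Manhattan. Classic NYC.")
print(label, scores)
```

The same scoring loop would cover the more general topic case by swapping the city term lists for per-topic keyword lists. It has obvious weaknesses (ambiguous names, texts that mention a city in passing), which is the part I'm hoping better approaches address.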