Methods for Geotagging or Geolabelling Text Content

+5 Q:

What are some good algorithms for automatically labeling text with its city/region of origin? That is, if a blog is about New York, how can I tell that programmatically? Are there packages or papers that claim to do this with any degree of certainty?

I have looked at some TF-IDF-based approaches and proper-noun intersections, but so far I've had no spectacular successes, and I'd appreciate ideas!

The more general question is about assigning texts to topics, given some list of topics.

Simple/naive approaches are preferred over full-on Bayesian approaches, but I'm open.
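As a point of reference, the TF-IDF idea mentioned in the question can be sketched as a naive nearest-profile classifier: build one TF-IDF vector per region from text already known to be about that region, then label a new text with the region whose vector is most similar by cosine. This is a minimal sketch, not any particular package's method; the `TfidfRegionClassifier` class, the smoothed IDF formula, and the toy region profiles are all hypothetical choices made for illustration. The same code works for the more general "assign texts to a given list of topics" problem if the keys are topics instead of regions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude lowercase word tokenizer.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    # Cosine similarity of two sparse vectors stored as {term: weight}.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TfidfRegionClassifier:
    """Hypothetical helper: nearest TF-IDF region profile by cosine."""

    def __init__(self, region_texts):
        # region_texts: {region: text known to be about that region}.
        self.regions = list(region_texts)
        token_lists = [tokenize(region_texts[r]) for r in self.regions]
        n = len(token_lists)
        df = Counter(t for tokens in token_lists for t in set(tokens))
        # Smoothed IDF: terms shared by every region get the lowest weight.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
        self.profiles = [self._vectorize(tokens) for tokens in token_lists]

    def _vectorize(self, tokens):
        # TF-IDF weights; words never seen in the profiles are ignored.
        tf = Counter(t for t in tokens if t in self.idf)
        return {t: count * self.idf[t] for t, count in tf.items()}

    def predict(self, text):
        query = self._vectorize(tokenize(text))
        scores = {r: cosine(query, p) for r, p in zip(self.regions, self.profiles)}
        best = max(scores, key=scores.get)
        # Return None instead of an arbitrary region when nothing overlaps.
        return (best if scores[best] > 0 else None), scores


# Toy, made-up region profiles.
clf = TfidfRegionClassifier({
    "New York": "manhattan brooklyn subway yankees broadway hudson",
    "Chicago": "cubs wrigley lake michigan deep dish loop",
})
label, scores = clf.predict("Took the subway to Brooklyn after a Broadway show")
print(label)  # New York
```

In practice the quality depends almost entirely on how representative the region profiles are; with tiny hand-written profiles like these, most real blog posts will share no vocabulary with any region and come back as `None`.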

I'm not sure if I understand the problem. So you want to take a piece of text and devise an algorithm to guess where in the world it might come from? I've never heard of a package that does this, but I'd be interested to hear about it if you figure it out :)

Ryan 2008-10-02 18:53:39

I have revised the question, but you have it, I think.

Gregg Lind 2008-10-02 18:56:35

+4 A:

You're looking for a named entity recognition system, or NER for short. There are several good toolkits available to help you out. LingPipe in particular has a very decent tutorial. CAGEclass seems to be oriented around NER on geographical place names, but I haven't used it yet.

Here's a nice blog entry about the difficulties of NER with geographical place names.

If you're going with Java, I'd recommend the LingPipe NER classes. OpenNLP also has some, but LingPipe has better documentation.
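The APIs of LingPipe, OpenNLP and CAGEclass are not reproduced here. As a much cruder stand-in that needs no toolkit, the sketch below does plain dictionary (gazetteer) lookup: it scans for capitalized tokens matching known place names, preferring the longest match, and guesses the most frequently mentioned place. The `GAZETTEER` table and the function names are hypothetical; a real system would use a large gazetteer plus a trained NER model to decide whether a capitalized word is actually a place.

```python
import re
from collections import Counter

# Hypothetical toy gazetteer: lowercase surface form -> canonical place.
GAZETTEER = {
    "new york": "New York, NY",
    "nyc": "New York, NY",
    "brooklyn": "New York, NY",
    "chicago": "Chicago, IL",
    "paris": "Paris, France",
}
MAX_WORDS = max(len(name.split()) for name in GAZETTEER)


def find_places(text):
    """Greedy longest-match lookup of capitalized gazetteer names."""
    tokens = re.findall(r"[A-Za-z]+", text)
    found = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, e.g. "New York" before "New".
        for n in range(min(MAX_WORDS, len(tokens) - i), 0, -1):
            span = tokens[i:i + n]
            # Capitalization is used as a weak proper-noun signal.
            if all(w[0].isupper() for w in span):
                key = " ".join(span).lower()
                if key in GAZETTEER:
                    found.append(GAZETTEER[key])
                    i += n
                    break
        else:
            i += 1  # no match starting at this token
    return found


def guess_location(text):
    # Most frequently mentioned place, or None if no place was found.
    counts = Counter(find_places(text))
    return counts.most_common(1)[0][0] if counts else None


post = "Moved from Chicago to New York last year; Brooklyn rents are brutal."
print(find_places(post))     # ['Chicago, IL', 'New York, NY', 'New York, NY']
print(guess_location(post))  # New York, NY
```

It also shows exactly the ambiguity that makes geographic NER hard: `find_places("Paris Hilton was spotted downtown")` happily returns `['Paris, France']`, because a lookup table cannot tell a person's name from a city.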

If you're looking for some theoretical background, Chavez et al. (2005) built an interesting system and documented it.

Aleksandar Dimitrov 2008-10-02 21:38:52

Thanks for the advice. This is a hard, hard problem, and your answer, which I summarize as "Look up NER" is about the best there is, probably :)

Gregg Lind 2008-12-26 22:20:16

+2 A:

Latent Semantic Mapping seems like a potentially good fit. It's just about as naive an algorithm as you're likely to find.
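The core idea behind latent semantic mapping (latent semantic analysis) can be sketched with NumPy: build a term-document count matrix from labeled example texts, keep only its top-k singular vectors, project both training documents and new text into that k-dimensional space, and take the label of the most similar training document. This is a toy illustration of the technique, not the API of any LSM framework; the `LatentSemanticMapper` class, the choice of `k`, and the training snippets are hypothetical.

```python
import re

import numpy as np


def tokenize(text):
    # Crude lowercase word tokenizer.
    return re.findall(r"[a-z]+", text.lower())


class LatentSemanticMapper:
    """Toy latent semantic mapping via truncated SVD of a term-document matrix."""

    def __init__(self, labeled_docs, k=2):
        # labeled_docs: list of (label, text) pairs (hypothetical training data).
        self.labels = [label for label, _ in labeled_docs]
        docs = [tokenize(text) for _, text in labeled_docs]
        terms = sorted({t for d in docs for t in d})
        self.index = {t: i for i, t in enumerate(terms)}
        a = np.column_stack([self._vector(d) for d in docs])  # terms x docs
        u, s, _ = np.linalg.svd(a, full_matrices=False)
        k = min(k, len(s))
        self.u_k = u[:, :k]  # orthonormal basis of the latent space
        # Project every training document into the latent space (one row each).
        self.doc_coords = (self.u_k.T @ a).T

    def _vector(self, tokens):
        # Raw term counts; words outside the training vocabulary are dropped.
        v = np.zeros(len(self.index))
        for t in tokens:
            if t in self.index:
                v[self.index[t]] += 1.0
        return v

    def predict(self, text):
        # Fold the new text into the latent space, then return the label of
        # the most similar training document by cosine similarity.
        q = self.u_k.T @ self._vector(tokenize(text))
        q_norm = np.linalg.norm(q)
        if q_norm < 1e-12:
            return None
        norms = np.linalg.norm(self.doc_coords, axis=1) * q_norm
        sims = (self.doc_coords @ q) / np.where(norms == 0, 1.0, norms)
        return self.labels[int(np.argmax(sims))]


mapper = LatentSemanticMapper([
    ("New York", "subway brooklyn manhattan"),
    ("New York", "manhattan broadway subway"),
    ("Chicago", "wrigley cubs lake"),
    ("Chicago", "lake michigan cubs"),
], k=2)
print(mapper.predict("a night on broadway"))  # New York
```

The point of the reduction is that documents are compared by co-occurrence patterns rather than exact word overlap: the first New York snippet never contains "broadway", yet it lands at the same spot in the latent space as a query that contains only that word.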

Mark Bessey 2008-10-02 21:46:48
