Hello!

A few days ago, I saw that happn.in now offers a service where tweets (messages on twitter.com) are grouped and analyzed for local areas. For several cities, they give you a list of trending terms.

I realize you can't know exactly how they do it, but maybe you can still help me: how can I build something similar? I have several approaches in mind. Is any of them useful?

  • APPROACH 1

Use the REST API public_timeline and go through all the tweets every time. Make a list of patterns and fitting locations, e.g. "New York" and "NY" go to "New York City", "Los Angeles" and "LA" go to "Los Angeles" etc. If you can't find a known pattern, you continue with the next tweet.

  • APPROACH 2

Use the Search API geocode feature, e.g. "http://search.twitter.com/search.atom?geocode=##LAT##%2C##LONG##%2C##radius##km". Unfortunately, I don't know whether the results are trustworthy.

  • APPROACH 3

Follow users who have a city name in their location field and analyze the REST API friends_timeline.
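To make APPROACH 1 more concrete, here is a minimal sketch of the pattern list in Python. The table contents, the function names, and the choice of case-sensitive whole-word matching are my own assumptions for illustration; a real list would need many more cities and spellings.

```python
import re

# Hypothetical pattern table: each target city maps to the spellings
# that should be counted as that city.
CITY_PATTERNS = {
    "New York City": ["New York", "NY"],
    "Los Angeles": ["Los Angeles", "LA"],
}


def compile_patterns(table):
    """Build one whole-word regex per city from its list of spellings."""
    compiled = []
    for city, aliases in table.items():
        # Longest spelling first, so "New York" is tried before "NY".
        ordered = sorted(aliases, key=len, reverse=True)
        alternatives = "|".join(re.escape(alias) for alias in ordered)
        compiled.append((city, re.compile(r"\b(?:%s)\b" % alternatives)))
    return compiled


PATTERNS = compile_patterns(CITY_PATTERNS)


def match_city(text):
    """Return the first known city whose pattern occurs in text, else None."""
    for city, pattern in PATTERNS:
        if pattern.search(text):
            return city
    return None  # no known pattern: continue with the next tweet
```

Matching is case-sensitive on purpose, so that "LA" does not fire on every lowercase "la". The same function could be applied to a user's location field instead of the tweet text.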

Do you have other ideas?

I hope you can help me. Thanks in advance!

+1  A: 

happn.in is actually really simple:

They have different users for each city (i.e. happn_in_ny) that follow people in that city, and they just use that user's friends timeline to analyze.
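I don't know how happn.in does the analysis step itself. As a rough sketch, assuming you already have the tweet texts from such a timeline as plain strings, a simple frequency count in Python might look like this. The stop-word list, the tokenizing regex, and the function name are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "and", "for", "you", "are", "with", "this", "that", "after"}


def trending_terms(tweets, top_n=5):
    """Return the top_n terms by number of tweets that mention them."""
    counts = Counter()
    for text in tweets:
        # Count each term once per tweet so a single repetitive tweet
        # cannot dominate; keep a leading # or @ as part of the term.
        terms = set(re.findall(r"[#@]?\w+", text.lower()))
        counts.update(t for t in terms if len(t) > 2 and t not in STOP_WORDS)
    return counts.most_common(top_n)
```

This only counts raw frequency. To find terms that are actually trending, you would probably want to compare the counts against a baseline from earlier time windows.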

jimyi
Thank you, apparently you're right. I just saw all those accounts. But doing it this way is a bit spammy. Some of their accounts got suspended, e.g. happn_in_tor for Toronto.
+3  A: 

APPROACH 1 - Repeatedly querying the public timeline won't give you all the tweets; there are just too many. You'll get the 20 most recent ones, and the Twitter servers cache those for a while, so even if you keep hammering the endpoint you'll get the same results. They have an XMPP feed that will push updates out to you, but you have to apply for access.

APPROACH 2 & APPROACH 3 - In either of these cases you're relying on the users to provide truthful information. There's nothing preventing a user from leaving it out or lying.

No matter your approach, you also have to watch out for API limits if you're going to be querying repeatedly. Consider applying for a whitelisted account, which will give you 20,000 requests per hour instead of the 100 that everyone gets by default.
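One way to stay under an hourly limit is to space requests evenly: 3600 / 100 means one request every 36 seconds by default. Here is a minimal Python sketch of that idea. The class name and its interface are my own invention, and it knows nothing about how Twitter actually counts requests.

```python
import time


class RequestPacer:
    """Keeps consecutive calls at least 3600 / limit_per_hour seconds apart."""

    def __init__(self, limit_per_hour, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / limit_per_hour
        self._clock = clock      # injectable so the pacer can be tested
        self._sleep = sleep
        self._next_allowed = None

    def wait(self):
        """Block until the next request is allowed, then reserve a slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
```

You would call `pacer.wait()` right before each API request, e.g. with `pacer = RequestPacer(100)` for a default account.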

That said, 2 & 3 will give you better results than 1. Getting access to the "firehose" XMPP feed and using the location or geocode would probably give you the best possible results. Even then you'll likely never get 100% reliable information, but that's about the best you can do.

You can also look at gnip.com. They have access to the Twitter firehose, and I believe they can filter and repackage it for you somehow. I confess I don't know very much about their service, but finding out is on my todo list. You may have to pay for it.

Jason Diller
+1  A: 

You could do a combination of the first two:

http://search.twitter.com/search?q=near%3ANYC+within%3A15mi (as their example says) &geocode=whatever_NY_geocode_may_be

SeanJA
Thanks, but the "near" parameter isn't available in the API, so I can only use "geocode".
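For reference, a geocode-only request URL in the format from the question could be built like this in Python. The helper name is made up, and the coordinates in the usage note are only a rough guess for New York.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the question.
SEARCH_ENDPOINT = "http://search.twitter.com/search.atom"


def geocode_search_url(lat, lon, radius_km):
    """Build a search URL with geocode=LAT,LONG,RADIUSkm (commas become %2C)."""
    query = urlencode({"geocode": f"{lat},{lon},{radius_km}km"})
    return f"{SEARCH_ENDPOINT}?{query}"
```

For example, `geocode_search_url(40.7128, -74.006, 25)` gives `http://search.twitter.com/search.atom?geocode=40.7128%2C-74.006%2C25km`.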