For web apps that depend on Twitter's public timeline, how often do they collect the data? There must be hundreds of thousands of messages every minute, correct? How do they manage to collect all the tweets without missing any?

A: 

The Twitter API is rate limited, so you are constrained by that if nothing else: http://apiwiki.twitter.com/Rate-limiting

Sinan Ünür
If I poll once a minute, I think I am safe, right?
Yes. You are restricted to 100 GET requests per hour, so once a minute is fine.
ceejayoz
+1  A: 

Some services (Friendfeed is a good example) are granted access to the Twitter Streaming API, aka the 'firehose'. It requires approval and a written agreement.

ceejayoz
+1  A: 

The Twitter API is rate limited, as has been said. The public timeline (twitter.com/public_timeline) is not rate limited in the same sense, but it is only updated every 5 seconds, so most tweets never appear there.

There are, I think, three or four companies that have access to the firehose, as Twitter's full feed is called. FriendFeed is one of them. Another is Gnip, which resells the feed to other companies. Buying it from a reseller like that is probably the only feasible way to get a full Twitter feed.

Kevin Peterson
Does this mean that sites like twizon.com don't get all the tweets?
Twizon is likely using the Twitter Search API to search for 'Amazon' and other relevant keywords. They're not pulling down the public timeline.
ceejayoz
I'm not sure that is enough. What if I shorten the URL and talk about the product, but never mention Amazon in my tweet? In fact, I checked some tweets: there is no mention of the word 'Amazon', and mostly short URLs are used. The only way is to read the tweet, check for a short URL, convert it to the long URL, and then save the tweet if it is about a product from Amazon (based on the URL). Is there anything I'm missing?
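The steps described in this comment (find a short URL, expand it, keep the tweet if the long URL points at Amazon) could be sketched roughly as follows. This is only an illustration, not how Twizon actually works: the shortener list, the Amazon domain list, and the function names are all hypothetical, and the `resolve` callback is a stand-in so the example runs without network access.

```python
import re
from typing import Callable, Optional
from urllib.parse import urlparse

# Assumption: a hand-maintained set of shortener hosts (not exhaustive).
SHORTENER_HOSTS = {"bit.ly", "tinyurl.com", "is.gd"}
# Assumption: the Amazon storefront domains we care about.
AMAZON_DOMAINS = ("amazon.com", "amazon.co.uk", "amazon.de")

URL_PATTERN = re.compile(r"https?://\S+")


def is_amazon_url(url: str) -> bool:
    """Return True if the URL's host is an Amazon domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in AMAZON_DOMAINS)


def find_amazon_link(text: str, resolve: Callable[[str], str]) -> Optional[str]:
    """Return the first Amazon URL found in the tweet text, expanding short URLs."""
    for match in URL_PATTERN.finditer(text):
        # Drop punctuation that commonly trails a URL at the end of a sentence.
        url = match.group(0).rstrip(".,;:!?)")
        host = (urlparse(url).hostname or "").lower()
        if host in SHORTENER_HOSTS:
            url = resolve(url)  # expand the short URL to its long form
        if is_amazon_url(url):
            return url
    return None


# Stand-in resolver backed by a dict; the target URL is made up for the demo.
FAKE_REDIRECTS = {"http://bit.ly/abc123": "http://www.amazon.com/dp/EXAMPLE"}


def fake_resolve(url: str) -> str:
    return FAKE_REDIRECTS.get(url, url)


if __name__ == "__main__":
    print(find_amazon_link("Loved this book http://bit.ly/abc123.", fake_resolve))
    print(find_amazon_link("Nothing to see here", fake_resolve))
```

In a real crawler the resolver would have to follow HTTP redirects (for example with `urllib.request.urlopen(url).geturl()`), and each expansion is an extra network request per tweet, so caching resolved URLs would likely matter.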
+1  A: 

Go here:

http://twitter.com/help/request_whitelisting

and get your account whitelisted (which allows 20,000 requests per hour) if 100 requests per hour isn't enough.

@ceejayoz It's not 100 GET requests; it's 100 requests in general, excluding a few calls like verify_credentials and rate_limit_status.
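To see what those two limits mean for a polling schedule, here is a small arithmetic sketch. The function names are made up for illustration, and it assumes every poll costs exactly one request against the hourly limit.

```python
def min_poll_interval_seconds(requests_per_hour: int) -> float:
    """Shortest interval between polls that stays within the hourly limit."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return 3600 / requests_per_hour


def fits_budget(poll_interval_seconds: float, requests_per_hour: int) -> bool:
    """True if polling at this interval uses no more than the hourly limit."""
    if poll_interval_seconds <= 0:
        raise ValueError("poll_interval_seconds must be positive")
    return 3600 / poll_interval_seconds <= requests_per_hour


if __name__ == "__main__":
    # Default limit quoted in this thread: 100 requests per hour.
    print(min_poll_interval_seconds(100))      # 36.0 seconds between polls
    print(fits_budget(60, 100))                # once a minute: True
    print(fits_budget(30, 100))                # every 30 seconds: False
    # Whitelisted limit quoted in this thread: 20,000 requests per hour.
    print(min_poll_interval_seconds(20000))
```

So polling once a minute uses 60 of the 100 requests, which leaves some headroom for other calls made from the same account.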

Chad Scira
+1  A: 

The public timeline is no longer a great place to mine data. Twitter now pushes tweets out through its Streaming APIs. The closest equivalent to the public timeline is the spritzer method, but it includes only a small sample. If you need more tweets than spritzer provides, or all of them, you'll need to sign a written agreement to get access to the other Streaming API (HTTP push) feeds, such as the firehose feed, which returns all public tweets.
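With a push feed the client holds one connection open and handles messages as they arrive, instead of polling. A minimal sketch of the consuming side is below. It assumes, and this is an assumption rather than something stated in this answer, that the feed delivers one JSON object per line and that tweets carry a `text` field. The function name and sample data are made up, and the connection handling itself is left out.

```python
import json
from typing import Iterable, Iterator


def iter_statuses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed tweet objects from a line-delimited JSON stream."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines (assumed to be keep-alives)
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        # Assumption: only objects with a "text" field are tweets.
        if isinstance(message, dict) and "text" in message:
            yield message


if __name__ == "__main__":
    # Made-up sample data standing in for lines read from an open connection.
    sample_stream = [
        '{"id": 1, "text": "first tweet"}',
        "",
        "not json",
        '{"delete": {"id": 1}}',
        '{"id": 2, "text": "second tweet"}',
    ]
    for status in iter_statuses(sample_stream):
        print(status["id"], status["text"])
```

Because `iter_statuses` accepts any iterable of strings, the same function could be fed lines from a live HTTP response or from a saved file for testing.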

Chris Thomson