views: 300

answers: 4

I am creating an application to help our team manage a Twitter competition. So far I've managed to interact with the API fine, and return a set of tweets that I need.

I'm struggling to decide on the best way to handle the storage of the tweets in the database, how often to check for them and how to ensure there are no overlaps or gaps.

You can get a maximum of 100 tweets per page. At the moment, my current idea is to run a cron script, say, once every 5 minutes, grab a full 100 tweets at a time, and loop through them, checking the db for each one before adding it.
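Roughly, the cron script would look like this (table, column, and function names are just illustrative, and `fetch_latest_tweets()` stands in for my existing API call):

```
<?php
// cron.php - run every 5 minutes
// fetch_latest_tweets() stands in for my existing API call and
// returns an array of decoded tweets (up to 100 per page).
$tweets = fetch_latest_tweets();

$db = new PDO('mysql:host=localhost;dbname=competition', 'user', 'pass');
$check  = $db->prepare('SELECT 1 FROM tweets WHERE tweet_id = ?');
$insert = $db->prepare(
    'INSERT INTO tweets (tweet_id, username, text, created_at)
     VALUES (?, ?, ?, ?)'
);

foreach ($tweets as $tweet) {
    // one SELECT per tweet to check for duplicates - the bit I don't like
    $check->execute(array($tweet['id']));
    if ($check->fetchColumn() === false) {
        $insert->execute(array(
            $tweet['id'],
            $tweet['user']['screen_name'],
            $tweet['text'],
            date('Y-m-d H:i:s', strtotime($tweet['created_at'])),
        ));
    }
}
```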

This has the obvious drawback of running 100 queries against the db every 5 minutes, plus however many INSERTs there are, which I really don't like. I would also much rather have something a little more real time: as Twitter is a live service, it stands to reason that we should update our list of entrants as soon as they enter.

This again throws up the drawback of having to repeatedly poll Twitter, which, although it might be necessary, means hammering their API more than I'd like.

Does anyone have any ideas for an elegant solution? I need to ensure that I capture all the tweets, not leave anyone out, and keep each user unique in the db. I have considered just adding everything and then grouping the resulting table by username, but that's not tidy.

I'm happy to deal with the display side of things separately, as that's just a pull from MySQL and display. But the backend design is giving me a headache, as I can't see an efficient way to keep it ticking over without hammering either the API or the db.

+1  A: 

The Twitter API offers a streaming API that is probably what you want to use to ensure you capture everything: http://dev.twitter.com/pages/streaming_api_methods

If I understand what you're looking for, you'll probably want statuses/filter, using the track parameter with whatever distinguishing characteristics (hashtags, words, phrases, locations, users) you're looking for.

Many Twitter API libraries have this built in, but basically you keep an HTTP connection open and Twitter continuously sends you tweets as they happen. See the streaming API overview for details. If your library doesn't handle it for you, you'll have to check for dropped connections and reconnect, check the error codes, etc. - it's all in the overview. But adding tweets as they come in will allow you to eliminate duplicates in the first place (unless you only allow one entry per user - but that's a client-side restriction you can deal with later).
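For illustration, a bare-bones consumer in PHP might look something like this (the endpoint is the statuses/filter one from the docs linked above; the credentials, the track value, and `handle_tweet()` are placeholders, and reconnection logic is omitted):

```
<?php
// Minimal streaming consumer - reconnect/backoff logic omitted.
// Tweets arrive as newline-delimited JSON on a connection that
// Twitter holds open indefinitely.
$buffer = '';

$ch = curl_init('http://stream.twitter.com/1/statuses/filter.json');
curl_setopt_array($ch, array(
    CURLOPT_POST          => true,
    CURLOPT_POSTFIELDS    => 'track=mycontesthashtag',  // whatever you're tracking
    CURLOPT_USERPWD       => 'username:password',       // basic auth, per the docs
    CURLOPT_WRITEFUNCTION => function ($ch, $data) use (&$buffer) {
        $buffer .= $data;
        // each complete line in the stream is one tweet
        while (($pos = strpos($buffer, "\n")) !== false) {
            $line   = substr($buffer, 0, $pos);
            $buffer = substr($buffer, $pos + 1);
            if (trim($line) !== '') {
                handle_tweet(json_decode($line, true)); // your code: cache it, etc.
            }
        }
        return strlen($data); // tell cURL we consumed everything
    },
));
curl_exec($ch); // blocks for as long as the stream stays up
```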

As far as not hammering your DB: once you have Twitter just sending you stuff, you're in control on your end - you could easily have your client cache up the tweets as they come in, and then write them to the db at given time or count intervals - write whatever it has gathered every 5 minutes, or write once it has 100 tweets, or both (obviously these numbers are just placeholders). This is also when you could check for existing usernames if you need to - writing out a cached-up list gives you the best chance to make things as efficient as you want.
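Sketched out, that caching might look like this (the thresholds and names are placeholders, and `handle_tweet()` is the same hook as in the sketch above):

```
<?php
// Cache tweets in memory and flush them in one multi-row INSERT
// once we have 100 of them or 5 minutes have passed.
$cache     = array();
$lastFlush = time();

function handle_tweet(array $tweet)
{
    global $cache, $lastFlush;
    $cache[] = $tweet;
    if (count($cache) >= 100 || time() - $lastFlush >= 300) {
        flush_cache();
    }
}

function flush_cache()
{
    global $cache, $lastFlush, $db; // $db is your PDO connection
    if (empty($cache)) {
        return;
    }
    $rows = rtrim(str_repeat('(?, ?, ?), ', count($cache)), ', ');
    $stmt = $db->prepare("INSERT INTO tweets (tweet_id, username, text) VALUES $rows");
    $values = array();
    foreach ($cache as $t) {
        $values[] = $t['id'];
        $values[] = $t['user']['screen_name'];
        $values[] = $t['text'];
    }
    $stmt->execute($values);
    $cache     = array();
    $lastFlush = time();
}
```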

Update: My solution above is probably the best way to do it if you want live results (which it seems like you do). But as mentioned in another answer, it may well be possible to just use the Search API to gather entries after the contest is over, and not worry about storing them at all - you can specify pages when you ask for results (as outlined in the Search API link), but there are limits on how many results you can fetch overall, which may cause you to miss some entries. Which solution works best for your application is up to you.

cincodenada
Thanks, this does seem like the most flexible solution.
DavidYell
additionally, if you add a UNIQUE constraint to the tweet's id, you can bulk load the tweets from a CSV file and not worry about duplicates.
Jayrox
you can use PHP's `fputcsv` to save the data you need from the tweet into a flat file (very fast). Then use MySQL's `load data local infile` and bulk load the tweets into the database. This is also very fast.
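For example (column names and the file path are illustrative, and your MySQL connection must allow LOCAL INFILE):

```
<?php
// append each tweet to a flat file with fputcsv (very fast)...
$fh = fopen('/tmp/tweets.csv', 'a');
foreach ($tweets as $t) {
    fputcsv($fh, array($t['id'], $t['user']['screen_name'], $t['text']));
}
fclose($fh);

// ...then bulk load it. IGNORE skips any row that violates the
// UNIQUE constraint on tweet_id, so duplicates are dropped for free.
$db->exec("LOAD DATA LOCAL INFILE '/tmp/tweets.csv'
           IGNORE INTO TABLE tweets
           FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
           (tweet_id, username, text)");
```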
Jayrox
A: 

I read over your question and it seems to me that you want to duplicate data already stored by Twitter. Without more specifics on the competition you're running (how users enter, for example, or the estimated number of entries), it's impossible to know whether storing this information locally in a database is the best way to approach this problem.

Might a better solution be to skip storing duplicate data locally and pull the entrants directly from Twitter, i.e. when you're attempting to find a winner? You could eliminate duplicate entries on the fly while the code is running; you would just need to request the next page once you've finished processing the 100 entries already fetched. Although I'm not sure if this is possible directly through the Twitter API.
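For illustration, paging through the Search API at draw time might look like this (the endpoint and field names are from the Search API; the query is a placeholder and this is only a sketch):

```
<?php
// page through the Search API at draw time and de-dupe in memory;
// note the API only lets you page back so far, so entries could be
// missed on a big contest
$entrants = array();
for ($page = 1; $page <= 15; $page++) {
    $json = file_get_contents('http://search.twitter.com/search.json?q='
        . urlencode('#mycontest') . '&rpp=100&page=' . $page);
    $data = json_decode($json, true);
    if (empty($data['results'])) {
        break; // no more pages
    }
    foreach ($data['results'] as $tweet) {
        // keyed by username, so each user counts only once
        $entrants[$tweet['from_user']] = $tweet;
    }
}
// $entrants now holds one tweet per unique user - pick a winner from it
```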

It is possible, and I have achieved this functionality already. Plus for metrics and records, the guys want the stuff filed with us as well as on Twitter :)
DavidYell
+2  A: 

100 queries in 5 minutes is nothing. Especially since a tweet has essentially only 4 pieces of data associatedated with it: user ID, timestamp, tweet text, tweet ID - say, about 170 characters' worth of data per tweet. Unless you're running your database on a 4.77MHz 8088, your database won't even blink at that kind of "load".

Marc B
Tweets from the API have significantly more data associated with them than 170 characters. The JSON returned by Twitter per tweet can be over 3 KB, and often is.
Jayrox
A: 

I think running a cron every X minutes and basing it off of the tweets' creation dates may work. You can query your database to find the date/time of the last recorded tweet, then only process tweets created after that to prevent duplicates. Then, when you do your inserts into the database, use one or two insert statements containing all the entries you want to record, to keep performance up.

INSERT INTO `tweets` (id, date, ...) VALUES (..., ..., ...), (..., ..., ...), ...;

This doesn't seem too intensive, though it also depends on the number of tweets you expect to record. Also make sure to index the table properly.
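For example (names are illustrative, and `fetch_tweets_since()` stands in for the API call):

```
<?php
// cron job: fetch only tweets newer than the last one recorded,
// then write them all in a single multi-row INSERT
$db   = new PDO('mysql:host=localhost;dbname=competition', 'user', 'pass');
$last = $db->query('SELECT MAX(date) FROM tweets')->fetchColumn();

// fetch_tweets_since() is a placeholder for your API call; it should
// return only tweets created after $last
$tweets = fetch_tweets_since($last);

if ($tweets) {
    $rows = rtrim(str_repeat('(?, ?), ', count($tweets)), ', ');
    $stmt = $db->prepare("INSERT INTO tweets (id, date) VALUES $rows");
    $values = array();
    foreach ($tweets as $t) {
        $values[] = $t['id'];
        $values[] = date('Y-m-d H:i:s', strtotime($t['created_at']));
    }
    $stmt->execute($values);
}
```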

Aaron W.