tags:

views:

60

answers:

3

Hi, I have a script which captures tweets and puts them into a database. I will be running the script on a cronjob and then displaying the tweets on my site from the database to prevent hitting the limit on the twitter API.

So I don't want to have duplicate tweets in my database, I understand I can use 'INSERT...ON DUPLICATE KEY UPDATE' to achieve this, but I don't quite understand how to use it.

My database structure is as follows.

Table - Hash id (auto_increment) tweet user user_url

And currently my SQL to insert is as follows:

$tweet = $clean_content[0];
$user_url = $clean_uri[0];
$user = $clean_name[0];

$query='INSERT INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';
mysql_query($query);

How would I correctly use 'INSERT...ON DUPLICATE KEY UPDATE' to insert only if it doesn't exist, and update if it does?

Thanks

A: 

Try using:

$query='INSERT INTO hash (tweet, user, user_url) 
        VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'") 
        ON DUPLICATE KEY UPDATE tweet = VALUES(tweet)';
codaddict
A: 

ON DUPLICATE KEY UPDATE doesn't seem to be the right solution here, as you don't want to update if the value is already in the table.

I would use Twitter's own unique Status ID field (which should be unique for each tweet) instead of your hash id. Add that as a field on your table, and define it as the primary key (or as a unique index.) Then use REPLACE INTO, including the status ID from Twitter.

This has the advantage that you can always track your record back to a unique Tweet on twitter, so you could easily get more information about the Tweet later if you need to.

Matt Gibson
+1  A: 

you need some UNIQUE KEY on your table, if user_url is tweer_url, then this should fit (every tweet has a unique url, id would be better).

CREATE TABLE `hash` (
  `user_url` ...,
  ...,
  UNIQUE KEY `user_url` (`user_url`)
);

and its better to use INSERT IGNORE on your case

$query='INSERT IGNORE INTO hash (tweet, user, user_url) VALUES ("'.$tweet.'", "'.$user.'", "'.$user_url.'")';

ON DUPLICATE KEY is useful when you need update existing row but you want to insert just once

Lauri
Thanks for your response, this seems like it would work for me. Although I think I will use the ID of the twitter post rather than the URL, I assume the ID is the 11 digit number, for example - 12935503743
Probocop