Hi there. I have a script for a "Tweet-getter" application that runs as a cron job every two minutes. In a nutshell, it posts tweets to Facebook. Every now and then it hiccups and, despite my error checking, reposts old tweets on every run (that is, every two minutes). I have a log.txt that in theory would help me work out what's going on, but the problem is that it isn't written to every time the job runs. Here's the code:

```php
<?php
$start_time = microtime();
require_once //a library and config
$facebook = new Facebook($api_key, $secret);
get_db_conn(); //returns $conn

$hold_me = mysql_fetch_array(mysql_query("SELECT * FROM `stats`"));
$last_id_posted = $hold_me[0]; //the status # of the most recently posted tweet

$me = "mytwittername";
$ch = curl_init("http://twitter.com/statuses/friends_timeline.xml?since_id=$last_id_posted");
curl_setopt($ch, CURLOPT_USERPWD, $me.":".$pw);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$xs = curl_exec($ch);
$data = new SimpleXMLElement($xs);
$latest_tweet_id = $last_id_posted;
$uid = get_uid(); //returns an array of facebookID->twittername
$user_count = count($uid);
curl_close($ch);

$total_tweets = 0;
$posted_tweets = 0;
foreach ($data->status as $tweet) {
    $name = strtolower($tweet->user->screen_name);

    if (array_key_exists($name, $uid)) {
        $total_tweets += 1;
        // $name = Twitter Name
        $message = $tweet->text;
        $fbid = $uid[$name];
        theposting($name, $message, $fbid); //posts tweet to facebook
        $this_id = $tweet->id;
        if ($this_id > $latest_tweet_id) {
            $latest_tweet_id = $this_id;
        }
    }
}
mysql_query("UPDATE stats SET lasttweet='$latest_tweet_id'");
commit_log(); //logs to a txt file how many tweets posted, how many users, execution duration, and time of execution
?>
```

So in theory the log is a series of lines like "Monday 24th of August 2009 10:41:32 PM. Called all since # 3326415954. Updated to # 3526415953. 8 users. Took 0.086057 milliseconds. Posted 14 out of 20 tweets." Occasionally, though, it skips two or three hours at a time, and during that period it "spams" people's Facebook pages with multiple copies of the same tweet. I can't tell what might be breaking my code, but I suspect bad XML from Twitter. Traffic on my end is relatively low, so I doubt I'm overloading my server. The log.txt is 50 KB right now and last "broke" at about 35 KB, so it's not a huge file slowing things down. Any thoughts would be appreciated!

A:

The first thing I would do to improve the script is check for cURL errors with curl_errno() and curl_error(). If your malformed-XML theory is correct, a failed or truncated cURL transfer is the most likely source. You may also want to set a timeout for both cURL (CURLOPT_TIMEOUT) and PHP (set_time_limit()).
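A minimal sketch of those checks, assuming the PHP cURL extension is installed. The function name fetch_timeline() and the timeout values are hypothetical; CURLOPT_CONNECTTIMEOUT, CURLOPT_TIMEOUT, curl_errno(), curl_error(), curl_getinfo() and set_time_limit() are standard PHP. It returns null on any failure so the caller can stop before posting anything or touching the stats table.

```php
<?php
// Hypothetical helper: fetch a URL with cURL, returning the body or null on failure.
function fetch_timeline($url, $user, $pw)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_USERPWD, $user . ':' . $pw);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10); // seconds to wait for the connection (assumed value)
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);        // seconds for the whole request (assumed value)

    $body = curl_exec($ch);
    if ($body === false) {
        // curl_errno()/curl_error() describe transport failures (DNS, timeout, refused connection...)
        error_log('cURL error ' . curl_errno($ch) . ': ' . curl_error($ch));
        curl_close($ch);
        return null;
    }

    // A successful transfer can still be an HTTP error page, so check the status code too.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);
    if ($status != 200) {
        error_log("Unexpected HTTP status $status from $url");
        return null;
    }
    return $body;
}

// Also cap PHP's own execution time. On non-Windows systems, time spent waiting
// inside cURL or database calls does not count toward this limit, which is why
// the cURL timeouts above still matter.
set_time_limit(60);
```

In the question's script, the caller would log and exit when this returns null, instead of going on to parse an empty or partial response.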

I've not used the SimpleXML library, but it looks as if it does check for malformed XML: the SimpleXMLElement constructor emits an E_WARNING for each error it finds and throws an Exception if the XML cannot be parsed.
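A minimal sketch of catching malformed XML without warnings leaking to the console, assuming the SimpleXML and libxml extensions (both enabled by default). Instead of the constructor it uses simplexml_load_string(), which returns false on failure, together with libxml_use_internal_errors() so the individual parse errors can be written to a log. parse_timeline() is a hypothetical name.

```php
<?php
// Hypothetical helper: parse timeline XML, returning null (and logging why) if it is not well-formed.
function parse_timeline($xml)
{
    $previous = libxml_use_internal_errors(true); // collect parse errors instead of emitting warnings
    libxml_clear_errors();
    $data = simplexml_load_string($xml);

    if ($data === false) {
        foreach (libxml_get_errors() as $error) {
            error_log('XML error on line ' . $error->line . ': ' . trim($error->message));
        }
        libxml_clear_errors();
        libxml_use_internal_errors($previous); // restore the caller's setting
        return null;
    }

    libxml_use_internal_errors($previous);
    return $data;
}
```

If this returns null, the script can skip the posting loop entirely, so nothing is posted and the stored ID is left alone until a good response arrives.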

Those two checks should keep any dodgy data out of the rest of the script.

Without seeing the other functions, it's hard to spot any other places where it could be going wrong.

Phil Carter
A: 

You should test to make sure that your database query was successful.

Try selecting only the column you need (the one that ends up in $last_id_posted) rather than SELECT *, since you throw away the rest of the row anyway.
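A minimal sketch of both suggestions. The mysql_* functions used in the question were removed in PHP 7.0, so this swaps in PDO; the helper names get_last_id() and set_last_id() are hypothetical, and the `stats` table with a `lasttweet` column is taken from the question. Checking the UPDATE matters as much as checking the SELECT: if the UPDATE fails, lasttweet keeps its old value and the next run fetches and posts the same tweets again.

```php
<?php
// Hypothetical helper: return the last posted tweet ID as a string,
// or null if the query fails or the table has no usable value.
function get_last_id(PDO $db)
{
    try {
        // Select only the column we need instead of SELECT *.
        $result = $db->query('SELECT lasttweet FROM stats LIMIT 1');
        if ($result === false) { // reachable only when PDO exceptions are turned off
            $info = $db->errorInfo();
            error_log('SELECT failed: ' . $info[2]);
            return null;
        }
        $id = $result->fetchColumn(); // false when there is no row
    } catch (PDOException $e) { // PDO::ERRMODE_EXCEPTION is the default since PHP 8.0
        error_log('SELECT failed: ' . $e->getMessage());
        return null;
    }
    if ($id === false || $id === null || $id === '') {
        error_log('stats table has no lasttweet value');
        return null;
    }
    return (string) $id;
}

// Hypothetical helper: store the newest posted ID; returns false (and logs) on failure.
function set_last_id(PDO $db, $id)
{
    try {
        $stmt = $db->prepare('UPDATE stats SET lasttweet = ?');
        if ($stmt === false || !$stmt->execute(array((string) $id))) {
            error_log('UPDATE failed');
            return false;
        }
    } catch (PDOException $e) {
        error_log('UPDATE failed: ' . $e->getMessage());
        return false;
    }
    return true;
}
```

Against MySQL you would pass in a PDO created from a mysql: DSN; for quick experiments the same functions also work against an in-memory SQLite database (new PDO('sqlite::memory:')), assuming the pdo_sqlite driver is available.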

$last_id_posted has no default value. If the SELECT fails or returns no row, what do you expect the request to return with an empty ?since_id= parameter?
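One way to avoid that case is to refuse to build the request at all unless the stored ID looks valid, and abort the run instead. This is a minimal sketch; timeline_url() is a hypothetical name and the URL is the one from the question. It keeps the ID as a string, since IDs such as 3326415954 exceed PHP_INT_MAX (2147483647) on 32-bit builds.

```php
<?php
// Hypothetical helper: build the timeline URL only when we have a usable tweet ID.
// Returns null otherwise, so the caller can abort instead of requesting "?since_id=".
function timeline_url($last_id)
{
    $last_id = (string) $last_id; // null/false become '', SimpleXMLElement becomes its text
    if (!preg_match('/^[1-9][0-9]*$/', $last_id)) {
        return null; // missing, empty or non-numeric ID: don't guess
    }
    return 'http://twitter.com/statuses/friends_timeline.xml?since_id=' . $last_id;
}
```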

Serialize the state of your database result, the cURL response and the XML, and dump it into your log file.
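A minimal sketch of such a dump; dump_state() and the array keys are hypothetical. Pass it raw strings (such as the cURL response $xs) rather than the SimpleXMLElement itself, because serialize() throws an Exception for SimpleXMLElement objects. FILE_APPEND and LOCK_EX are standard file_put_contents() flags.

```php
<?php
// Hypothetical helper: append a timestamped, serialized snapshot of this run's state to a log file.
// An entry may span several lines if the response it contains has newlines.
function dump_state($logfile, array $state)
{
    $entry = date('c') . ' ' . serialize($state) . PHP_EOL;
    // FILE_APPEND writes at the end of the file; LOCK_EX keeps two overlapping runs from interleaving.
    return file_put_contents($logfile, $entry, FILE_APPEND | LOCK_EX) !== false;
}
```

Calling it right after curl_exec() with values such as $last_id_posted, curl_errno($ch) and $xs would record exactly what each run saw, even on runs that later die before commit_log().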

simplemotives