
Hi all

I'm trying to parse multiple RSS feeds and, if they change, update the corresponding records in my MySQL table.

Currently, I have a script that inserts the items of an RSS feed (I just paste the URL into a form and submit it). This inserts the following into my table: title, rss_url, description, price, discount, total.

This all works perfectly well.

The next part is a script that updates the rows when they change in the RSS feed; the only values that ever change are the price and the discount. This also works well.

What I also want is this: if an item is removed from the RSS feed, my script should detect it and either delete the row or set a flag in my table saying it has been deleted.

My code is quite long-winded:

```php
// Note: the mysql_* functions used here were deprecated in PHP 5.5 and removed in PHP 7.0.
$result = mysql_query("SELECT * FROM easy_contents");
while ($row = mysql_fetch_array($result)) {

    $articles = array();
    $easy_url = $row['rss_url'];

    $rawFeed = file_get_contents($easy_url);
    $xml = new SimpleXMLElement($rawFeed);

    $channel = array();
    $channel['title']       = $xml->channel->title;
    $channel['link']        = $xml->channel->link;
    $channel['description'] = $xml->channel->description;

    foreach ($xml->channel->item as $item) {
        $article = array();
        $article['title']       = $item->title;
        $article['link']        = $item->link;
        $article['description'] = (string) trim($item->description);

        // strip out the HTML tags; work on a plain string rather than
        // writing the result back into the SimpleXML node
        $description = str_replace(
            array('<table><tr><td width="110">', '</table>', '</td>', '<td>',
                  '<br />', '<b>', '</b>', '</tr>'),
            '',
            (string) $item->description
        );

        // find all encoded £ signs (&#xA3;) and the number after each one;
        // the first captured number is the price
        preg_match_all('/&#xA3;([0-9.]+)/', $description, $results);
        $v = $results[1];

        // remove "&#xA3;PRICE" to get the description stored in the table
        $price_stripped = str_replace('&#xA3;' . $v[0], '', $description);

        // find the discount/delivery cost in the RSS using ~#&pound;NUMBER
        // this is the discount
        preg_match_all('/~#&pound;([0-9.]+)/', $description, $discount);
        $disc = $discount[1];

        // find the remaining &pound;NUMBER; this is the delivery cost
        preg_match_all('/&pound;([0-9.]+)/', $description, $delivery_cost);
        $deliv = $delivery_cost[1];

        // find the | character and the text after it, up to the first full stop;
        // this is the retailer message
        preg_match_all('/\|(.*?)\./', $description, $match);
        $retail_mess = str_replace(" On", "On", $match[1][0]);

        $total = $v[0] + $deliv[0] - $disc[0];

        // escape the free-text values so a title containing a quote can't break the query
        $rss_url = mysql_real_escape_string($row['rss_url']);
        $title   = mysql_real_escape_string((string) $item->title);
        $desc    = mysql_real_escape_string($price_stripped);

        $sql = "UPDATE easy_contents SET delivery_cost = '$deliv[0]', price = '$v[0]', total = '$total' "
             . "WHERE rss_url = '$rss_url' AND title = '$title' AND description = '$desc'";
        if (!$query = mysql_query($sql)) {
            echo "Error on line " . __LINE__ . ". " . mysql_error() . ".<br />\nQuery: " . $sql;
            exit;
        }
        echo "Query OK.<br />\nUpdated rows: " . mysql_affected_rows() . ".<br />\nQuery: " . $sql;
    }
}
```

This updates the row in the database whenever the RSS item changes.
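A side note on the parsing above: a bare `/&pound;([0-9.]+)/` matches every `&pound;` amount, so the delivery cost and the discount can be picked up interchangeably. A sketch that anchors each value on its own surrounding delimiters could look like this. The delimiters are assumptions taken from the regexes above and from the OP's later comment, which lists the delivery cost as `~#£18.95#~` and the discount as `*^£0.00^*`; that comment disagrees with the code's own labelling of `~#&pound;` as the discount, so check which is which against the real feed. `parse_item_description` is a hypothetical name.

```php
<?php
// Sketch only. Assumed layout of one item's description after SimpleXML
// parsing and tag stripping (taken from this question, not from feed docs):
//   &#xA3;PRICE        the price
//   ~#&pound;AMOUNT#~  the delivery cost
//   *^&pound;AMOUNT^*  the discount
//   | On sale at X.    the retailer message
function parse_item_description(string $text): array
{
    // Return the first capture group of $pattern in $text, or null if no match.
    $grab = function (string $pattern) use ($text): ?string {
        return preg_match($pattern, $text, $m) ? $m[1] : null;
    };

    $price    = $grab('/&#xA3;([0-9.]+)/');
    $delivery = $grab('/~#&pound;([0-9.]+)#~/');          // needs both ~# and #~
    $discount = $grab('/\*\^&pound;([0-9.]+)\^\*/');      // needs both *^ and ^*
    $retailer = $grab('/\|\s*(.*?)\./');                  // text after "|" up to "."

    return array(
        'price'            => $price,
        'delivery_cost'    => $delivery,
        'discount'         => $discount,
        'retailer_message' => $retailer,
        // same formula as the question: price + delivery - discount
        // (a missing delivery cost or discount counts as 0)
        'total'            => ($price === null)
            ? null
            : (float) $price + (float) $delivery - (float) $discount,
    );
}

// Example with a made-up description string:
print_r(parse_item_description(
    'Widget &#xA3;309 ~#&pound;18.95#~ *^&pound;0.00^* | On sale at Be Direct.'
));
```

Because each amount has to sit between its own pair of delimiters, the order in which the amounts appear in the description no longer matters.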

Can anyone provide a snippet showing how to detect that an item has been removed from the RSS feed, and the PHP/MySQL needed to delete the corresponding row from my table?

Thank you

A: 

If simply replacing your data with new data from the RSS feed won't work for you, you could run through a few steps:

  1. Query all rows from the DB and put them into an array keyed by ID.
  2. Parse the RSS feed into an array keyed by the same ID.
  3. Compare the arrays. The difference is the set of IDs to delete from your DB.
  4. Loop through the difference array and delete each row.
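A minimal sketch of steps 3 and 4, assuming both lists have already been reduced to flat arrays of the same kind of unique ID (the comments below settle on the feed's `pid`). The function names are hypothetical; `array_diff()` is PHP's built-in. Because it only returns values from its first argument, swapping the arguments answers the opposite question: which feed items are new.

```php
<?php
// Hypothetical helpers for steps 3-4; $dbIds and $feedIds are flat arrays
// of the same kind of unique ID, built in steps 1-2.

// IDs still in the table but gone from the feed: rows to delete (or flag).
function ids_to_delete(array $dbIds, array $feedIds): array
{
    // array_diff keeps the values of the first array that are absent from
    // the second; array_values re-indexes the result from 0.
    return array_values(array_diff($dbIds, $feedIds));
}

// IDs in the feed but not yet in the table: rows to insert.
function ids_to_insert(array $dbIds, array $feedIds): array
{
    return array_values(array_diff($feedIds, $dbIds));
}

// Example with made-up IDs.
$dbIds   = array('1001', '1002', '1003');
$feedIds = array('1001', '1003', '1004');

foreach (ids_to_delete($dbIds, $feedIds) as $id) {
    // step 4: issue one DELETE here (or an UPDATE on a hypothetical
    // "deleted" flag column if the row must be kept)
    echo "delete $id\n";                                            // prints "delete 1002"
}
echo 'new: ' . implode(', ', ids_to_insert($dbIds, $feedIds)) . "\n";  // prints "new: 1004"
```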

I do something similar in an app I wrote. It's a long solution, but once you get the bugs out it works really well.

bpeterson76
OP: Could you provide a snippet of code? I can't compare the arrays by ID because the RSS feed items don't have one. The only thing the DB rows and the RSS items have in common is the string 'On sale at XYZ' (On sale at Amazon, On sale at Play, On sale at Misco, etc.).
bpeterson76: OK, so the million-dollar question, I guess: is there a reason you can't just replace everything rather than having to worry about updating and deleting? (This would be so much easier if you used a true web service like Amazon's AWS.)
OP: Yes, there is a reason I can't. When the RSS items are first added, each one gets an associated wp_posts_id. Each item is a product that is displayed on a WordPress post, so I can't simply replace all the rows, as I'd lose that ID.
bpeterson76: Could you post the structure of the RSS feed? It looks like each item's URL (displayed in the product link) contains a unique product ID that could be parsed out and used as a unique ID for each item. If so, the job just got a LOT easier.
OP: http://www.easycontentunits.com/rss/51040/669/rss2.rss is one RSS feed I've imported into my database. I'm extracting the title, the URL of the feed, the price (£309), the delivery cost (~#£18.95#~), the discount (*^£0.00^*) and the retailer_message (On sale at Be Direct.). But if the record for Be Direct. is removed from the RSS feed, I need it deleted from the database. Also, if a record is added, I need a way of adding it. You are correct that each product has a unique ID. In this case it is '51040'; the other ID is the merchant ID, which is the same for every feed.
bpeterson76: Looking at the RSS you just sent, each item does appear to have a unique value called pid. If you parse that string down to just the pid value, you can use it as a unique ID for comparison. Without some sort of comparison value to work from, this delete is going to be all but impossible. It should just take a bit more logic in your DB inserts.
OP: OK, I could get the pid into my table if that would help. Once that's done, would it be easier for you to help me with what I'm trying to do?
bpeterson76: Yes indeed! Thanks.
OP: OK, I've managed to get the pid into the database. What do I need to do next?
bpeterson76: Alright, this shouldn't be too tough. First we need comparable arrays: I think it's fine to fill one one-dimensional array with the IDs from your table and another with the IDs from the feed. Then run array_diff() on the two arrays, table IDs first. The result is an array of the IDs that are in your table but no longer in the feed, which is what you delete from your table.
OP: Could you provide any code showing how I'd do this?
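One possible answer to that last question, as a sketch rather than a tested solution. It assumes each item's `<link>` carries the product id as a `pid` query-string parameter (the thread only says each item has a value called pid, so check the real link format), that the `pid` column the OP added sits alongside `rss_url` in `easy_contents`, and that PDO is used instead of the old mysql_* functions. `pid_from_link`, `feed_pids` and `delete_missing_items` are hypothetical helper names.

```php
<?php
// Sketch under stated assumptions:
//  - each <item>'s <link> carries the product id as a "pid" query parameter
//    (hypothetical; check the real link format in the feed);
//  - easy_contents has the pid column the OP added, plus rss_url;
//  - $db is a PDO connection to the same MySQL database.

// Return the "pid" query parameter of an item link, or null if there is none.
function pid_from_link(string $link): ?string
{
    $query = parse_url($link, PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;                 // no query string (or an unparseable URL)
    }
    parse_str($query, $params);      // "pid=1&x=2" -> ['pid' => '1', 'x' => '2']
    return isset($params['pid']) && is_string($params['pid']) ? $params['pid'] : null;
}

// Collect the pids of all items currently in one feed.
function feed_pids(string $rawFeed): array
{
    $xml  = new SimpleXMLElement($rawFeed);   // throws on invalid XML
    $pids = array();
    foreach ($xml->channel->item as $item) {
        $pid = pid_from_link((string) $item->link);
        if ($pid !== null) {
            $pids[] = $pid;
        }
    }
    return $pids;
}

// Delete the rows of one feed whose pid no longer appears in that feed.
// Returns the pids that were deleted.
function delete_missing_items(PDO $db, string $rssUrl, array $feedPids): array
{
    // An empty list usually means the feed failed to load or parse; bail out
    // rather than deleting every row for this feed.
    if ($feedPids === array()) {
        return array();
    }

    $select = $db->prepare('SELECT pid FROM easy_contents WHERE rss_url = ?');
    $select->execute(array($rssUrl));
    $dbPids = $select->fetchAll(PDO::FETCH_COLUMN);   // flat list of pids

    $gone = array_values(array_diff($dbPids, $feedPids));

    // To keep the rows (and their wp_posts_id), replace this DELETE with an
    // UPDATE on a flag column, e.g. a hypothetical "deleted" column.
    $delete = $db->prepare('DELETE FROM easy_contents WHERE rss_url = ? AND pid = ?');
    foreach ($gone as $pid) {
        $delete->execute(array($rssUrl, $pid));
    }
    return $gone;
}

// Usage, once per feed (DSN and credentials are placeholders):
// $db = new PDO('mysql:host=localhost;dbname=your_db;charset=utf8', 'user', 'pass',
//               array(PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION));
// $removed = delete_missing_items($db, $url, feed_pids(file_get_contents($url)));
```

Running `array_diff($feedPids, $dbPids)` instead lists the pids that are new in the feed, which covers the OP's other requirement of adding newly listed items.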