What I'm trying to do:

Fetch any number of RSS feeds from my blogs and echo only the new entries. My problem is: how do I know which items have already been parsed?

Solution so far:

Fetch each feed every 5 hours and store all titles in a database table or flat file. On the next run, check whether each title is already in the database; if not, print it and save it to the database.

But I'm not sure whether this is best practice.

If someone knows a fast way, that would be great. Sorry for my poor English.

A: 

I think you should store the date of the last post you fetched. The next time you fetch, collect only the posts that are newer than the date you stored.
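
A minimal Python sketch of that date comparison, assuming each item carries an RSS-style pubDate string (RFC 822 format). The function name newer_than, the sample items, and the stored last_seen value are all hypothetical and only serve as illustration:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def newer_than(items, last_seen):
    """Return the items whose pubDate is strictly later than last_seen."""
    fresh = []
    for item in items:
        # pubDate values such as "Fri, 09 Apr 2010 10:00:00 +0000" parse
        # into timezone-aware datetimes.
        published = parsedate_to_datetime(item["pubDate"])
        if published > last_seen:
            fresh.append(item)
    return fresh


# Hypothetical sample data; a real run would take these from the parsed feed.
items = [
    {"title": "Old post", "pubDate": "Wed, 07 Apr 2010 10:00:00 +0000"},
    {"title": "New post", "pubDate": "Fri, 09 Apr 2010 10:00:00 +0000"},
]
# Hypothetical date saved at the end of the previous run.
last_seen = datetime(2010, 4, 8, tzinfo=timezone.utc)

fresh = newer_than(items, last_seen)
print([item["title"] for item in fresh])
```

After each run, the stored date would be updated to the newest pubDate that was seen.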

Seb
The ID is also very treacherous, as different formats have different storage mechanisms; it all depends on what exactly he is reading.
Ivo Sabev
+2  A: 

If the blog entries you are parsing have some date indicator, just add a field called CREATED of type DATETIME to your database and save the date value there. Then, when you parse, select the latest date with SELECT MAX(CREATED) FROM posts LIMIT 1 and don't insert anything whose date is not later than that one.

This solution has a slight drawback if some of your blogs add entries to their RSS with a delay but keep the earlier date as the timestamp: such entries would look old and be skipped.
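
One way this could look in practice, sketched with Python's built-in sqlite3 module. The posts table and the function name store_new are assumptions made for the example, and dates are stored as ISO 8601 text (which sorts correctly as plain strings) rather than a native DATETIME column:

```python
import sqlite3

# In-memory database for illustration; a real script would use a file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (title TEXT, created TEXT)")


def store_new(conn, entries):
    """Insert entries newer than the latest stored date; return their titles."""
    # MAX(created) is NULL (None in Python) on an empty table, so fall
    # back to "" which sorts before every ISO 8601 date string.
    latest = conn.execute("SELECT MAX(created) FROM posts").fetchone()[0] or ""
    fresh = [(title, created) for title, created in entries if created > latest]
    conn.executemany("INSERT INTO posts (title, created) VALUES (?, ?)", fresh)
    conn.commit()
    return [title for title, _ in fresh]


# Hypothetical entries as (title, created) pairs.
first_run = store_new(conn, [("Post A", "2010-04-07 09:00:00")])
second_run = store_new(conn, [
    ("Post A", "2010-04-07 09:00:00"),
    ("Post B", "2010-04-09 09:00:00"),
])
print(first_run, second_run)
```

The aggregate query returns a single row on its own, so the sketch leaves out LIMIT 1. With several blogs, the latest date would probably need to be tracked per feed rather than globally.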

Ivo Sabev
A: 

Feed entries usually have a unique ID associated with them (guid in RSS, id in Atom). You can check that ID and store it in the database instead of storing the title.

Try reading the PubSubHubbub docs: http://superfeedr.com/documentation#pubsubhubbub

+1  A: 

I believe that the usual practice is to work off of the guid element in the RSS feed. This is sometimes the URI of the source article, sometimes a number, sometimes a traditional GUID.

Using this element to see whether you have already received an article removes the need to parse for a date, and this is how Google Reader usually determines whether an item has already been collected.

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <atom:link href="http://www.stevefenton.co.uk/RSS/Blog/" rel="self" type="application/rss+xml" />
    <title>Steve Fenton Blog</title>
    <link>http://www.stevefenton.co.uk/RSS/Blog/</link>
    <description>Blog</description>
    <language>en</language>
    <copyright>Copyright 2008 - 2010 Steve Fenton</copyright>
    <category>Blog</category>
    <generator>Swift Point Content Management System</generator>
    <ttl>60</ttl>
    <managingEditor>[email protected] (Site Admin)</managingEditor>
    <item>
        <title><![CDATA[Jquery Plugin Infinite Scroller With AJAX]]></title>
        <link>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Jquery-Plugin-Infinite-Scroller-With-AJAX/</link>
        <description><![CDATA[Friday, 9th April 2010 - Jquery Plugin Infinite Scroller With AJAX <p>I have just finished a new plugin for the jQuery framework.</p><p>The jQuery Infinite Scroller is a great way to deliver a really long list of things, in smaller chunks. For example, if you were displaying articles you could load a page with the first 10 results, then dynamically add more results to the bottom of the list when people start scrolling down. The further they scroll, the more articles you add - thus making it theoretically infinite.</p><p>When the plugin detects that no more results are available, it stops trying to get more items to add.]]> &lt;a href="http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Jquery-Plugin-Infinite-Scroller-With-AJAX"&gt;View Details&lt;/a&gt;.</description>
        <guid>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Jquery-Plugin-Infinite-Scroller-With-AJAX</guid>
    </item>
    <item>
        <title><![CDATA[Auto Load Your PHP Classes]]></title>
        <link>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Auto-Load-Your-PHP-Classes/</link>
        <description><![CDATA[Wednesday, 7th April 2010 - Auto Load Your PHP Classes <p>In PHP5 you can create classes to organise your code and represent objects that you want to pass around. This has long been a feature of other languages and was a fundamentally important step forward for PHP.</p><p>There was one thing, though, that I didn't like about PHP classes. If I wanted to instantiate a new "Customer" or "Product", I had to make sure that I included the PHP file that contained the "Customer" or "Product" class. This meant doing this:</p><p>[[#CODE:php:<br>include_once 'classes/Customer.php';</p>]]> &lt;a href="http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Auto-Load-Your-PHP-Classes"&gt;View Details&lt;/a&gt;.</description>
        <guid>http://www.stevefenton.co.uk/Content/Blog/Date/201004/Blog/Auto-Load-Your-PHP-Classes</guid>
    </item>

</channel>
</rss>
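
For illustration, here is a rough Python sketch of the guid check using only the standard library. The trimmed-down feed, the function name new_items, and the in-memory seen set are assumptions for the example; a real script would download the feed and persist the seen identifiers in a database table or file. Because guid is optional in RSS 2.0, the sketch falls back to the link:

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened feed; the second item deliberately has no guid.
FEED = """<rss version="2.0">
<channel>
    <title>Example</title>
    <item>
        <title>First</title>
        <link>http://example.com/first/</link>
        <guid>http://example.com/first</guid>
    </item>
    <item>
        <title>Second</title>
        <link>http://example.com/second/</link>
    </item>
</channel>
</rss>"""


def new_items(feed_xml, seen):
    """Return titles of items not yet in seen, and record their identifiers."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        # Prefer the guid; fall back to the link when the guid is missing.
        key = item.findtext("guid") or item.findtext("link")
        if key and key not in seen:
            seen.add(key)
            fresh.append(item.findtext("title"))
    return fresh


seen = set()
first = new_items(FEED, seen)   # both items are new on the first run
second = new_items(FEED, seen)  # nothing is new on the second run
print(first, second)
```

With a database, the same idea could be expressed as a unique column on the identifier, so that a repeated insert is rejected instead of being checked by hand.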
Sohnee