I am trying to read data from an RSS feed that has 25 items. When I request the RSS file over HTTP, my script reports only 20 items.

function test($location)
{
    // Parse the feed (local path or URL) and count its <item> elements
    $doc = new DOMDocument();
    $doc->load($location);
    $items = $doc->getElementsByTagName('item');
    return $items->length;
}

// Prints 20
echo test('http://www.reddit.com/r/programming/new/.rss?after=t3_');

// Prints 25
echo test('programming.xml');

I've tried RSS feeds from other subreddits as well with the same result.

A: 

If DOMDocument had trouble loading the feed, it would most likely emit a warning.

Right now, the reddit feed URL in your sample code returns 14 items, and the number of items in that feed is not constant. So the issue is that your local copy differs from the one you are loading from reddit.
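A minimal sketch of how to check for such warnings rather than guessing: the helper below, countItemsChecked (a hypothetical name, not from the question), switches libxml to internal error collection with libxml_use_internal_errors(), loads the document, and returns the item count together with any libxml messages. It assumes PHP 7.4 or later for the arrow function.

```php
<?php
// Hypothetical helper: load a feed and return [itemCount, libxmlMessages].
// itemCount is null when DOMDocument::load() fails.
function countItemsChecked(string $location): array
{
    // Collect libxml errors instead of printing them as PHP warnings
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $doc = new DOMDocument();
    $ok = $doc->load($location);

    // Keep the messages so the caller can inspect them
    $messages = array_map(
        fn(LibXMLError $e): string => trim($e->message),
        libxml_get_errors()
    );

    // Restore the previous error-handling mode
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $count = $ok ? $doc->getElementsByTagName('item')->length : null;
    return [$count, $messages];
}
```

If the request or the parse fails, the count is null and the messages explain why; an empty message list together with a smaller count points to the feed content itself rather than a loading problem.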

Juan
Exactly my point. If you go to the feed URL, save the file locally, and then load that copy in the script, it works. But requesting the same file over HTTP gives a different result.
Kevin
I just tested accessing the file three different ways: passing the URL directly to test, downloading it in the browser, and downloading it with wget. All three times I got the same count, 16. That feed seems to change its contents quite often, so you can't rely on a local copy having the same content as the online version.
Juan
When I view the RSS feed in my browser, it always shows 25 items. When I access it any other way, the number varies. I wonder if this has to do with their API.
Kevin
A: 

I see what the issue is now. If you visit a subreddit like /r/programming/ and go to the "new" tab to see the newest submissions, there are two sorting options. The first is "rising", which only shows up-and-coming entries; the alternative is "new".

Because I had chosen the "new" sort order in my browser, the site saved that choice in a cookie and used it as my default sort order from then on. Requests made from code did not send that cookie, so they still got the site's default sort order, which returned a variable number of results.

I resolved the issue by adding the sort order to the request URL's query string: http://www.reddit.com/r/programming/new/.rss?sort=new
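A small sketch of that fix in code, assuming you want to keep any existing query parameters such as after: the hypothetical withSortNew() helper appends sort=new using ? or & depending on whether the URL already has a query string. It does not replace a sort parameter that is already present and does not handle URL fragments.

```php
<?php
// Hypothetical helper: add sort=new to a feed URL so the result does not
// depend on a sort-order cookie that only the browser sends.
function withSortNew(string $url): string
{
    // parse_url() returns null for PHP_URL_QUERY when there is no query string
    $separator = parse_url($url, PHP_URL_QUERY) === null ? '?' : '&';
    return $url . $separator . 'sort=new';
}
```

For example, test(withSortNew('http://www.reddit.com/r/programming/new/.rss')) requests the feed with an explicit sort order instead of relying on a cookie.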

Kevin