Sorry for the long and perhaps confusing title. I'm asking for advice or guidance on how to get an RSS feed from a page that does not have RSS enabled by default. That in itself is not the problem, though. The problem is that the page asks me to enter a username and password. So this is how things stand...

PROBLEM:

Get an RSS feed for a forum that does not have RSS enabled, and where you must be logged in to see the 'news'.

POSSIBLE SOLUTIONS that come to mind:

  1. There are several websites (in English) that offer to generate RSS for pages that don't have it. That's fine, but these sites don't offer an option to log in with a username and password to the page I want to get the information from, so they are ruled out.
  2. I tried to log in via the URL: I gave the websites from item 1 a forum URL with the username and password variables passed directly in it, like www.forosinrss/login.php?usuario = me & password = your pff, and the forum bounced me, telling me the data was not correct. Another problem is that the password is MD5-hashed, which prevents me from logging in through the URL (fk T_T).
  3. Try using "SELECT * FROM DB Internet", or in other words, YQL. But the result was much the same: I found no way to submit the username and password to log in, nor to generate a cookie so that the forum would not bounce me.

I need suggestions, recommendations, tips or complaints.

A: 

Download the page using something like cURL (or fsockopen if you're feeling brave), then transform the page from HTML to RSS using XSLT stylesheets.
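
As a rough illustration of the download-then-transform idea, here is a minimal Python sketch using only the standard library. It swaps XSLT for a plain HTML parser, because the standard library has no XSLT processor. Everything specific in it is an assumption: the form field names `usuario` and `password` are taken from the URL in the question, the idea that the forum wants the MD5 hex digest of the password is a guess, and `forum.example` is a placeholder host. The helper names (`make_session`, `login_request`, `html_to_rss`) are made up for this example.

```python
import hashlib
import urllib.parse
import urllib.request
from html.parser import HTMLParser
from http.cookiejar import CookieJar
from xml.etree import ElementTree as ET


def make_session():
    """Return an opener that stores cookies, so a login survives later requests."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def login_request(login_url, username, password):
    """Build the POST a login form might send. Field names are assumptions."""
    form = {
        "usuario": username,
        # Assumption: the forum expects the MD5 hex digest, not the plain password.
        "password": hashlib.md5(password.encode("utf-8")).hexdigest(),
    }
    data = urllib.parse.urlencode(form).encode("ascii")
    return urllib.request.Request(login_url, data=data)  # passing data makes it a POST


class LinkCollector(HTMLParser):
    """Collect (href, text) for every <a> tag; a real forum needs a narrower filter."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def html_to_rss(html, feed_title, feed_link):
    """Turn the links found in `html` into a minimal RSS 2.0 document (a string)."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Generated from " + feed_link
    for href, text in parser.links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text or href
        # Resolve relative links against the page they came from.
        ET.SubElement(item, "link").text = urllib.parse.urljoin(feed_link, href)
    return ET.tostring(rss, encoding="unicode")


# Hypothetical usage (placeholder URLs, not run here):
# session = make_session()
# session.open(login_request("http://forum.example/login.php", "me", "secret"))
# page = session.open("http://forum.example/news.php").read().decode("utf-8", "replace")
# print(html_to_rss(page, "Forum news", "http://forum.example/news.php"))
```

The cookie-aware opener is the part that addresses the login problem: the session cookie set by the login response is sent again with the second request. Whether a given forum accepts such a POST depends on its actual form fields and any hidden tokens, which you would need to read from its login page.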

Andrew Dunn
A: 

Once upon a time I wrote an app in PHP to do this with ok-ish results:

  • use curl to get the page and keep a copy
  • run a custom regular-expression filter to select the part of the page that actually matters (some sites have dynamic text such as ads or the current date and time)
  • after a timeout, use curl to get the page again and run the same filter on it
  • run diff old_page, new_page and pipe the result into an rss template

The system worked OK, but filtering the page down to the content I wanted in the feed was fiddly, and it broke a lot because these kinds of sites are often hand-edited, so you can't guarantee any consistency.
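
The steps above could look roughly like this in Python (not the original PHP app, just a sketch of the same filter, diff and template idea). The function names, the template, and the crude tag-stripping regex are all assumptions made for illustration. Fetching the two snapshots is left out, so the functions take the page HTML as strings.

```python
import difflib
import re
from email.utils import formatdate
from xml.sax.saxutils import escape


def filter_page(html, pattern):
    """Keep only the region captured by group 1 of `pattern`, as clean text lines."""
    match = re.search(pattern, html, re.DOTALL)
    region = match.group(1) if match else ""
    # Crude tag stripping; good enough for a sketch, not for arbitrary HTML.
    lines = (re.sub(r"<[^>]+>", "", line).strip() for line in region.splitlines())
    return [line for line in lines if line]


def added_lines(old_lines, new_lines):
    """Return the lines that the diff marks as added in the new snapshot."""
    diff = difflib.ndiff(old_lines, new_lines)
    return [line[2:] for line in diff if line.startswith("+ ")]


RSS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"><channel>
<title>{title}</title><link>{link}</link>
<description>Changes detected on {link}</description>
{items}
</channel></rss>"""


def changes_to_rss(title, link, changes, pub_date=None):
    """Fill the RSS template with one <item> per changed line."""
    pub_date = pub_date or formatdate(usegmt=True)
    items = "\n".join(
        "<item><title>{0}</title><link>{1}</link><pubDate>{2}</pubDate></item>".format(
            escape(text), escape(link), pub_date
        )
        for text in changes
    )
    return RSS_TEMPLATE.format(title=escape(title), link=escape(link), items=items)
```

The regular expression is where the fiddliness lives: it has to be narrow enough to exclude ads and timestamps, and it needs updating whenever the page layout changes.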

MattSmith