views:

41

answers:

1

The question is, given a site URL (say http://stackoverflow.com/ ), to return the list of all the feeds available on the site. Acceptable methods:

a) use a 3rd-party service (Google?, Yahoo?, ...) programmatically
b) use a crawler/spider (with some tips on how to configure the spider to return only the RSS/XML feeds)
c) programmatically, using C/C++/PHP (any language/library)

The task here is not to get the feeds linked from the page returned by the URL, but ALL the feeds that are available on the server at any depth... In any case, please provide a simple usage example.

A: 

The only way I know of doing this is to rely on the RSS auto-discovery convention, which has been around for about 4 years. Crawl the site, and look in the HTML pages for the RSS auto-discovery tags:

<link rel="alternate" type="application/rss+xml" 
      title="Something" 
      href="http://www.example.com/feed1.xml" />
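Finding those tags on a single page is straightforward with any HTML parser. Here is a minimal sketch using Python's standard-library `html.parser` (the class name `FeedLinkParser` and the sample HTML are mine, not from the answer) that collects the `href` of every `<link rel="alternate">` tag whose `type` is a feed MIME type:

```python
from html.parser import HTMLParser

class FeedLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags with a feed MIME type."""
    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = dict(attrs)
        if (a.get("rel", "").lower() == "alternate"
                and a.get("type", "").lower() in self.FEED_TYPES
                and "href" in a):
            self.feeds.append(a["href"])

html = '''<html><head>
<link rel="alternate" type="application/rss+xml"
      title="Something"
      href="http://www.example.com/feed1.xml" />
</head><body></body></html>'''

parser = FeedLinkParser()
parser.feed(html)
print(parser.feeds)  # ['http://www.example.com/feed1.xml']
```

The same class works on any fetched page body, so it can serve as the per-page step of a crawler.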
Cheeso
This is correct, but the task here is the crawling itself.
ktolis
It's the same idea; just also look for anchor tags. Request the home page, store any RSS tags, then follow the anchor tags, request those pages, and repeat.
Cheeso
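The crawl-and-repeat loop Cheeso describes can be sketched as a breadth-first traversal. This is a minimal illustration, not a production crawler: the `fetch` callable, the `crawl_feeds` name, and the in-memory `site` dict used in place of real HTTP requests are all my own assumptions (a real version would fetch with `urllib.request` or similar, and should respect robots.txt and rate limits):

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

class PageParser(HTMLParser):
    """Extracts feed <link> hrefs and anchor hrefs from one HTML page."""
    FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

    def __init__(self):
        super().__init__()
        self.feeds, self.links = [], []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "link" and a.get("type") in self.FEED_TYPES and "href" in a:
            self.feeds.append(a["href"])
        elif tag == "a" and "href" in a:
            self.links.append(a["href"])

def crawl_feeds(start_url, fetch, max_pages=100):
    """Breadth-first crawl from start_url; fetch(url) returns HTML or None.
    Returns the set of discovered feed URLs, resolved against each page URL."""
    seen, feeds = {start_url}, set()
    queue = deque([start_url])
    while queue and len(seen) <= max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        p = PageParser()
        p.feed(html)
        feeds.update(urljoin(url, h) for h in p.feeds)
        for link in p.links:
            absolute = urljoin(url, link)
            # Stay on the same site and never revisit a page
            if absolute.startswith(start_url) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return feeds

# Demo against an in-memory "site" instead of real HTTP:
site = {
    "http://example.com/": '<a href="/blog">blog</a>',
    "http://example.com/blog":
        '<link rel="alternate" type="application/rss+xml" href="/blog/rss.xml">',
}
print(sorted(crawl_feeds("http://example.com/", site.get)))
# ['http://example.com/blog/rss.xml']
```

Passing `fetch` in as a parameter keeps the traversal logic testable; swapping `site.get` for a real HTTP fetcher turns the sketch into a working site crawler.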