How do I get the exact feed.xml/rss.xml/atom.xml path of a website?

For example, I supply "http://www.example.com/news/today/this_is_a_news", but the RSS feed actually lives at "http://www.example.com/rss/feed.xml". Most modern browsers already have this feature, and I'm curious how they find the feed.

Can you give example code in Ruby, Python, or Bash?

+1  A: 

Something like this in Ruby will work...

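require 'nokogiri'
require 'open-uri'

# Fetch and parse the page. URI.open is required on Ruby 3.0+, where a bare open() no longer accepts URLs.
html = Nokogiri::HTML(URI.open('http://stackoverflow.com/questions/2441954/how-to-find-out-the-exact-rss-xml-path-of-a-website'))
# Print the href of the first Atom autodiscovery <link> element in the page.
puts html.css('link[type="application/atom+xml"]').first.attr('href')
#  => "/feeds/question/2441954"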

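Notice that the result is an absolute path rather than a full URL. That is legal, so you'd need to resolve it against the page's URL (that is, prepend the scheme and host) to get the feed's complete address.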
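One way to do that resolution, as a minimal sketch, is Ruby's standard `URI.join`, which handles both path-only and full `href` values. The variable names here are just illustrative:

```ruby
require 'uri'

# The page that was fetched, and the href found in its <link> element.
page_url = 'http://stackoverflow.com/questions/2441954/how-to-find-out-the-exact-rss-xml-path-of-a-website'
href     = '/feeds/question/2441954'

# URI.join resolves href relative to page_url; an href that is already
# a full URL is returned unchanged.
feed_url = URI.join(page_url, href).to_s
puts feed_url
#  => "http://stackoverflow.com/feeds/question/2441954"
```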

Also, "application/atom+xml" could instead be "application/rss+xml" or "application/rdf+xml", and a page can contain several such links, so you'll need to decide how to handle multiples. According to the autodiscovery docs, the first one listed should be the preferred one, but in my experience that isn't always the case. The docs also say that multiple links should point to different content rather than to alternate formats of the same content (RSS and Atom), but again, I've seen that happen.

Greg
For more information about the autodiscovery link, see http://www.rssboard.org/rss-autodiscovery and http://philringnalda.com/rfc/draft-ietf-atompub-autodiscovery-01.html.
Greg