I'm having trouble extracting the RSS link that tells the browser where a site's feed is. The link is found in the <head> tag of the HTML; here is an example of what it looks like:
<link rel="alternate" type="application/rss+xml" title="CNN - Top Stories [RSS]" href="http://rss.cnn.com/rss/cnn_topstories.rss" />
My original approach was to treat the page as an XML file and walk through the tags, but most sites have an arbitrary number of <meta> tags that lack a closing />, so the <link> tag I'm looking for ends up as a child of a random <meta> tag.
Now I'm thinking of just treating the page as a string and searching for the <link> tag in it, but this causes problems because the <link> tag's attributes can appear in any order. Of course I can work around this, but I would prefer something a bit neater than looking for type="application/rss+xml" and then scanning to the left and right of it for the first href it finds.
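
To make the goal concrete, here is a minimal sketch of the kind of "neater" approach I mean, assuming Python and its standard-library html.parser module (the question is not tied to a language, so this is only an illustration). A lenient HTML parser does not require <meta> tags to be closed, and it hands over each tag's attributes as name/value pairs, so their order does not matter. The names FeedLinkFinder and find_feed_links are hypothetical, made up for this example.

```python
from html.parser import HTMLParser


class FeedLinkFinder(HTMLParser):
    """Collects the href of every <link rel="alternate" type="application/rss+xml">."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "link":
            return
        # attrs is a list of (name, value) pairs; a dict makes order irrelevant.
        a = {name: (value or "") for name, value in attrs}
        rel = a.get("rel", "").lower().split()
        is_rss = a.get("type", "").lower() == "application/rss+xml"
        if "alternate" in rel and is_rss and a.get("href"):
            self.feeds.append(a["href"])


def find_feed_links(html):
    """Return the RSS feed URLs advertised in the given HTML text."""
    finder = FeedLinkFinder()
    finder.feed(html)
    finder.close()
    return finder.feeds


# Unclosed <meta> tags and shuffled attribute order, as described above.
sample = (
    '<html><head><meta charset="utf-8"><meta name="x" content="y">'
    '<link href="http://rss.cnn.com/rss/cnn_topstories.rss" '
    'title="CNN - Top Stories [RSS]" type="application/rss+xml" rel="alternate" />'
    '<link rel="stylesheet" href="style.css"></head><body></body></html>'
)
print(find_feed_links(sample))  # ['http://rss.cnn.com/rss/cnn_topstories.rss']
```

This only looks at the RSS MIME type from the example; it does not resolve relative href values or handle other feed types, which a real solution would probably need.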