Hello all,

I have a task to do.

I need to download a web page and check whether it advertises any RSS feeds.

I know how to download a web page into a string using the HTTP APIs in C#, but how can I determine whether that HTML string references any RSS feeds?

Thanks

Jack

+1  A: 

I expect you would have to load the page into a DOM (XmlDocument, XDocument or HtmlDocument) and check for any nodes like:

<link rel="alternate" type="application/atom+xml" ...

In XPath this should be something like "/html/head/link[@rel='alternate' and @type='application/atom+xml']"; then look at @title and @href. RSS feeds are advertised the same way with type="application/rss+xml".
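As a rough sketch of that XPath check, assuming the page happens to be well-formed XHTML so that XDocument can parse it: the class and method names below (FeedLinkXPath, Find) are made up for illustration, and the query uses local-name() rather than the literal path above so that it still matches when the page declares the XHTML default namespace.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class FeedLinkXPath
{
    // Hypothetical helper: returns (title, href) pairs for feed <link> elements.
    // Assumes the markup parses as XML; XDocument.Parse throws XmlException otherwise.
    public static List<KeyValuePair<string, string>> Find(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // local-name() sidesteps the XHTML default namespace, if the page declares one.
        const string query =
            "//*[local-name()='link' and @rel='alternate' and " +
            "(@type='application/rss+xml' or @type='application/atom+xml')]";

        return doc.XPathSelectElements(query)
            .Select(e => new KeyValuePair<string, string>(
                (string)e.Attribute("title") ?? "",   // empty string when the attribute is missing
                (string)e.Attribute("href") ?? ""))
            .ToList();
    }
}
```

Note that the attribute comparison here is exact and case-sensitive, so a page that writes rel="Alternate" would be missed.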

Marc Gravell
+1  A: 

Instead of loading the HTML into an XmlDocument (which may not be possible if the page isn't well-formed XHTML), try the HTML Agility Pack. It gives you an XmlDocument-like API but also copes with malformed HTML.

Generally, though, you would look for that link tag in the page's head.
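Something along these lines might work with the HTML Agility Pack (a third-party NuGet package, so it has to be installed separately). FeedLinkHap and FindHrefs are names invented for this sketch; the rel and type values are compared in code rather than in the XPath so that letter case in the attribute values does not matter.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party package, not part of the .NET base library

public static class FeedLinkHap
{
    // Hypothetical helper: returns the href of every RSS/Atom <link> in the page.
    public static List<string> FindHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser, does not require well-formed markup

        var hrefs = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link");
        if (links == null) return hrefs;

        foreach (HtmlNode link in links)
        {
            string rel = link.GetAttributeValue("rel", "");
            string type = link.GetAttributeValue("type", "");

            bool isFeed =
                rel.Equals("alternate", StringComparison.OrdinalIgnoreCase) &&
                (type.Equals("application/rss+xml", StringComparison.OrdinalIgnoreCase) ||
                 type.Equals("application/atom+xml", StringComparison.OrdinalIgnoreCase));

            if (isFeed) hrefs.Add(link.GetAttributeValue("href", ""));
        }
        return hrefs;
    }
}
```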

spmason
+1  A: 

Use a regular expression to check the HTML for the link tag.

A more exhaustive approach would be to spider each href on the page and examine the response's Content-Type and whether the content contains rss or atom tags.
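For the regular-expression check, a minimal sketch using only System.Text.RegularExpressions could look like this. FeedLinkRegex and FindHrefs are hypothetical names, and the patterns assume quoted attribute values; unquoted attributes or link tags inside HTML comments are not handled, so treat it as a quick heuristic rather than a parser.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FeedLinkRegex
{
    // Matches a whole <link ...> tag; attributes are inspected separately
    // so that their order inside the tag does not matter.
    private static readonly Regex LinkTag =
        new Regex(@"<link\b[^>]*>", RegexOptions.IgnoreCase);

    // Matches type="application/rss+xml" or type="application/atom+xml".
    private static readonly Regex FeedType =
        new Regex(@"\btype\s*=\s*[""']application/(rss|atom)\+xml[""']", RegexOptions.IgnoreCase);

    // Captures the quoted href value in group 1.
    private static readonly Regex Href =
        new Regex(@"\bhref\s*=\s*[""']([^""']*)[""']", RegexOptions.IgnoreCase);

    // Hypothetical helper: returns the href of every RSS/Atom <link> tag found.
    public static List<string> FindHrefs(string html)
    {
        var hrefs = new List<string>();
        foreach (Match tag in LinkTag.Matches(html))
        {
            if (!FeedType.IsMatch(tag.Value)) continue;

            Match href = Href.Match(tag.Value);
            if (href.Success) hrefs.Add(href.Groups[1].Value);
        }
        return hrefs;
    }
}
```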

Codebrain
The `<center>` cannot hold it is too late. http://stackoverflow.com/questions/1732348#1732454
Marc Gravell
Considering he is searching for a known tag, it's not unreasonable to use a regex in this case, IMO.
Codebrain