tags:

views:

113

answers:

2

What is the best way to extract RSS/Atom feed URLs from HTML LINK tags? I know regex is not the best way to do this, so I'm wondering what alternatives I have. Surely loading the HTML into a string and doing some kind of horrible string munging with .Contains is not optimal either. Does anyone have a decent strategy for this?

A: 

Maybe Html Agility Pack can help you. I have not used it myself, but I have heard good things about it.

Igal Serban
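
The idea behind this suggestion (use a forgiving HTML parser instead of regex or string matching) can be sketched in a few lines. This is not Html Agility Pack, which is a .NET library; it is a rough Python equivalent built on the standard library's html.parser. The names FeedLinkParser and find_feed_urls, the base_url parameter, and the choice to require rel="alternate" are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used for feed autodiscovery links.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link rel="alternate"> tags with a feed type.

    Hypothetical helper for illustration; html.parser tolerates
    unclosed tags and mixed-case markup.
    """

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rel = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        href = attr.get("href", "").strip()
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL, if one was given.
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_urls(html, base_url=""):
    """Return feed URLs in document order (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds
```

Calling find_feed_urls(page_source, page_url) returns a list of absolute feed URLs, or an empty list when the page advertises no feeds.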
A: 

Use XPath.

1. Convert the HTML to XHTML with Tidy.
2. Run an XPath query against the XHTML to find the link:
    /html/head/link[@type='application/rss+xml']
yogman
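
The second step of the XPath approach might look like the following. This is a minimal sketch that assumes the Tidy conversion has already happened and the result is well-formed XHTML with no named entities such as &nbsp; (the standard library XML parser rejects those without a DTD). It uses xml.etree.ElementTree, which supports only a limited XPath subset, so the query is written relative to the root element and run once per feed type. The function name feed_urls_from_xhtml is hypothetical, and checking for Atom as well as RSS is an addition to the expression in the answer.

```python
import xml.etree.ElementTree as ET

# Tidy's XHTML output normally declares this namespace on <html>.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


def feed_urls_from_xhtml(xhtml):
    """Return feed hrefs found under head/link (hypothetical helper).

    Results are grouped by feed type (RSS first, then Atom),
    not strictly in document order.
    """
    root = ET.fromstring(xhtml)
    # Qualify element names only when the document uses a namespace.
    prefix = "x:" if root.tag.startswith("{") else ""
    urls = []
    for feed_type in FEED_TYPES:
        # Equivalent of /html/head/link[@type='...'], relative to <html>.
        path = f"./{prefix}head/{prefix}link[@type='{feed_type}']"
        for link in root.findall(path, XHTML_NS):
            href = link.get("href")
            if href:
                urls.append(href)
    return urls
```

The namespace handling matters in practice: a plain /html/head/link query matches nothing in a document whose elements live in the XHTML namespace, which is a common surprise after running Tidy.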