What is the best way to extract RSS/Atom URLs from HTML LINK tags? I know regex is not the best way to do this, so I'm wondering what alternatives I have. Surely some kind of horrible string munging using .Contains after loading the HTML into a string is not optimal either. Anyone got a decent strategy for this?
A:
Maybe Html Agility Pack can help you. I haven't used it myself, but I've heard good things about it.
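Html Agility Pack is a .NET library, so the following is not its API. It is a minimal sketch of the same idea, a forgiving HTML parser instead of regex or .Contains, using Python's standard-library html.parser. The rel="alternate" check, the two feed MIME types, and the names FeedLinkParser and find_feed_urls are assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types conventionally used to advertise RSS and Atom feeds (assumption).
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values of <link> tags that advertise an RSS/Atom feed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not attribute values.
        # Self-closing <link ... /> also ends up here via handle_startendtag.
        if tag != "link":
            return
        attr = dict(attrs)
        # Valueless attributes arrive as None, hence the `or ""`.
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").strip().lower()
        href = attr.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            self.hrefs.append(href)


def find_feed_urls(html, base_url=""):
    """Return feed URLs found in `html`, resolved against `base_url` if given."""
    parser = FeedLinkParser()
    parser.feed(html)
    parser.close()
    return [urljoin(base_url, href) for href in parser.hrefs]
```

Because html.parser does not require well-formed markup, unclosed tags and mixed-case tag names in real-world pages do not stop it; relative hrefs are resolved only if you pass the page's URL as base_url.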
Igal Serban
2008-12-03 23:31:07
A:
Use XPath.
1. Convert the HTML into XHTML with Tidy.
2. Run an XPath query against the XHTML to find the feed link (use application/atom+xml for Atom feeds):
/html/head/link[@type='application/rss+xml']
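A minimal sketch of the XPath route in Python, assuming Tidy (or a similar tool) has already produced well-formed XHTML. ElementTree supports only a subset of XPath, so the expression is written relative to the root html element. It also has to handle the XHTML namespace that Tidy's XHTML output typically declares: in a namespace-aware XPath engine, the bare /html/head/link path matches only un-namespaced elements. The function name and the added Atom type are assumptions.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
# Feed MIME types to look for; Atom is added here as an assumption.
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


def find_feed_urls_xhtml(xhtml):
    """Return hrefs of feed <link> elements under <html>/<head>.

    Expects well-formed XHTML, e.g. Tidy output. Works whether or not the
    document declares the XHTML default namespace.
    """
    root = ET.fromstring(xhtml)
    # If the root is namespaced, element names in the path need a prefix
    # that the `namespaces` mapping resolves to the XHTML namespace.
    prefix = "x:" if root.tag == "{%s}html" % XHTML_NS else ""
    namespaces = {"x": XHTML_NS}
    urls = []
    for mime in FEED_TYPES:
        # Equivalent of /html/head/link[@type='...'] relative to <html>.
        path = "./%shead/%slink[@type='%s']" % (prefix, prefix, mime)
        urls.extend(link.get("href") for link in root.findall(path, namespaces))
    return urls
```

Results are grouped by MIME type rather than kept in document order. A plain XML parser also rejects named HTML entities such as &nbsp; that are not defined in XML, so the Tidy step matters for real pages.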
yogman
2008-12-03 23:33:02