ansaurus

Question

Capturing the rel type and href of links in C#

Answer 1

+1 A:

Parse your HTML using the HTML Agility Pack library, which can be found here.

Rony 2009-06-18 18:59:31

Thanks for the link.

James W 2009-06-18 19:34:01

Answer 2

A:

You'd be better off using a real HTML parser like the Html Agility Pack. You can get it here.

The main reason not to use regular expressions for HTML parsing is that the markup might not be well-formed (which is almost always the case), and a pattern written for clean markup can fail to match it.
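To illustrate the point, here is a minimal sketch of how a regex can break on markup that carries the same information in a different shape. The class name `RegexPitfall` and the pattern itself are hypothetical, chosen only for this example; the pattern assumes `rel` comes before `href` and that both values are double-quoted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexPitfall
{
    // Hypothetical pattern: assumes rel precedes href and both use double quotes.
    private static readonly Regex LinkPattern =
        new Regex("<link\\s+rel=\"([^\"]*)\"\\s+href=\"([^\"]*)\"", RegexOptions.IgnoreCase);

    public static bool Matches(string markup)
    {
        return LinkPattern.IsMatch(markup);
    }

    public static void Main()
    {
        // Markup in the expected shape: the pattern matches.
        Console.WriteLine(Matches("<link rel=\"stylesheet\" href=\"a.css\">"));

        // Same information, but reordered, single-quoted, and unquoted:
        // the pattern no longer matches.
        Console.WriteLine(Matches("<link href='a.css' rel=stylesheet>"));
    }
}
```

Browsers accept both forms, so a page can be perfectly usable and still slip past a pattern like this one.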

You would then use XPath to get the nodes you need and load them into variables.

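// Requires: using HtmlAgilityPack;
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(pageMarkup);
// SelectNodes returns null when nothing matches the XPath.
HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
string rel = null;

if (nodes != null && nodes[0].Attributes["rel"] != null)
{
    // Attributes["rel"] is an HtmlAttribute; its text is in .Value.
    rel = nodes[0].Attributes["rel"].Value;
}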
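Since the question asks for both the rel and the href, the same approach can be extended to every `link` element. The following is a sketch only: the class `LinkExtractor` and the method `GetLinkRelAndHref` are hypothetical names, and it assumes the Html Agility Pack assembly is referenced.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class LinkExtractor
{
    // Hypothetical helper: returns one (rel, href) pair per <link> element.
    public static List<KeyValuePair<string, string>> GetLinkRelAndHref(string pageMarkup)
    {
        var results = new List<KeyValuePair<string, string>>();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(pageMarkup);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = htmlDoc.DocumentNode.SelectNodes("//link");
        if (nodes == null)
        {
            return results;
        }

        foreach (HtmlNode node in nodes)
        {
            // GetAttributeValue falls back to the second argument
            // when the attribute is absent.
            string rel = node.GetAttributeValue("rel", string.Empty);
            string href = node.GetAttributeValue("href", string.Empty);
            results.Add(new KeyValuePair<string, string>(rel, href));
        }

        return results;
    }
}
```

Each pair's Key holds the rel and its Value holds the href. Changing the XPath to "//a" should collect anchor tags in the same way.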

Dan Herbert 2009-06-18 19:12:46

Thanks. I am giving you the check mark because your answer had helpful code, and you explained why to use the parser instead of a regex. Thanks to Rony too for the link to the HTML Agility Pack; I just downloaded it.

James W 2009-06-18 19:29:51
