I've looked for tutorials on using HTML Agility Pack, as it seems to do everything I need, but for such a powerful tool there is surprisingly little written about it online.
I am writing a simple method that returns every tag with a given name:
public string[] GetTagsByName(string TagName, string Source) {
...
}
This could easily be done with a regular expression, but we all know that regex is the wrong tool for parsing HTML. So far I have the following code:
...
// TODO: Clear Comments (can this be done or should I use RegEx?)
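HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(Source);
List<string> tags = new List<string>();
string xpath = "//" + TagName;
// SelectNodes returns null (not an empty collection) when nothing matches
HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes(xpath);
if (nodes != null) {
    foreach (HtmlNode node in nodes) {
        // Matched elements are HtmlNode, not HtmlTextNode; OuterHtml is the tag's full markup
        tags.Add(node.OuterHtml);
    }
}
return tags.ToArray();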
I would like to first strip all comments from the HTML, then return the matching tags by name. If possible, I'd also like to return certain meta tags based on an attribute, such as the robots meta tag. I'm not that great with XPath, so any help with that would be good.
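To make the goal concrete, here is a rough, untested sketch of what I imagine the comment stripping and the meta lookup might look like. The class and method names (`HtmlHelpers`, `StripComments`, `GetMetaContent`) are made up for illustration, and I'm assuming that HTML Agility Pack's XPath support accepts the `comment()` node test and a simple attribute predicate such as `[@name='robots']`.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class; the names are placeholders, not part of the library.
public static class HtmlHelpers
{
    // Removes every comment node from an already-loaded document.
    public static void StripComments(HtmlDocument doc)
    {
        // Assumption: "//comment()" selects all comment nodes.
        HtmlNodeCollection comments = doc.DocumentNode.SelectNodes("//comment()");
        if (comments == null) return; // SelectNodes returns null when nothing matches

        foreach (HtmlNode comment in comments)
        {
            comment.ParentNode.RemoveChild(comment);
        }
    }

    // Returns the content attribute of every <meta> tag whose name attribute equals metaName.
    public static string[] GetMetaContent(HtmlDocument doc, string metaName)
    {
        var values = new List<string>();

        // metaName is concatenated into the XPath unescaped, so this is only
        // suitable for fixed, trusted names such as "robots".
        HtmlNodeCollection metas =
            doc.DocumentNode.SelectNodes("//meta[@name='" + metaName + "']");

        if (metas != null)
        {
            foreach (HtmlNode meta in metas)
            {
                // Falls back to an empty string if the content attribute is missing.
                values.Add(meta.GetAttributeValue("content", ""));
            }
        }
        return values.ToArray();
    }
}
```

The idea would be to call `StripComments(doc)` right after `doc.LoadHtml(Source)` and before selecting tags, and `GetMetaContent(doc, "robots")` for the meta lookup. I don't know whether this is the recommended way to do it.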
Any help would be much appreciated.