
I want to scrape a list of facts from a simple website. Each fact is enclosed in an <li> tag. How would I do this using Html Agility Pack? Is there a better approach?

The <li> tags contain the facts and nothing else.

+3  A: 

Something like:

// Assumes "using HtmlAgilityPack;" and an HtmlDocument named doc already loaded with the page
List<string> facts = new List<string>();
foreach (HtmlNode li in doc.DocumentNode.SelectNodes("//li")) {
    facts.Add(li.InnerText);
}
Marc Gravell
A: 

How about a simple regex?

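' Requires Imports System.Text.RegularExpressions; tHTMLString holds the page's HTML.
' Regex.Matches takes the input string first and the pattern second.
' Singleline lets .*? span line breaks; \b keeps <li from matching tags such as <link>.
For Each tMatch As Match In Regex.Matches(tHTMLString, "<li\b[^>]*>(?<Fact>.*?)</li>", RegexOptions.Singleline Or RegexOptions.IgnoreCase)
    Console.WriteLine(tMatch.Groups("Fact").Value)
Next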
Boo
+1  A: 

Note that SelectNodes returns null, rather than an empty collection, when "... no node matched the XPath expression", so check the result for null before looping over it.

CS
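Combining the loop from the first answer with that null check, a self-contained sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced; the FactScraper class, the ExtractFacts method and the sample HTML are hypothetical names made up for illustration. HtmlEntity.DeEntitize is applied on the assumption that InnerText can leave entities such as &amp; undecoded.

```csharp
// Sketch only: requires the HtmlAgilityPack NuGet package.
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FactScraper
{
    // Hypothetical helper: returns the text of every <li> element,
    // or an empty list when the page has none.
    public static List<string> ExtractFacts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var facts = new List<string>();
        HtmlNodeCollection items = doc.DocumentNode.SelectNodes("//li");
        if (items == null)
        {
            // SelectNodes returned null because no <li> matched the XPath.
            return facts;
        }

        foreach (HtmlNode li in items)
        {
            // Decode entities such as &amp; and strip surrounding whitespace.
            facts.Add(HtmlEntity.DeEntitize(li.InnerText).Trim());
        }
        return facts;
    }

    public static void Main()
    {
        string html = "<ul><li>First fact</li><li> Salt &amp; pepper </li></ul>";
        foreach (string fact in ExtractFacts(html))
        {
            Console.WriteLine(fact);
        }
    }
}
```

With this shape, a page that has no <li> elements yields an empty list instead of a NullReferenceException.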