views:

5358

answers:

3

Hello, I want to use the HTML agility pack to parse tables from complex web pages, but I am somehow lost in the object model. I looked at the link example, but did not find any table data this way. Can I use Xpath to get the tables? I am basically lost after having load the data how to get the tables. I have done this in Perl before and it was a bit clumsy, but worked ok. ( HTML::TableParser). I am also happy if one can just shed a light on the right object order for the parsing.

+14  A: 

How about something like:

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(@"<html><body><p><table id=""foo""><tr><th>hello</th></tr><tr><td>world</td></tr></table></body></html>");
foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table")) {
    Console.WriteLine("Found: " + table.Id);
    foreach (HtmlNode row in table.SelectNodes("tr")) {
        Console.WriteLine("row");
        foreach (HtmlNode cell in row.SelectNodes("th|td")) {
            Console.WriteLine("cell: " + cell.InnerText);
        }
    }
}

Note that you can make it prettier with LINQ-to-Objects if you want:

var query = from table in doc.DocumentNode.SelectNodes("//table").Cast<HtmlNode>()
            from row in table.SelectNodes("tr").Cast<HtmlNode>()
            from cell in row.SelectNodes("th|td").Cast<HtmlNode>()
            select new {Table = table.Id, CellText = cell.InnerText};

foreach(var cell in query) {
    Console.WriteLine("{0}: {1}", cell.Table, cell.CellText);
}
Marc Gravell
+4  A: 

The most simple what I've found to get the XPath for a particular Element is to install FireBug extension for Firefox go to the site/webpage press F12 to bring up firebug; right select and right click the element on the page that you want to query and select "Inspect Element" Firebug will select the element in its IDE then right click the Element in Firebug and choose "Copy XPath" this function will give you the exact XPath Query you need to get the element you want using HTML Agility Library.

Coda
A: 

Hi Marc can you tell me how would I make a query to edit the values inside of the tags that I find. Like if in the My Title tag. How would I use html agility wrapper to change that My Title Name. I am trying to make a desktop application that will change specific tag values. Like a search and replace app that will find things by tags.

Jerome