I'm trying to get just some specific cells in each row using HTMLAgilityPack.

foreach (HtmlNode row in ContentNode.SelectNodes("descendant::tr"))
{
    //Do something to first cell
    //Do something to second cell
}

There are more cells, and each cell needs its own specialized treatment. I guess there's a way to do this using XPath, but I'm fairly useless at that. Is there maybe something like this?

var cell1 = row.SelectSingleNode("descendant::td:first");
+1  A: 

To get each first cell that is a child of a row, you can do the following:

// from row
var firstCell = row.SelectSingleNode("td[1]");

// each first cell in a table (note: tbody is only matched if it is present in the source; otherwise use "tr/td[1]")
var allFirstCells = table.SelectNodes("tbody/tr/td[1]");

In other words, put the number of the cell you wish to select in square brackets; XPath positions start at 1, not 0. The last cell is a special case, because you may not know its number in advance. You can get it using last() as follows:

// from row
var lastCell = row.SelectSingleNode("td[last()]");

// each last cell in a table
var allLastCells = table.SelectNodes("tbody/tr/td[last()]");

If you want to get the cell that follows the current cell, you can do something like this:

// from row
var firstCell = row.SelectSingleNode("td[1]");
var siblingCell = firstCell.SelectSingleNode("./following-sibling::td");

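You should check the return values for null. SelectSingleNode returns null when nothing matches, and HtmlAgilityPack's SelectNodes also returns null rather than an empty collection. A null result means you either have a typo in the expression, or the DOM tree you loaded does not contain the cell you asked for.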
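
Putting the pieces above together for the loop in the question, here is a minimal sketch of null-safe, per-row cell access. The sample markup, the class name CellDemo and the helper CellText are made up for this illustration and are not part of HtmlAgilityPack.

```csharp
using System;
using HtmlAgilityPack;

public static class CellDemo
{
    // Hypothetical sample markup, used only for this illustration.
    public const string SampleHtml =
        "<table>" +
        "<tr><td>A1</td><td>B1</td><td>C1</td></tr>" +
        "<tr><td>A2</td></tr>" +
        "</table>";

    // Hypothetical helper: returns the trimmed text of the cell at the given
    // 1-based position, or null when the row has no such cell.
    public static string CellText(HtmlNode row, int position)
    {
        HtmlNode cell = row.SelectSingleNode("td[" + position + "]");
        return cell == null ? null : cell.InnerText.Trim();
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("descendant::tr");
        if (rows == null)
        {
            Console.WriteLine("No rows found.");
            return;
        }

        foreach (HtmlNode row in rows)
        {
            // Each cell can get its own treatment; here we only print it.
            string first = CellText(row, 1);
            string second = CellText(row, 2);
            Console.WriteLine((first ?? "(none)") + " | " + (second ?? "(none)"));
        }
    }
}
```

The second sample row has only one cell, so CellText returns null for position 2 there and the loop falls back to a placeholder instead of throwing.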

Abel
Sweet! Is that just "regular" XPath, or is it some special concoction from the htmlagilitypack-dudes?
peirix
@peirix: it is totally regular XPath. HtmlAgilityPack doesn't add anything special. It builds a .NET DOM, and SelectNodes uses Microsoft's .NET implementation of XPath 1.0.
Abel
+1  A: 

Instead of:

descendant::tr

use:

descendant::tr/td[not(position() > 2)]
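
As a rough sketch of how that expression might be used: it returns the first two cells of every row as one flat list, so the row grouping is gone and the code has to work out which cell it is looking at. The sample markup, the class name FirstTwoCells and the method Collect are assumptions made for this example. Checking for a preceding td sibling is just one way to tell the first cell from the second.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class FirstTwoCells
{
    // Hypothetical sample markup, used only for this illustration.
    public const string SampleHtml =
        "<table>" +
        "<tr><td>A1</td><td>B1</td><td>C1</td></tr>" +
        "<tr><td>A2</td></tr>" +
        "</table>";

    // Hypothetical helper: collects the first two cells of every row,
    // labelled "1:" or "2:" according to their position within the row.
    public static List<string> Collect(HtmlNode contentNode)
    {
        var result = new List<string>();

        HtmlNodeCollection cells =
            contentNode.SelectNodes("descendant::tr/td[not(position() > 2)]");
        if (cells == null)
        {
            return result; // nothing matched
        }

        foreach (HtmlNode cell in cells)
        {
            // A cell with no preceding td sibling is the first cell of its row.
            bool isFirst = cell.SelectSingleNode("preceding-sibling::td") == null;
            result.Add((isFirst ? "1:" : "2:") + cell.InnerText.Trim());
        }
        return result;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(SampleHtml);

        foreach (string entry in Collect(doc.DocumentNode))
        {
            Console.WriteLine(entry);
        }
    }
}
```

If each cell needs very different handling, looping over the rows and selecting td[1], td[2] and so on individually may be easier to read than one flat list.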
Dimitre Novatchev
Abel