ansaurus

Question

How can I get all the content within a <td> tag using the HTML Agility Pack?

Answer 1

A:

You'd probably get better mileage with an XML parser.

Josh Sterling 2010-06-12 05:30:53

Answer 2

A:

"Something else" is the best answer -- HTML is best parsed by an HTML parser rather than via regular expressions. I'm no C# expert, but I hear the HTML Agility Pack is well-liked for this purpose.

Alex Martelli 2010-06-12 05:31:41

I'm already using that. I updated my question to reflect that.

Bob Dylan 2010-06-12 05:33:51

Answer 3

+1 A:

I'd say som̡et̨hińg Else

Felipe Alsacreations 2010-06-12 05:33:37

Normally I would agree with that too, but I think this is an exception because I'm looking for something so narrow. However, if you could **actually suggest something else** I would be open to that too.

Bob Dylan 2010-06-12 05:36:43

Saw that coming..

BlueRaja - Danny Pflughoeft 2010-06-12 05:54:42

Answer 4

+1 A:

Since you are already using the HTML Agility Pack, I would suggest using the methods it provides to find the information you want. There are a few ways to navigate the document, but one of the most concise is XPath. In this case you could use something like this:

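using System.Linq;      // provides the Single() extension method
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.Load("input.html");
// SelectNodes returns null (not an empty collection) when nothing matches.
HtmlNode node = doc.DocumentNode
                   .SelectNodes("//table[@cellspacing='3']/tr[2]/td")
                   .Single();
string text = node.InnerText;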

Mark Byers 2010-06-12 05:43:06

I think you're on the right track, but I'm not seeing the `.Single()` method in IntelliSense. I'm using version 1.4.0 of the HTML Agility Pack.

Bob Dylan 2010-06-12 05:56:27

Add a reference to System.Core and a `using System.Linq;` directive.

alexn 2010-06-12 05:57:19

@alexn: I did that and it's still not showing up.

Bob Dylan 2010-06-12 05:59:23

@Bob Dylan: That code was just an example. You don't *have* to use `Single()` if you don't have it available; you could just write `.SelectNodes(...)[0]` instead. Knowing LINQ would be a huge asset for developing in C#, though.

Mark Byers 2010-06-12 06:02:21

@Mark: OK, I just tried using `[0]` like you said and got an exception on `node`: "Object reference not set to an instance of an object". I assume this means it didn't find the table, tr, or td?

Bob Dylan 2010-06-12 06:04:50

@Bob Dylan: Correct. You could change the XPath expression to "//table[@cellspacing=3]" and see if that matches.
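
A minimal sketch of guarding against that null result, assuming the same `input.html` file and XPath as in the answer. `SelectNodes` returns null rather than an empty collection when nothing matches, so indexing `[0]` directly throws. The class name and the message text are illustrative only.

```csharp
using System;
using HtmlAgilityPack;

class TdLookup
{
    static void Main()
    {
        HtmlDocument doc = new HtmlDocument();
        doc.Load("input.html"); // assumed file name, taken from the answer's example

        // SelectNodes returns null when the XPath matches nothing.
        HtmlNodeCollection cells =
            doc.DocumentNode.SelectNodes("//table[@cellspacing='3']/tr[2]/td");

        if (cells == null)
        {
            Console.WriteLine("No match: check the table attribute and row index.");
            return;
        }

        // Safe to index now that the collection is known to exist.
        Console.WriteLine(cells[0].InnerText);
    }
}
```

Loosening the XPath one step at a time (table only, then row, then cell) should show which part fails to match.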

Mark Byers 2010-06-12 06:07:35

@Mark: I tried that and it gave me the same error. Also, I've updated my question to show how I'm loading the document (just in case that makes a difference).

Bob Dylan 2010-06-12 06:14:37

Ok. I got it working.

Bob Dylan 2010-06-12 06:19:18

Answer 5

A:

If you're using the Agility Pack already, then it's just a matter of using something like `doc.DocumentNode.SelectNodes("//table[@cellspacing='3']")` to get the table in the document. Try looking through the documentation and coding examples. Since you already have structured data, it's ridiculous to go back to the text data and reparse it.
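
As a rough sketch of that approach, the following selects every `td` under the matching table and prints its text. The inline HTML string is a made-up stand-in for the real page, the class name is hypothetical, and the `cellspacing='3'` filter is taken from the XPath above.

```csharp
using System;
using HtmlAgilityPack;

class TableCells
{
    static void Main()
    {
        // Hypothetical markup standing in for the real document.
        string html =
            "<table cellspacing='3'>" +
            "<tr><td>first</td></tr>" +
            "<tr><td>second</td></tr>" +
            "</table>";

        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);

        // All td descendants of the table whose cellspacing attribute is 3.
        HtmlNodeCollection cells =
            doc.DocumentNode.SelectNodes("//table[@cellspacing='3']//td");

        if (cells == null) return; // null means no nodes matched

        foreach (HtmlNode cell in cells)
        {
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

Use `InnerHtml` instead of `InnerText` if the markup inside each cell is needed as well as the text.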

Eclipse 2010-06-12 05:43:51
