How can I loop through the rows of a table, where the table is identified by an id attribute and the rows by a name attribute, and get the inner text of each td cell? I am working with ASP.NET, C#, and the newest Html Agility Pack. Please guide me. Thank you.

An HTML file has several tables. One of them has the attribute id=main-part. That table has many rows, and some of those rows share the attribute name=display. Each of those named rows has many columns whose text I need to extract. Something like this:

<body>
<table>
...
</table>
<table>
...
</table>

<table id="main-part">
   <tr>
     <td></td>
     ...
   </tr>
   <tr>
     <td></td>
     ...
   </tr>
   <tr name="display">
     <td>Jan</td>
     <td>Feb</td>
     <td>Mar</td>
     ...
   </tr>
   <tr name="display">
     <td>Apr</td>
     <td>May</td>
     <td>June</td>
     ...
   </tr>
   <tr name="display">
     <td>Jul</td>
     <td>Aug</td>
     <td>Sep</td>
     ...
   </tr>
   <tr>
     <td></td>
     ...
   </tr>
   <tr name="display">
     <td>Oct</td>
     <td>Nov</td>
     <td>Dec</td>
     ...
   </tr>
   <tr>
     <td></td>
     ...
   </tr>
</table>
<table>
...
</table>
</body>
A: 

You need to select these nodes using XPath:

foreach (HtmlNode cell in doc.DocumentElement.SelectNodes("//tr[@name='display']/td"))
{
   // get cell data
}
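
The query above matches `display` rows in every table on the page. A minimal sketch of narrowing it to the table with `id="main-part"` follows, assuming a recent Html Agility Pack in which the root is exposed as `DocumentNode` and `SelectNodes` returns `null` when nothing matches. The class name, method name, and inline sample markup are hypothetical and only stand in for the real page.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ScopedQuery
{
    // Hypothetical helper: returns the trimmed text of every td inside
    // rows named "display" that belong to the table with id "main-part".
    public static List<string> GetDisplayCells(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Scope the XPath to the identified table first, then to the named rows.
        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(
            "//table[@id='main-part']//tr[@name='display']/td");

        var result = new List<string>();
        if (cells == null)
        {
            // SelectNodes yields null, not an empty collection, on no match.
            return result;
        }

        foreach (HtmlNode cell in cells)
        {
            result.Add(cell.InnerText.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Inline stand-in for the page described in the question.
        string html =
            "<table><tr name='display'><td>other table</td></tr></table>" +
            "<table id='main-part'>" +
            "<tr><td></td></tr>" +
            "<tr name='display'><td>Jan</td><td>Feb</td><td>Mar</td></tr>" +
            "</table>";

        foreach (string text in GetDisplayCells(html))
        {
            Console.WriteLine(text);
        }
    }
}
```

The null check matters because a `foreach` over a `null` result would throw if the page contains no matching rows.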
Oded
Thank you for your help. For the new package I got, I use DocumentNode in place of DocumentElement.
David
A: 

Please refer to this link for an example of how to use the Html Agility Pack.

click here

Asif khan
I followed the link and found something interesting there. Thank you.
David
+1  A: 

It worked! Thank you very much Oded.

HtmlDocument doc = new HtmlDocument();
doc.Load(@"C:/samplefolder/sample.htm");
foreach (HtmlNode cell in doc.DocumentNode.SelectNodes("//tr[@name='display']/td"))
{
    string test = cell.InnerText;
    Response.Write(test);
}

It showed the result as JanFebMarAprMayJuneJulAugSepOctNovDec. How can I separate the values with a space or a tab? Thank you.

David
@David - perhaps this should be a new question? Anyway, when you call `Response.Write`, you can append a comma to each value: `Response.Write(test + ",");`
Oded
@David - you should also upvote and accept answers, if they did help. See the FAQ - http://stackoverflow.com/faq
Oded
If you run this as a console application rather than in an ASP.NET page, use Console.WriteLine(test); instead of Response.Write(test); and call Console.ReadLine(); at the end, so you can check that the output is what you expect before pressing [Enter] to exit.
Erx_VB.NExT.Coder
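
On the separator question, one possible approach is to select the rows first and join the cell text of each row, so that every `display` row becomes one delimited line. This is only a sketch: the `RowFormatter` class, the `FormatRows` method, and the inline sample markup are hypothetical, and it assumes .NET 4 or later for the `string.Join` overload that accepts a sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class RowFormatter
{
    // Hypothetical helper: one output string per "display" row,
    // with the row's cell texts joined by the given separator.
    public static List<string> FormatRows(string html, string separator)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var lines = new List<string>();
        HtmlNodeCollection rows = doc.DocumentNode.SelectNodes(
            "//table[@id='main-part']//tr[@name='display']");
        if (rows == null)
        {
            return lines; // no matching rows
        }

        foreach (HtmlNode row in rows)
        {
            // Elements("td") yields only the direct td children of this row.
            IEnumerable<string> texts =
                row.Elements("td").Select(td => td.InnerText.Trim());
            lines.Add(string.Join(separator, texts));
        }
        return lines;
    }

    public static void Main()
    {
        // Inline stand-in for the page described in the question.
        string html =
            "<table id='main-part'>" +
            "<tr name='display'><td>Jan</td><td>Feb</td><td>Mar</td></tr>" +
            "<tr><td></td></tr>" +
            "<tr name='display'><td>Apr</td><td>May</td><td>June</td></tr>" +
            "</table>";

        foreach (string line in FormatRows(html, "\t"))
        {
            Console.WriteLine(line);
        }
    }
}
```

In an ASP.NET page, each line could be written with `Response.Write(line + "<br />");` instead. Browsers collapse tabs in ordinary HTML, so a visible separator such as a comma, or a `<pre>` block, is likely needed there.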