I'm writing an application that does a little screen scraping. I'm using the HTML Agility Pack to load an entire HTML page into an instance of HtmlDocument called doc. Now I want to parse that doc, looking for this:
<table border="0" cellspacing="3">
<tr><td>First rows stuff</td></tr>
<tr>
<td>
The data I want is in here <br />
and it's separated by these annoying <br /> 's.
No id's, classes, or even a single <p> tag. </p> Just a bunch of <br /> tags.
</td>
</tr>
</table>
I just need the data inside the second row's <td>. How can I do this? Should I use a regex, or something else?
Update: here is how I'm loading my doc:
HtmlWeb hw = new HtmlWeb();
HtmlDocument doc = hw.Load(Url);