I have this code:

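using System.Net;       // WebClient
using System.Xml.Linq;  // XElement

var url = textBox1.Text;
WebClient wc = new WebClient();

var page = wc.DownloadString(url);
XElement doc = XElement.Parse(page); // throws XmlException when the HTML is not well-formed XML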

It fails with an exception about unexpected characters. Obviously, the HTML I'm trying to parse in such a dumb way is not strict XML. What's the next easiest way to parse arbitrary HTML into something IQueryable?

What I actually want is to grab a table from the page, plus the paging links, and then process them myself with LINQ.

A: 

The best way I can think of is to search for the <table> tags and parse everything inside them, and do the same for the tags containing the paging links. Narrowing it down to just those parts should make a manual parser much easier to write.
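A minimal sketch of that manual approach in C#, using only string search and a regular expression. It assumes the page contains a single, non-nested <table> and that paging links can be recognized by a substring in their href; the class name, method names and the "page=" marker used below are hypothetical. Regex scanning like this is easy to break with unusual markup, so treat it as a starting point rather than a real HTML parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating a hand-rolled scan of an HTML string.
public static class ManualHtmlScan
{
    // Returns the raw markup of the first <table>...</table> block, or null if none.
    // Assumption: the table is not nested, since this stops at the first </table>.
    public static string ExtractFirstTable(string html)
    {
        int start = html.IndexOf("<table", StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;

        int end = html.IndexOf("</table>", start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end - start + "</table>".Length);
    }

    // Collects href values of <a> tags whose href contains hrefMarker.
    // Only double- or single-quoted href values are matched; unquoted ones are skipped.
    public static List<string> ExtractLinks(string html, string hrefMarker)
    {
        var hrefPattern = new Regex(@"<a\s[^>]*href\s*=\s*[""']([^""']*)[""']",
                                    RegexOptions.IgnoreCase);

        return hrefPattern.Matches(html)
            .Cast<Match>()
            .Select(m => m.Groups[1].Value)
            .Where(href => href.Contains(hrefMarker))
            .ToList();
    }
}
```

For example, ManualHtmlScan.ExtractLinks(page, "page=") would return the href of every link whose URL contains page=, which could then be filtered further with LINQ.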

ridecar2
A:

Have a look at the HTML Agility Pack:
http://www.codeplex.com/htmlagilitypack
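A minimal sketch of the same task with the HTML Agility Pack, assuming the HtmlAgilityPack NuGet package is referenced. Its parser tolerates markup that is not well-formed XML, and the resulting node tree can be queried with LINQ to Objects through Descendants and Elements. The class and method names, and the "page=" marker for paging links, are hypothetical.

```csharp
// Assumes a reference to the HtmlAgilityPack NuGet package.
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper showing LINQ queries over an HTML Agility Pack document.
public static class AgilityPackScan
{
    // Returns the rows of the first <table> as lists of trimmed cell texts.
    public static List<List<string>> ReadFirstTable(string html)
    {
        // Fully qualified to avoid clashing with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: does not require well-formed XML

        var table = doc.DocumentNode.Descendants("table").FirstOrDefault();
        if (table == null) return new List<List<string>>();

        return table.Descendants("tr")
            .Select(tr => tr.Elements("td")
                .Select(td => td.InnerText.Trim())
                .ToList())
            .ToList();
    }

    // Returns href values of <a> elements whose href contains hrefMarker.
    public static List<string> ReadPagingLinks(string html, string hrefMarker)
    {
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants("a")
            .Select(a => a.GetAttributeValue("href", ""))
            .Where(href => href.Contains(hrefMarker))
            .ToList();
    }
}
```

The page string returned by wc.DownloadString(url) in the question can be passed straight to these methods in place of XElement.Parse. HtmlDocument is fully qualified because a Windows Forms project may also import System.Windows.Forms, where an unqualified HtmlDocument would be ambiguous.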

AUSteve
Yeah, I'm trying this one now. It seems to fit my needs.
Alexander Taran