Hi all. I am trying to transform HTML to XML, meaning I want to extract all elements that contain text. The code below is not working; maybe someone has the answer?

System.Xml.Linq.XElement query1 = new System.Xml.Linq.XElement("RawHTMLData",
    from q in hDoc.Descendants("TABLE")
    where q.HasElements
    select new System.Xml.Linq.XElement("TABLE" + (++i).ToString(),
        from j in q.Elements("TR")
        where j.HasElements && j.Descendants("div") != null
        select new System.Xml.Linq.XElement("Row",
            from hh in j.Descendants("div")
            where tt => j.Descendants("div").Contains(hh.Value)
            select(TT(hh)))));
A: 

You cannot use LINQ to XML to parse HTML, because HTML may not be valid XML.
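
To illustrate the point, here is a minimal sketch of the failure. The sample markup is made up for this example: it is ordinary HTML, but the unclosed `<br>` means it is not well-formed XML, so `XDocument.Parse` should reject it.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

class HtmlIsNotAlwaysXml
{
    static void Main()
    {
        // Hypothetical sample: fine as HTML, but <br> is never closed,
        // so the closing </p> does not match the open element.
        string html = "<p>line one<br>line two</p>";

        try
        {
            XDocument.Parse(html);
            Console.WriteLine("Parsed as XML.");
        }
        catch (XmlException ex)
        {
            // The XML parser reports the mismatched tag here.
            Console.WriteLine("Not well-formed XML: " + ex.Message);
        }
    }
}
```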

Andrew Bezzub
Yes, I know. That problem is already solved by replacing the bad strings, so now I can parse it. But I want to select only the XText elements, or only the elements with a value.
guy
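
One possible way to keep only the elements that carry text, sketched under a few assumptions: the input is already well-formed (as described in the comment above), the sample markup is invented for illustration, and the `Cell` element name is a hypothetical stand-in for whatever the `TT` helper in the question returns. The filter checks each `div` for a direct `XText` child that is not blank, which replaces the `where tt => ...` line that does not compile.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class TextElementsOnly
{
    static void Main()
    {
        // Hypothetical, already well-formed input.
        XElement hDoc = XElement.Parse(
            "<html><TABLE><TR><TD><div>first</div><div>  </div></TD></TR>" +
            "<TR><TD><div>second</div></TD></TR></TABLE></html>");

        int i = 0;
        XElement result = new XElement("RawHTMLData",
            from table in hDoc.Descendants("TABLE")
            where table.HasElements
            select new XElement("TABLE" + (++i),
                from row in table.Elements("TR")
                // Descendants() never returns null, so test for at least one match.
                where row.Descendants("div").Any()
                select new XElement("Row",
                    from div in row.Descendants("div")
                    // Keep only divs that directly contain non-blank text.
                    where div.Nodes().OfType<XText>()
                             .Any(t => !string.IsNullOrWhiteSpace(t.Value))
                    // "Cell" is an assumed name, not part of the original code.
                    select new XElement("Cell", div.Value.Trim()))));

        Console.WriteLine(result);
    }
}
```

Checking `Nodes().OfType<XText>()` rather than `Value` matters when elements are nested: `Value` concatenates the text of all descendants, so a wrapper `div` with no text of its own would still pass a `Value`-based test.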
A: 

Not sure if this would work for you, but you might look at using a third-party tool such as HTML Tidy to convert from HTML to XHTML. Then you can treat your HTML like XML. Here is a link to a post discussing that.

Abe Miessler