I am using Html Agility Pack to parse tabular HTML data. Some pages have missing closing tags, and Html Agility Pack does not parse those pages properly. I want to insert the missing closing tags so that the data parses correctly. How should I do this? Should I write my own code, or use an HTML tidy library?

If a tidy library, which one is best, and how is it used? An example would help. And if my own code, what might it look like?

Is there an option in Html Agility Pack that tidies the HTML page first and then parses it?

+2  A: 

I could not find any option in Html Agility Pack that tidies an HTML page. There is one option that inserts missing closing tags, but it works only on some pages:

    HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
    doc.OptionFixNestedTags = true;

I also tried regular expressions, but they too work only on some pages.
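
To check whether `OptionFixNestedTags` is enough for a particular page, it can help to load the page with the option set and look at what the parser reports. The following is only a minimal sketch: `TableReader` and `ReadCells` are hypothetical names made up for illustration, and whether the cells come out correctly still depends on how badly the page is broken.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class TableReader
{
    // Hypothetical helper: returns the trimmed text of every <td> cell
    // that Html Agility Pack finds in the given HTML string.
    public static List<string> ReadCells(string html)
    {
        var doc = new HtmlDocument();

        // The option must be set before the HTML is loaded,
        // because it changes how the parser builds the tree.
        doc.OptionFixNestedTags = true;
        doc.LoadHtml(html);

        // ParseErrors lists the problems the parser noticed, which gives
        // a rough idea of how malformed the page is.
        foreach (HtmlParseError error in doc.ParseErrors)
        {
            Console.Error.WriteLine("Line " + error.Line + ": " + error.Reason);
        }

        var cells = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//td");
        if (nodes == null)
        {
            return cells;
        }

        foreach (HtmlNode node in nodes)
        {
            cells.Add(node.InnerText.Trim());
        }
        return cells;
    }
}
```

If the returned cells look wrong, or `ParseErrors` reports many problems, that suggests the page needs to go through a separate tidy step before parsing.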

The best HTML tidy library I found is described here:

http://www.devx.com/dotnet/Article/20505/1763/page/2.

That page shows how to import the DLL and how to use the tidy library, and it includes sample code. It works well: it inserts the missing closing tags and tidies the HTML page.

Thanks for the help, everyone.

Harikrishna