I am parsing HTML files with the Html Agility Pack to extract table data. Some of the files omit optional end tags, and some omit optional start tags, so the Html Agility Pack does not parse those pages properly. If I open such a file in Notepad++ and clean it with TextFX --> TextFX HTML Tidy --> Tidy: Clean document, the Html Agility Pack then parses the tidied file properly.

Tidying the HTML page with Notepad++ gives the best result.

But I cannot expect the user to tidy every page in Notepad++ first and only then continue. What should I do instead?

EDIT: I have used the HTML Tidy pack, but in some cases a file tidied with it is still not parsed, whereas the same page tidied in Notepad++ is parsed.

+2  A: 

HTML Tidy is independent of Notepad++ and you can use this open source component directly in your .NET (or other language) project.

More details on using this in .NET specifically can be found here.

Macros
+3  A: 

I think Notepad++ is using the HtmlTidy library, and so can you. The main page is here.

Or maybe you can use a service like HtmlTidy online.

Edit: you seem to want to use Notepad++ (on top of HtmlTidy). NP++ has a limited set of command-line options, so loading the file won't be the problem. But I couldn't find any reference to an interface for the rest of what you need: tidying the HTML and saving the result.

Henk Holterman
@Henk Holterman, I have used this, but it does not always work.
Harikrishna
@Henk Holterman, how can I do that? Any reference?
Harikrishna
A: 

HTML Tidy is also available separately and is just used as a plugin in Notepad++. You may want to use it directly in your app. Have a look at http://tidy.sourceforge.net/ . Implementations for many languages are available.
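
As a rough sketch of what "use it directly" could look like, the snippet below runs the standalone `tidy` command-line tool on a file and then loads the cleaned output with the Html Agility Pack. The class name `TidyThenParse`, its method names, and all paths are made up for illustration, and it assumes a `tidy` executable is installed at the location you pass in. The tidy settings Notepad++ applies may differ from the ones used here, so the output is not guaranteed to match what the plugin produces.

```csharp
using System;
using System.Diagnostics;
using HtmlAgilityPack;

// Hypothetical helper: tidy an HTML file with the tidy CLI, then parse it.
static class TidyThenParse
{
    // Builds the tidy arguments: quiet mode, XHTML output (so optional
    // tags are written out explicitly), no warnings, explicit output file.
    public static string BuildArguments(string inputPath, string outputPath)
    {
        return $"-q -asxhtml --show-warnings no -o \"{outputPath}\" \"{inputPath}\"";
    }

    public static HtmlDocument LoadTidied(string tidyExePath, string inputPath, string outputPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = tidyExePath,
            Arguments = BuildArguments(inputPath, outputPath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            // tidy exits with 0 (clean), 1 (warnings) or 2 (errors).
            if (process.ExitCode > 1)
                throw new InvalidOperationException("tidy reported errors for " + inputPath);
        }

        // Parse the tidied copy rather than the original file.
        var document = new HtmlDocument();
        document.Load(outputPath);
        return document;
    }
}
```

A call such as `TidyThenParse.LoadTidied(@"C:\tools\tidy.exe", "page.html", "page.tidy.html")` would return a document on which `DocumentNode.SelectNodes("//table")` can be used to find the tables; note that `SelectNodes` returns `null` when nothing matches.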

Shubh