Hi, I was wondering if there is a .NET library that can clean up an HTML document and remove its unclosed tags?

+2  A: 

Html Agility Pack

http://www.codeplex.com/htmlagilitypack
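
A minimal sketch of how the parse-and-return-as-string step might look, assuming the HtmlAgilityPack NuGet package is referenced. The `HtmlRepair` class and `Repair` method are hypothetical names used only for illustration, and the option flags are a starting point to experiment with; whether they repair every unclosed tag depends on the input.

```csharp
using System;
using System.IO;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

static class HtmlRepair
{
    // Hypothetical helper: parses possibly malformed HTML and returns it re-serialized.
    public static string Repair(string html)
    {
        var doc = new HtmlDocument();

        // Set the options before loading so they apply during parsing and output.
        doc.OptionFixNestedTags = true;   // try to repair badly nested tags
        doc.OptionAutoCloseOnEnd = true;  // close open tags at the end of their parent
        doc.OptionOutputAsXml = true;     // serialize as well-formed XML

        // HtmlDocument is filled through LoadHtml, not through its constructor.
        doc.LoadHtml(html);

        // Save to a StringWriter to get the document back as a string.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    static void Main()
    {
        Console.WriteLine(Repair("<div><p>unclosed paragraph<b>bold text</div>"));
    }
}
```

If XML output is not wanted, reading `doc.DocumentNode.OuterHtml` after `LoadHtml` is another way to get the parsed document back as a string.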

Luke Schafer
Sorry to bother you again. I tried to use Html Agility Pack but was not successful. I created a new HtmlDocument, passing the string containing the HTML I want to fix to the constructor. However, I need to return the document as a string, and I don't know how to do that.
ryudice
I parsed my text using the HtmlDocument class, but it still leaves the unclosed tags in place. Is there a way to remove them?
ryudice
Off the top of my head I can't remember, but try OptionOutputAsXml. There's also another option on the document to fix nested tags, but I'm not sure under what circumstances it works.
Luke Schafer
Luke, I believe you're referring to the answer I just gave to my own question: http://stackoverflow.com/questions/2175071/how-would-i-get-the-inputs-from-a-certain-form-with-htmlagility-pack-lang-c-ne
Codygman
I wasn't; I've used it before. But that's a great post, and thanks for sharing.
Luke Schafer
+1  A: 

HtmlTidy!

See the URL below for more details:

http://www.devx.com/dotnet/Article/20505/0/page/2

The source of the download/project is:

http://tidy.sourceforge.net/

I gave the first link because it explains how to use a .NET wrapper and how to set everything up. Hope this helps!

Codygman