



I'm looking for pure Ruby (or Java) solutions for beautifying HTML code.

I'm currently using Hpricot to parse HTML, since Nokogiri and other HTML parsers depend on native C libraries. I assume that I can use Hpricot to clean up HTML if I can come up with a good algorithm, but I'd prefer not to reinvent the wheel if this has already been done.

+1  A: 

Perhaps you can try JTidy?

"JTidy is a Java port of HTML Tidy, a HTML syntax checker and pretty printer. Like its non-Java cousin, JTidy can be used as a tool for cleaning up malformed and faulty HTML. In addition, JTidy provides a DOM interface to the document that is being processed, which effectively makes you able to use JTidy as a DOM parser for real-world HTML.

JTidy was written by Andy Quick, who later stepped down from the maintainer position. Now JTidy is maintained by a group of volunteers.

More information on JTidy can be found on the JTidy SourceForge project page."
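
For what it's worth, here is a minimal sketch of how pretty-printing with JTidy might look from Java. It assumes a reasonably recent JTidy release (r938 or later) is on the classpath, since the `parse(Reader, Writer)` overload is not in the oldest releases. The class name `HtmlBeautifier` and the method `beautify` are made up for illustration, and the option choices are just one plausible setup, not a recommended configuration.

```java
import java.io.StringReader;
import java.io.StringWriter;

import org.w3c.tidy.Tidy;

// Hypothetical helper class; only org.w3c.tidy.Tidy comes from JTidy itself.
public class HtmlBeautifier {

    /** Returns a repaired, indented copy of the given HTML (assumes JTidy r938+). */
    public static String beautify(String html) {
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);          // write the result as XHTML
        tidy.setSmartIndent(true);    // let JTidy decide which elements to indent
        tidy.setTidyMark(false);      // leave out the "generator" meta tag
        tidy.setQuiet(true);          // no summary banner
        tidy.setShowWarnings(false);  // no per-warning messages

        // Parse the (possibly malformed) input and pretty-print it into a string.
        StringWriter out = new StringWriter();
        tidy.parse(new StringReader(html), out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Deliberately sloppy input: unclosed <b> and missing </html>.
        System.out.println(beautify("<html><body><p>Hello <b>world</p></body>"));
    }
}
```

Since JRuby can load Java classes, the same `Tidy` object should be usable from Ruby code running on JRuby, though I haven't verified how well that fits with an existing Hpricot setup.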

Mark Thomas