Every XHTML document is an XML document. Our current project, a Ruby on Rails application, has a lot of HTML tags that are not properly closed. I want to add an after filter that parses the whole HTML output and raises an error if the document is not well-formed.

In this case, well-formed means that all tags are properly closed. What is a good Ruby parser for this that is also fast?

+2  A: 

HTML Tidy seems to be the most popular tool for this in other languages, and there is a Rails plugin for it too.

http://blog.cosinux.org/pages/rails-tidy

EvilChookie
A: 

markup_validity provides some (X)HTML validation features. You can also use nokogiri as described here.
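For the well-formedness check itself, here is a minimal sketch. It assumes REXML (distributed with Ruby) instead of nokogiri so it runs without extra gems, and the method name `well_formedness_error` is made up for illustration; it is not part of markup_validity or nokogiri.

```ruby
require 'rexml/document'

# Hypothetical helper: returns nil when the markup parses as XML,
# otherwise the parser's error message.
def well_formedness_error(markup)
  REXML::Document.new(markup)
  nil
rescue REXML::ParseException => e
  e.message
end

# Properly nested and closed tags parse cleanly.
puts well_formedness_error('<div><p>closed</p></div>').inspect

# The <p> is never closed, so the parser reports an error.
puts well_formedness_error('<div><p>never closed</div>')
```

In a Rails app, something like this could be called on `response.body` from an after filter, raising only in development or test. With nokogiri, `Nokogiri::XML(markup) { |config| config.strict }` raises `Nokogiri::XML::SyntaxError` on malformed input and should be the faster option. Note that HTML void tags written as `<br>` rather than `<br />` count as unclosed under XML rules.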

Simone Carletti
A: 

Why would you close your tags? It's only going to slow you down!

http://blog.errorhelp.com/2009/06/27/the-highest-traffic-site-in-the-world-doesnt-close-its-html-tags/

Michael Sofaer
If the page is going to claim to be XHTML then it should try to be valid XHTML. But yeah, if you're already counting on quirks mode, who cares.
Eli