I'm looking for an HTML converter that produces valid XHTML. One important requirement is that it inserts P tags for paragraphs, something that seems to be missing from most of the popular ones.

I found John Resig's, but it does not insert P tags.

http://ejohn.org/blog/pure-javascript-html-parser/

For example, this:

Lorem ipsum dolor sit amet, consectetuer adipiscing elit.<br/><br/> Aenean commodo ligula eget dolor. 

Would become this:

<p>Lorem ipsum dolor sit amet, consectetuer adipiscing elit.</p>
<p>Aenean commodo ligula eget dolor. </p>
A: 

This article explains how to do the conversion with HTML Tidy and then do further processing with ordinary XML tools:

http://www.ibm.com/developerworks/library/x-tiptidy.html
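
For the paragraph part specifically, here is a rough sketch of the kind of post-processing step you might add. It is not taken from the article and it is not something Tidy does for you as far as I know. It assumes the input is a fragment of inline content where a run of two or more `<br>` tags marks a paragraph break, as in your example. The function name `paragraphs_from_breaks` is made up for illustration, and a regex like this will not cope with block-level elements or `<br>` tags inside attributes or comments.

```python
import re

# Assumption: two or more consecutive <br>, <br/> or <br /> tags
# (optionally separated by whitespace) mean "new paragraph".
BR_RUN = re.compile(r"(?:<br\s*/?>\s*){2,}", re.IGNORECASE)


def paragraphs_from_breaks(fragment: str) -> str:
    """Wrap each chunk separated by a run of <br> tags in <p>...</p>.

    Hypothetical helper: a single <br> is left alone, and surrounding
    whitespace in each chunk is trimmed.
    """
    chunks = (chunk.strip() for chunk in BR_RUN.split(fragment))
    # Skip empty chunks so leading or trailing <br> runs do not
    # produce empty paragraphs.
    return "\n".join(f"<p>{chunk}</p>" for chunk in chunks if chunk)


if __name__ == "__main__":
    sample = (
        "Lorem ipsum dolor sit amet, consectetuer adipiscing elit."
        "<br/><br/> Aenean commodo ligula eget dolor. "
    )
    print(paragraphs_from_breaks(sample))
```

Note that this trims the trailing space before the closing `</p>`, so the output differs slightly from the example in the question. For anything beyond simple fragments, running the markup through Tidy first and then doing this step with a real XML tool is probably the safer route.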

bemace