I have a service that takes user-supplied rich text (which may contain HTML tags) and saves it to the database. That data is then used by another application. Sometimes the user-supplied data has missing tags or wrong closing tags. I want to check whether the data is valid HTML and, if it is not, warn the user.

Are there any Java libraries for HTML validation?

+3  A: 

You can try JTidy.

JTidy is a Java port of HTML Tidy, an HTML syntax checker and pretty printer.

Desintegr
+2  A: 

You can try JTidy, but it's too slow for simple HTML cleaning.

If you just want to process HTML, you can try NekoHTML; it's lightweight and fast.

splix
The only thing to consider is that you should not present your users with all kinds of validation error messages. Your users are probably average Joes who won't understand them.
Hans Westerbeek
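
One way to act on this advice, as a rough sketch: run whatever check you choose, keep the technical detail for the log, and show the user a single plain-language warning. The example below uses only the JDK's XML parser as a stand-in for a real HTML validator, so it is stricter than HTML itself (an unclosed `<br>` or an `&nbsp;` entity would also be flagged). The class name `FriendlyMarkupCheck`, the `<root>` wrapper element, and the warning text are all hypothetical choices for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class FriendlyMarkupCheck {

    /** Returns null when the fragment parses cleanly, otherwise one short, non-technical warning. */
    public static String warningFor(String fragment) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            // Wrap the fragment in a hypothetical root element so several top-level tags are allowed.
            builder.parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
            return null;
        } catch (SAXException e) {
            // The parser detail in e.getMessage() belongs in a log, not in front of the user.
            return "Some formatting tags in your text look incomplete. Please check it and try again.";
        } catch (Exception e) {
            // Parser configuration or I/O problems are not the user's fault.
            throw new IllegalStateException("Could not run the markup check", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(warningFor("<p>Hello <b>world</b></p>")); // balanced tags
        System.out.println(warningFor("<p>Hello <b>world</p>"));     // wrong closing tag
    }
}
```

Whichever library ends up doing the actual checking, the same shape applies: many parser messages in, one simple warning out.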
I am going to try both and see which one works for me. Thank you!
chetu
A: 

I'd suggest Validator.nu, which implements the HTML5 spec.

Ms2ger
A: 

There's a great thing called NekoHTML, which is just a thin wrapper over the Apache Xerces parser that turns on error recovery and correction. It doesn't validate so much as error-correct, so you can process the result as XML, e.g. run it through XPath or XSLT. It has worked flawlessly for me for several months on completely arbitrary HTML from third-party sites.
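
To illustrate the "process the result as XML" part, here is a minimal sketch that uses only the JDK. It assumes the markup has already been corrected into well-formed XML (for example by NekoHTML); the hard-coded string in `main` stands in for that corrected output, and the class and method names are hypothetical. NekoHTML's real output may differ in details such as element-name case, so the XPath expression may need adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathOverCorrectedMarkup {

    /** Returns the href of every link, in document order, from markup that is already well-formed. */
    public static List<String> linkTargets(String wellFormedMarkup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedMarkup)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Select the href attribute node of every <a> element in the document.
            NodeList hrefs = (NodeList) xpath.evaluate("//a/@href", doc, XPathConstants.NODESET);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < hrefs.getLength(); i++) {
                result.add(hrefs.item(i).getNodeValue());
            }
            return result;
        } catch (Exception e) {
            // Input that is not well-formed ends up here; this sketch does no error correction itself.
            throw new IllegalStateException("Markup could not be parsed as XML", e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for markup that an error-correcting parser has already cleaned up.
        String corrected = "<html><body><p>See <a href=\"https://example.com/a\">A</a> and "
                + "<a href=\"https://example.com/b\">B</a>.</p></body></html>";
        System.out.println(linkTargets(corrected));
    }
}
```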

EJP