I have a project that generates HTML pages using Velocity templates and Java, but most of the pages do not comply with W3C standards. How can I validate those HTML pages and get a log telling me which errors and warnings occur on which pages, so that I can fix them manually? I have tried JTidyFilter, but it doesn't work for me.
There is also an experimental API available from W3C to help automate validation. They kindly ask that you throttle requests, and also offer instructions on setting up a validator on a local server. It's definitely more work, but if you're generating a lot of HTML pages, it would probably make sense to also automate the validation.
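To give an idea of what automating this could look like, here is a minimal sketch in Java (11 or later, for `java.net.http`). It rests on several assumptions that are not from the W3C instructions themselves: that you run a validator locally which accepts the raw HTML as a POST body and returns a report (the Nu HTML Checker works this way), that it listens at the hypothetical address in `ENDPOINT`, and that the generated pages sit in a directory named `output`. The class name, method names and one-second delay are likewise just illustrative.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BatchValidator {
    // Assumption: a locally installed validator; adjust host, port and query to your setup.
    static final String ENDPOINT = "http://localhost:8888/?out=json";
    // Pause between requests so the validator is not flooded.
    static final long DELAY_MS = 1000;

    // Builds a POST request that carries one HTML document as its body.
    static HttpRequest buildRequest(String endpoint, String html) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .timeout(Duration.ofSeconds(30))
                .header("Content-Type", "text/html; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(html, StandardCharsets.UTF_8))
                .build();
    }

    // Collects every *.html file below the given directory, in sorted order.
    static List<Path> htmlFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(p -> p.toString().endsWith(".html"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws Exception {
        Path root = Path.of(args.length > 0 ? args[0] : "output");
        HttpClient client = HttpClient.newHttpClient();
        for (Path page : htmlFiles(root)) {
            String html = Files.readString(page, StandardCharsets.UTF_8);
            HttpResponse<String> response = client.send(
                    buildRequest(ENDPOINT, html),
                    HttpResponse.BodyHandlers.ofString());
            // Log the page name next to the raw report so errors can be traced back to a file.
            System.out.println(page + " -> HTTP " + response.statusCode());
            System.out.println(response.body());
            Thread.sleep(DELAY_MS); // throttle
        }
    }
}
```

Redirecting the program's output to a file gives you a per-page log. If you point `ENDPOINT` at a public service instead of a local install, keep the delay (or lengthen it) and check which request format that service expects.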
After extensive research and a bit of code hacking, I've managed to use JTidyFilter in my project, and it is working beautifully now. JTidyFilter is part of JTidyServlet, a sub-project of JTidy written about 5 years ago. The code was recently updated to work with the Java 5 compiler. I downloaded the source, upgraded some dependencies and, most importantly, changed some lines in the JTidyFilter class, which handles the filtering, and finally got it to work nicely in my project.
There are still some issues with the reformatted HTML: I see one or two errors when I use the Firefox HTML validation plugin, but otherwise most pages pass validation.