Hi, I've seen similar questions, but I don't think any of them cover my specific needs, hence this one.
I'd like to know whether there is a Java library for analyzing real-world (read: incomplete, ill-formed) HTML. By analysis, I mean things like:
- figuring out the most prominent color in an HTML chunk
- changing that color to some other color (so it has to support modifying the HTML as well)
- pruning out unwanted tags
- fixing up the HTML so that the result is a well-formed HTML snippet
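To make the first two items concrete, here is a rough, stdlib-only sketch of the kind of thing I mean. It does no real HTML parsing. It assumes "most prominent" simply means the hex color (`#rgb` or `#rrggbb`) that appears most often anywhere in the markup, and it ignores named colors, `rgb(...)` values and CSS inheritance. `HtmlColors`, `mostProminentColor` and `replaceColor` are hypothetical names and are not part of any library.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: naive, regex-based color analysis of a raw HTML string.
// Assumption: "most prominent" = the hex color that occurs most often in the text.
// Named colors, rgb(...) values and inherited CSS are NOT considered.
final class HtmlColors {
    // #rgb or #rrggbb, not followed by further word characters.
    private static final Pattern HEX =
            Pattern.compile("#([0-9a-fA-F]{6}|[0-9a-fA-F]{3})\\b");

    private HtmlColors() {}

    // Lower-cases the digits and expands the 3-digit shorthand, e.g. "F00" -> "#ff0000".
    static String normalize(String digits) {
        String h = digits.toLowerCase();
        if (h.length() == 3) {
            h = "" + h.charAt(0) + h.charAt(0)
                   + h.charAt(1) + h.charAt(1)
                   + h.charAt(2) + h.charAt(2);
        }
        return "#" + h;
    }

    // Returns the most frequent normalized hex color, or null if none is found.
    // Ties go to the color seen first (LinkedHashMap keeps insertion order).
    static String mostProminentColor(String html) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        Matcher m = HEX.matcher(html);
        while (m.find()) {
            counts.merge(normalize(m.group(1)), 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    // Replaces every occurrence of color `from` (either notation) with the literal `to`.
    static String replaceColor(String html, String from, String to) {
        String target = normalize(from.startsWith("#") ? from.substring(1) : from);
        Matcher m = HEX.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean hit = normalize(m.group(1)).equals(target);
            m.appendReplacement(out, Matcher.quoteReplacement(hit ? to : m.group()));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, `<p style="color:#FF0000">a</p><font color="#f00">b</font>` counts as two occurrences of `#ff0000`. Because it only looks at raw text, it will also match things like `#abc` URL fragments. That is exactly why I'm hoping for a library that actually understands the markup.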
Libraries such as Jericho and jTidy already handle parts of the last two. 'Plugins' on top of these would be great.
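For comparison, a crude stdlib-only approximation of the pruning step might look like the sketch below. It is hypothetical and does not reflect how Jericho or jTidy work. `<script>` and `<style>` blocks are dropped together with their content. Any other tag that is not in an allowlist is removed, but its text is kept. Attributes on allowed tags are left untouched. Because it is regex-based, it does not handle `>` inside attribute values or unclosed `<script>` blocks. It also does not balance tags, and that "fixing up" step is where a real parser is needed. `TagPruner` and `prune` are made-up names.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: regex-based tag pruning against an allowlist of tag names.
final class TagPruner {
    // <script>/<style> elements are removed together with their content
    // (case-insensitive, and '.' also matches newlines).
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // Any opening, closing or self-closing tag; group 1 is the tag name.
    private static final Pattern TAG =
            Pattern.compile("</?([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*>");

    private TagPruner() {}

    static String prune(String html, String... allowedTags) {
        Set<String> allowed = new HashSet<>();
        for (String t : allowedTags) {
            allowed.add(t.toLowerCase(Locale.ROOT));
        }
        String withoutScripts = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        Matcher m = TAG.matcher(withoutScripts);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1).toLowerCase(Locale.ROOT);
            // Keep allowed tags verbatim (attributes included); drop the rest but keep their text.
            String keep = allowed.contains(name) ? m.group() : "";
            m.appendReplacement(out, Matcher.quoteReplacement(keep));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With `prune(html, "p", "b")`, the input `<div onclick="x()"><p>Hello <b>world</b><script>alert(1)</script><blink>!</blink></p></div>` comes out as `<p>Hello <b>world</b>!</p>`.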
Thanks in advance!