There are many HTML/XML parsers for Java, but what I want goes a bit beyond just parsing. I want to filter the content and end up with it in a suitable form.
More precisely, I want to keep only the text and images, while preserving some of the text formatting, such as italic, bold, and alignment.
The reason is that I'm trying to implement a converter from HTML to a custom format that I created for my own purposes.
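To make the requirement concrete, here is a rough sketch of the kind of filtering I mean. It uses the HTML parser bundled with the JDK (`javax.swing.text.html.parser.ParserDelegator`) purely as a stand-in for whatever parser turns out to be the better choice. The class name `HtmlFilterSketch`, the `filter` method, and the small whitelist (`b`, `i`, `p` with its `align` attribute, `img` with its `src`) are assumptions made up for illustration, and the output is simplified HTML rather than my actual target format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: keeps text, images and a few formatting tags, drops other tags.
final class HtmlFilterSketch {

    static String filter(String html) {
        StringBuilder out = new StringBuilder();

        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Text the parser reports is copied through unchanged.
                out.append(data);
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.B || tag == HTML.Tag.I) {
                    out.append('<').append(tag).append('>');
                } else if (tag == HTML.Tag.P) {
                    // Preserve paragraph alignment when it is present.
                    Object align = attrs.getAttribute(HTML.Attribute.ALIGN);
                    out.append(align == null ? "<p>" : "<p align=\"" + align + "\">");
                }
                // Any other start tag is dropped; its text still arrives via handleText.
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.B || tag == HTML.Tag.I || tag == HTML.Tag.P) {
                    out.append("</").append(tag).append('>');
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <img> are reported here.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        out.append("<img src=\"").append(src).append("\">");
                    }
                }
            }
        };

        try {
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked just in case.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Calling `HtmlFilterSketch.filter(someHtml)` would give back a reduced markup string. In the real converter, the `append` calls would emit my own format instead of HTML tags.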
Any ideas? Surely, it must have been done many times before.
Thanks, guys!