I'm looking for the Clojure/Java equivalent to Python's lxml library.
I've used it a ton in the past for parsing all sorts of html (as a replacement for BeautifulSoup) and it's great to be able to use the same elementtree api for xml as well -- really a trusted friend! Can anyone recommend a similar Java/Clojure library?
About lxml
lxml is an xml and html processing library based off of libxml2. It handles broken html pages very well so it is excellent for screen scraping tasks. It also implements the ElementTree api, so the xml/html structure is represented as a tree object with full support for xpath and css selectors among other things.
It also has some really handy utility functions such as the "cleaner" module which will strip out unwanted tags from the "soup" (ie script tags, style tags, etc...).
So it is simple to use, robust, and VERY fast...!