I was thinking about creating a nice browsable HTML version of my man pages, and it turns out that doclifter does just what I want via its manlifter program, since it can lift troff into DocBook.
However, it got me thinking that it would be quite useful to have a similar library that could lift (X)HTML into DocBook. Operating on the DocBook output programmatically would allow some extremely powerful transformations that just aren't practical when working from the bottom up and parsing HTML piecemeal.
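To make the idea concrete, here is a rough sketch of the kind of lifting I have in mind, using only Python's standard library. The tag mapping, the `NaiveLifter` class, and the `lift` function are all hypothetical names made up for illustration, and the mapping covers only a handful of elements. It does no structural inference (nesting sections from heading levels, wrapping list items, and so on), which is exactly the hard part a real library would need to solve.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape

# Hypothetical, deliberately tiny mapping from HTML tags to DocBook elements.
# A real lifter would need far more than a one-to-one rename.
TAG_MAP = {
    "h1": "title",
    "p": "para",
    "em": "emphasis",
    "code": "literal",
    "pre": "programlisting",
}


class NaiveLifter(HTMLParser):
    """Renames known HTML tags to DocBook ones and drops all other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes are ignored in this sketch.
        if tag in TAG_MAP:
            self.out.append(f"<{TAG_MAP[tag]}>")

    def handle_endtag(self, tag):
        if tag in TAG_MAP:
            self.out.append(f"</{TAG_MAP[tag]}>")

    def handle_data(self, data):
        # Character references were already decoded, so re-escape for XML.
        self.out.append(escape(data))


def lift(html: str) -> str:
    """Return a DocBook-like <section> fragment for a small HTML snippet."""
    parser = NaiveLifter()
    parser.feed(html)
    parser.close()
    return "<section>" + "".join(parser.out) + "</section>"


if __name__ == "__main__":
    print(lift("<h1>Name</h1><p>Use <code>ls</code> &amp; friends</p>"))
```

Text inside unmapped tags is kept while the tags themselves are dropped, so the output is only as well-structured as the input happens to be.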
Does any library exist that would allow me to do that sort of transformation, or is this the missing link standing between us and the hype about the semantic web becoming reality?