I was thinking about creating a nice browsable HTML version of my man pages, and it turns out that doclifter does just what I want via the manlifter program, since it can lift troff into DocBook.

However, it got me thinking that a similar library for lifting (X)HTML into DocBook would be quite useful. Operating on the DocBook output programmatically would allow some extremely powerful transformations that just aren't practical when working from the bottom up and parsing HTML piecemeal.

Does any library exist that would let me do that sort of transformation, or is this the missing link standing between us and the hype about the semantic web becoming reality?

A: 

I don't know of any ready-made solution for this, but for "an elegant language that can parse XHTML as well as supporting the rules needed to do the lifting", I would suggest XSLT (version 1.0 or 2.0).
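To make the idea of rule-based lifting concrete, here is a minimal sketch in Python rather than XSLT, using only the standard library's `xml.etree.ElementTree`. It is not an existing library and not a complete mapping: the `RULES` table, the `lift` and `xhtml_to_docbook` names, and the choice of `phrase` as a fallback are all assumptions made for illustration. It also assumes well-formed XHTML in the XHTML namespace with no named entities such as `&nbsp;`, and it emits unnamespaced DocBook element names without checking that the result is valid DocBook.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical, deliberately tiny rule table: XHTML local name -> DocBook element.
RULES = {
    "p": "para",
    "em": "emphasis",
    "code": "literal",
    "pre": "programlisting",
    "ul": "itemizedlist",
    "ol": "orderedlist",
}


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on qualified names."""
    return tag.rsplit("}", 1)[-1]


def lift(node):
    """Recursively map one XHTML element to a DocBook element."""
    name = local_name(node.tag)
    if name == "li":
        # DocBook list items hold block content, so wrap the text in a para.
        out = ET.Element("listitem")
        target = ET.SubElement(out, "para")
    else:
        # Unknown elements fall back to 'phrase' (an assumption, not always valid).
        out = target = ET.Element(RULES.get(name, "phrase"))
    target.text = node.text
    for child in node:
        lifted = lift(child)
        lifted.tail = child.tail  # keep the text that follows the child
        target.append(lifted)
    return out


def xhtml_to_docbook(source):
    """Lift a well-formed XHTML document string into a DocBook-style article."""
    root = ET.fromstring(source)
    body = root.find(f"{{{XHTML_NS}}}body")
    if body is None:
        body = root
    title = root.findtext(
        f"{{{XHTML_NS}}}head/{{{XHTML_NS}}}title", default="Untitled"
    )
    article = ET.Element("article")
    ET.SubElement(article, "title").text = title
    for child in body:
        article.append(lift(child))
    return ET.tostring(article, encoding="unicode")
```

Calling `xhtml_to_docbook(page_source)` on a small XHTML page returns the lifted markup as a string. Each entry in `RULES` plays roughly the role that a template rule would play in an XSLT stylesheet, so the same mapping could be expressed there instead.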

Jukka Matilainen
I'm aware of XSLT, although processing XML with yet more XML was something I was hoping to avoid :)
cons
If you want to avoid the angle bracket tax, I guess DSSSL is still around.
Jukka Matilainen