A year or two back, I had to generate pdfs from a C++/C# program. In the end I settled on launching Apache's Java FOP as a separate process to do the conversion. The experience with xsl-fo was not a pleasant one. At the time, there didn't appear to be a single tool that had implemented xsl-fo completely. Tools tended to pick a subset of the specification and hack away at that. Given the sprawling complexity of xsl-fo, I'm starting to wonder if there will ever be a full implementation.
FOP tended to be buggy and considerable time was spent working around issues. XSLT and XPaths were difficult to learn. It took a few weeks before I was seeing past the verbosity and could quickly get things done. I don't think I ever quite got my head around xsl-fo though. It makes the html and css model look like a child's toy. Luckily, the pdfs generate, and don't have too many problems. :-)
Anyway, the task at hand: generating pdfs from xhtml output from FCKEditor.
I just can't seem to find a library that will convert from HTML into anything XSL friendly.
Heh. Yeah, that's 'cos there isn't one, and probably won't be an html to xsl-fo converter that's any good. Such a converter has a few things against it: complexity of browsers and complexity of xsl-fo. For such a converter to deal with an average html document, it needs the guts of a web browser: the layout, css support probably even JavaScript. Then it has to take the rendered page, and figure out what xsl-fo is needed to get something which looks similar, and fits within the paged constraints of xsl-fo.
It's like the problem with making a word viewer: without reimplementing a lot of word, it sucks most of the time because it doesn't look the same.
So... what can you do? Well, having a small subset of html to work with is a good start. Hopefully the output from FCKEditor is xhtml, as getting html into xml is a world of pain in itself (which tidy can be useful for). Next, unless some poor soul has already made an FCKEditor xhtml -> xsl-fo xslt for your xsl-fo implementation, you'll have to make one. That involves learning xsl-fo, xslt and xpath. In my experience it'll take a few weeks and will be a cobbled together solution.
To get started with xsl-fo I found the following links useful:
So what's all this xsl-fo, xslt stuff and all the other things? The XSL-FO: Ready for Prime Time? lays it out as:
The Extensible Stylesheet Language Family (XSL) XSL is a family of recommendations for defining XML document transformation and presentation. It consists of three parts:
- XSL Transformations (XSLT), a language for transforming XML
- The XML Path Language (XPath), an expression language used by XSLT to access or refer to parts of an XML document. (XPath is also used by the XML Linking specification)
- XSL Formatting Objects (XSL-FO), an XML vocabulary for specifying formatting semantics
My advice? Run. Find another away. Find another solution. Generate LaTeX files, and convert them into pdfs. Generate something else. Make word documents and print them using PDFCreator. Generate images. Control Firefox to print pages as pdfs. Find away to avoid needing pdfs at all. Anything, as long as it isn't fighting html, xsl-fo, FOP, xslt and xpath.
PS: Let me know if you need any help. :-)