I'm going to create a converter from HTML to some other format, and I'm thinking of using XSL-FO (an XML format) as the intermediate format.

My question: why is the FO format popular if so few applications render it?

A: 

Both RenderX and Antenna House make excellent XSL-FO-to-PDF renderers. There is also the free Apache FOP renderer, which is good enough for many projects. Fully supporting FO in all its gory details takes serious effort; perhaps the barrier to entry is too high given the size of the market, the established players, and the potential return.

Not that you asked, but before you do too much work on HTML to FO, there are a couple of free choices which might save you some effort.

lavinio
I don't think he was asking for tool recommendations
skaffman
@skaffman perhaps not, but seeing how others have done it might help him. Just being a good neighbor. :)
lavinio
Constantine
+1

A: 

I am well aware that there's a big debate going on between CSS and XSL-FO supporters, and both sides have valid and good points.

Here's the best brief argument for XSL-FO that I've seen so far:

XSL-FO provides a more sophisticated visual layout model than HTML+CSS. Formatting supported by XSL-FO, but not supported by HTML+CSS, includes right-to-left and top-to-bottom text, footnotes, margin notes, page numbers in cross-references, and more. In particular, while CSS (Cascading Style Sheets) is primarily intended for use on the Web, XSL-FO is designed for broader use. You should, for instance, be able to write an XSL style sheet that uses formatting objects to lay out an entire printed book. A different style sheet should be able to transform the same XML document into a Web site.

(Source: http://www.cafeconleche.org/books/bible2/chapters/ch18.html)

Here are some who argue that XSL-FO is superior:

And here are some who say CSS is better:

marc_s
A: 

XSL-FO is a common standard that tool implementers can follow to ensure compatibility, in the same way that HTML is the common standard for web pages, XSL is the standard for XML-to-* translation, etc.

If your HTML can be considered well-formed XML (i.e. empty elements are closed properly, as in <br /> and <img />, not <br> and <img>), then you should be able to use XSL to translate it directly to XSL-FO, which you can then pass to a tool like Apache FOP for conversion. If it's not well-formed, you can use a tool like Python's BeautifulSoup or PHP's DOMDocument::loadHTML() to load the HTML and output well-formed XHTML for your conversion.
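To make the XHTML-to-FO step concrete, here is a minimal sketch. It is not an XSL stylesheet: since Python's standard library has no XSLT processor, it walks the parsed tree with xml.etree.ElementTree instead. The function name xhtml_to_fo, the A4 page size, and the h1/p style mapping are assumptions made for illustration, and the input is assumed to carry no XML namespace.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def xhtml_to_fo(xhtml):
    """Convert a tiny subset of well-formed XHTML (h1, p) to XSL-FO."""
    # ET.fromstring raises ParseError if the markup is not well-formed XML,
    # for example when it contains <br> instead of <br />.
    html = ET.fromstring(xhtml)

    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-width": "210mm",
        "page-height": "297mm",
        "margin-top": "20mm",
        "margin-bottom": "20mm",
        "margin-left": "20mm",
        "margin-right": "20mm",
    })
    ET.SubElement(page, fo("region-body"))

    sequence = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})

    # Hypothetical mapping: only headings and paragraphs are handled here.
    styles = {
        "h1": {"font-size": "18pt", "font-weight": "bold", "space-after": "6pt"},
        "p": {"space-after": "4pt"},
    }
    for element in html.iter():
        if element.tag in styles:
            block = ET.SubElement(flow, fo("block"), styles[element.tag])
            # Flatten inline markup to plain text for this sketch.
            block.text = "".join(element.itertext()).strip()

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(xhtml_to_fo("<html><body><h1>Title</h1><p>Hello, FO.</p></body></html>"))
```

The resulting string could be saved to a file and handed to an FO renderer. A real converter would need to cover far more elements (lists, tables, images, inline styles), which is where a proper XSL stylesheet tends to pay off.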

There are also tools like FPDF (PHP) and Prawn (Ruby, which was used for the very pretty Dopplr reports), but IMHO they're much more "fiddly" to use: more like using absolute positioning in CSS than allowing things to flow by themselves. That can cause problems once you have to deal with page breaks and the like.

However, this all depends on what you're doing with the output.

digitala
A: 

Did you check the Ecrion XSL-FO engine? My boss chose it because it supports a lot of output formats (PDF, Word, PowerPoint, PostScript, HTML and a very cool Silverlight output mode) and it has an incredible designer (which, to my knowledge, is the only one that can work with other formatting engines, including FOP). The problem with using HTML to generate PDF is that:

  • HTML doesn't let you control pagination (for example, having different layouts on even and odd pages, as in a book or a catalog).
  • You can't control headers and footers.
  • There are no instructions for creating page number citations or footnotes, inserting the page count, or the countless other things a printable publication may need.
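As a rough illustration of the features in this list, the sketch below builds an XSL-FO document with separate odd and even page masters, a repeating footer, and a "Page N of M" line. It uses Python's xml.etree.ElementTree only to assemble the markup; the function name build_paginated_fo, the master names, the margins, and the "last-block" id are all assumptions chosen for the example, not anything a particular engine requires.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Return the namespace-qualified name of an XSL-FO element."""
    return f"{{{FO_NS}}}{tag}"


def build_paginated_fo(paragraphs):
    """Build an FO document with odd/even layouts and a page-count footer."""
    root = ET.Element(fo("root"))
    masters = ET.SubElement(root, fo("layout-master-set"))

    # One page master per side, with a wider inner margin as in a bound book.
    for name, left, right in (("odd", "30mm", "20mm"), ("even", "20mm", "30mm")):
        page = ET.SubElement(masters, fo("simple-page-master"), {
            "master-name": name,
            "page-width": "210mm",
            "page-height": "297mm",
            "margin-top": "20mm",
            "margin-bottom": "15mm",
            "margin-left": left,
            "margin-right": right,
        })
        ET.SubElement(page, fo("region-body"), {"margin-bottom": "15mm"})
        ET.SubElement(page, fo("region-after"), {"extent": "10mm"})

    # Pick the master according to whether the page number is odd or even.
    seq_master = ET.SubElement(masters, fo("page-sequence-master"),
                               {"master-name": "book"})
    alternatives = ET.SubElement(seq_master,
                                 fo("repeatable-page-master-alternatives"))
    for name in ("odd", "even"):
        ET.SubElement(alternatives, fo("conditional-page-master-reference"),
                      {"master-reference": name, "odd-or-even": name})

    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": "book"})

    # Footer repeated on every page: "Page N of M".
    footer = ET.SubElement(sequence, fo("static-content"),
                           {"flow-name": "xsl-region-after"})
    line = ET.SubElement(footer, fo("block"), {"text-align": "center"})
    line.text = "Page "
    number = ET.SubElement(line, fo("page-number"))
    number.tail = " of "
    ET.SubElement(line, fo("page-number-citation"), {"ref-id": "last-block"})

    flow = ET.SubElement(sequence, fo("flow"), {"flow-name": "xsl-region-body"})
    for text in paragraphs:
        ET.SubElement(flow, fo("block"), {"space-after": "4pt"}).text = text
    # Empty block at the end; the page it lands on is cited as the page count.
    ET.SubElement(flow, fo("block"), {"id": "last-block"})

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_paginated_fo(["First paragraph.", "Second paragraph."]))
```

Footnotes and margin notes are left out to keep the sketch short.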

Advanced engines like Ecrion, Antenna House and RenderX are also able to generate high-quality PDF output (such as PDF/A for archiving or PDF/X for printing).

XMLDUDE