I've been a big fan of MediaWiki and similar wiki-based text editors. I like the ability to quickly add text, collaborate, and share. However, there's still always a need for nicely formatted print output: headers and footers (that say what I want them to say), page breaks, margins, etc.

Most solutions I've seen involve some sort of conversion to an intermediate print-media format: maybe MediaWiki to Microsoft Word, or maybe some custom scripting that generates a PDF from the contents of a web page (with a lot of hard-coded references).

Is there any more generic solution that exists for this problem? Any framework that seeks to merge HTML and web content in general into a print media output format?

Any solutions, discussion of the pros and cons, or whatever else is welcome.

Thanks!

Update: I think CSS will only get me so far though... I've used CSS for similar type output (MediaWiki by default has a print format that hides much of the nav bar stuff). Think of a MediaWiki article though -- imagine me being able to tweak a tag in the content or something similar and now my margin is 1 inch instead of .5 inches. That's more along the lines of what I'm aiming for.

+2  A: 

Using print CSS files is a really slick approach to reformatting pages for printing.

A lot of people fall back to PDF because it can be more powerful and easier.

For most things, though, I think CSS markup is simpler and easier.

Look at the source for pages in StackOverflow and you'll see references to media="print" (print.css)--a set of styles applied only when a browser prints the page.

<link href="/Content/print.css" rel="stylesheet" media="print" type="text/css" />

You can use these to hide navbars and ads (or show different ads), do some basic pagination, etc.
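
For illustration, here is a minimal sketch of what such a print.css might contain. The selectors (#navbar, .sidebar, .ad) are made up for the example and not taken from any real site's markup:

```css
/* print.css: applied only when printing, via media="print" on the <link>. */

/* Hide screen-only chrome. These selectors are hypothetical. */
#navbar,
.sidebar,
.ad {
  display: none;
}

/* Basic pagination: start each top-level section on a new page
   and avoid a page break directly after a subheading. */
h1 {
  page-break-before: always;
}
h2, h3 {
  page-break-after: avoid;
}

/* Print-friendly typography. */
body {
  font: 11pt/1.4 serif;
  color: #000;
  background: #fff;
}
```

How faithfully the page-break rules are honored varies by browser, so treat this as a starting point rather than a guarantee.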

If you need more control over things like margins, you have to go outside the browser (PDF, Word, XPS, etc.).

Michael Haren
+1  A: 

I've written a MediaWiki to LaTeX converter that tries to maintain the document structure of the source text. The document is then typeset with pdflatex to produce a very high quality, paginated document. Math markup is directly rendered by LaTeX, so the equations look great. The LaTeX documentclass / stylesheet is configurable from specialized commands in the wiki to directly control margins, page layout, fonts, extra packages and so on. This would fall in your second category of a custom script rather than a generic framework.

There are many others, such as the Extension:Pdf_Export that uses htmldoc. While it is more general, it does a very poor job of pagination and creates lots of widows and orphans, doesn't do optimal text justification and doesn't do indexes, figures, self-references, etc. Additionally, if you use <math> markup in MediaWiki it only includes the low-res PNG files.

PrinceXML is not specific to MediaWiki (it converts general HTML and XML) and produces good-looking documents, but it isn't available under a Free license. Since it is a closed-source product, your ability to control the output is limited.

Hudson
+4  A: 

http://www.princexml.com/

could be something for you. It converts XML and HTML pages to PDF documents.
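
For the margins, headers, and footers the question asks about, CSS paged-media rules are the kind of input a formatter like Prince works from. A hedged sketch, assuming support for @page margin boxes; the header text, page size, and margin are placeholders:

```css
/* Page geometry and running headers/footers (CSS paged media). */
@page {
  size: letter;
  margin: 1in; /* change to 0.5in for narrower margins */

  /* Running header: placeholder text. */
  @top-center {
    content: "My Wiki Article";
  }

  /* Running footer with page numbers. */
  @bottom-right {
    content: "Page " counter(page) " of " counter(pages);
  }
}

/* Start each top-level heading on a new page. */
h1 {
  page-break-before: always;
}
```

With this stylesheet linked from the page, a conversion would look something like `prince article.html -o article.pdf` (file names are hypothetical); check the Prince documentation for the exact options.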

Stevens
+1 Prince is absolutely amazing and worth the price! The quality and ease of XHTML-to-PDF conversion are stunning!
tharkun
+3  A: 

You may have heard of PediaPress, a company that has done a "wiki to print" (i.e., PDF, but also ODF) deal with the Wikimedia Foundation. (See "Wikis Go Printable".) Their code is designed to work with MediaWiki and is open source.

But! It's even better than that. Check out this bookmarklet. You can use it to create PDFs or ODFs of any publicly-accessible MediaWiki page (maybe it needs the API to be enabled too...). And you can bundle multiple pages, from a single MediaWiki or multiple MediaWikis, into a single document. It's pretty freaking awesome in my book. :)

ETA: PediaPress have put significant work into making something that looks really nice to read. It's not just the equivalent of MediaWiki's printable version converted to PDF.

pfctdayelise
Thanks -- I hadn't heard of them and that's really neat.
Andrew Flanagan