I have an idea for a combination book and web app to help newbies learn computer programming from the ground up, and while I'm working on the web app side of it, I've been thinking in the back of my mind about the best way to do the book. Can anyone share their experiences with similar projects? Here are my requirements:

  • Source must be plain text files in some markup language. (I want to be able to use version control, arbitrarily choose my editor, etc.)
  • Must compile or build to produce pretty PDF output (on par with LaTeX) and HTML
  • Include custom HTML/JavaScript macros in the HTML edition. For example, I want an "open this in my workspace" link on each sample code block in the HTML edition of the book but not the PDF edition. I might also want to include links to create/edit a forum thread at the bottom of each section in the HTML, including text like "There are (x) comments about this section in the Forum." (Clearly this will involve some web server-side code.)
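
To make the "open this in my workspace" requirement concrete, here is a minimal sketch of one way it could work: a post-processing pass over the generated HTML that appends a link after every code block, which the PDF build would simply skip. The `/workspace/open` URL, the CSS class name, and the function name are all hypothetical, and the regular expression assumes simple, non-nested `<pre>` blocks.

```python
import html
import re
from urllib.parse import quote

# Hypothetical endpoint of the companion web app; not a real service.
WORKSPACE_URL = "/workspace/open?code="

# Matches simple <pre>...</pre> blocks (no nesting); DOTALL lets code span lines.
PRE_BLOCK = re.compile(r"<pre\b[^>]*>(.*?)</pre>", re.DOTALL)


def add_workspace_links(page: str) -> str:
    """Append an 'open in workspace' link after each <pre> block in an HTML page."""

    def with_link(match: re.Match) -> str:
        # Undo HTML escaping so the link carries the raw sample code.
        code = html.unescape(match.group(1))
        href = WORKSPACE_URL + quote(code, safe="")
        link = f'<a class="open-in-workspace" href="{href}">Open this in my workspace</a>'
        # Keep the original block untouched and put the link on the next line.
        return match.group(0) + "\n" + link

    return PRE_BLOCK.sub(with_link, page)
```

Running this only in the HTML build keeps the PDF edition free of web-only links, whichever markup language produces the HTML.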

So far it looks like my best bet is LaTeX source, pdflatex output for PDF, and some custom hacking of latex2html to produce the HTML output I want. My past experience with LaTeX tells me it's very powerful but can sometimes be clunky and awkward.

lout looks very appealing but doesn't seem to have very good built-in or third-party support for HTML.

And the last option I was thinking of would be to write the book in a non-presentation format such as Zim Desktop Wiki (my favorite outliner/note taker app) and write scripts to translate this source to HTML and to lout.

Do any of these ideas sound particularly good or particularly idiotic? Who has built a technical book and targeted more than one output format?

+7  A: 

Instead of latex2html I would suggest HEVEA which I found is a lot easier to customize. In particular, it's rather easy to override the behavior of certain environments and commands for the HTML output.

I believe this is the easiest solution. When I looked into similar things a while ago, I found various Markdown-like things most appealing for the authoring itself, but you'd have a lot of work to do if you wanted to build processing for all the other stuff on top of a simple markup language.

Jan Krüger
+15  A: 

Another option to consider would be DocBook, which is an XML format designed specifically for authoring books. There are plenty of tools for rendering the book as HTML, PDF or other formats (RTF, etc.).

Dan Dyer
Hah, you beat me to my answer, nice. :-)
Chris Jester-Young
+4  A: 

A widely used "non-presentation format" (as you called it) is DocBook; it's designed for writing books of all sorts. There are already lots of tools (many of them provided by various GNU/Linux distributions out of the box) for converting DocBook documents to PDF and HTML. Hope it works for you!

Chris Jester-Young
+4  A: 

You might want to take a look at what the Subversion folk did with "Version Control with Subversion" (http://svnbook.red-bean.com/). They build HTML/PDF versions from the same source using DocBook (XML-based).

adrianh
+6  A: 

LaTeX is the gold standard. There are a couple of newer alternatives, however; I'd take a look at reStructuredText. I was thinking about Idiopidae also, but now that I look at it, I see that it's more about separating code from LaTeX markup than a new markup language (still, it may come in useful).

Depending on how much formatting and so on you need, you might even get away with just using Markdown or Textile - the good thing with those is that they're simple enough that you can extend them to support your custom use cases.
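
As a rough illustration of that kind of extension, here is a minimal sketch of a preprocessing pass run over the plain-text source before the Markdown or Textile converter sees it. The `{{forum: section-id}}` placeholder syntax, the `/forum/...` URL layout, and the function name are assumptions made up for this example; neither markup language defines them.

```python
import re

# Hypothetical placeholder syntax; not part of Markdown or Textile.
PLACEHOLDER = re.compile(r"\{\{forum:\s*([\w-]+)\s*\}\}")


def expand_placeholders(source: str, target: str) -> str:
    """Expand {{forum: section-id}} for the HTML edition and drop it elsewhere."""

    def replace(match: re.Match) -> str:
        if target != "html":
            # Print editions get nothing where the placeholder stood.
            return ""
        section = match.group(1)
        # Hypothetical URL layout for the forum.
        return f'<a href="/forum/{section}">Discuss this section in the forum</a>'

    return PLACEHOLDER.sub(replace, source)
```

Because Markdown passes inline HTML through untouched, the expanded link survives conversion in the HTML build, while a build with any other `target` value never sees it.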

Aeon
+6  A: 

I studied this topic so that we could write our own software documentation in something other than a word processor, for the same version-control reasons as you.

During my research, I discovered Prince XML, a CSS renderer that understands CSS 2 print properties and lets you format an HTML file as a book. This tool was actually created by the developer implementing CSS in the Opera browser.

There are some very nice examples on the website where you can compare the HTML and the PDF output from Prince.

The cool thing is that you only have to maintain your content in an HTML file and then use CSS to format the document for browsers and PDF rendering. You can even share the stylesheets by using the @media declaration in your CSS file. There is more information about how to create a PDF on the website.

The tool is not free for professional use but there is a free personal license.

Vincent Robert
+2  A: 

Prince XML is the best tool I've found for repurposing HTML to PDF. It supports features such as page numbering, headers/footers, and a TOC, and has excellent CSS support. The output from HTML is outstanding and saves you the hassle of mucking about with LaTeX.

+4  A: 

I second the notion not to use latex2html, which is very long in the tooth these days. While I've heard some good things about the aforementioned HEVEA, I'd strongly recommend looking at TeX4ht as well, which is:

A converter from TeX and LaTeX to SGML-based formats such as (X)HTML, MathML, OpenDocument, and DocBook, providing a configurable (La)TeX-based authoring system for hypertext.

Unless you're somehow going to be able to pipe your text into InDesign, you simply won't be able to get the PDF output quality that you want without having a LaTeX step in there somewhere.

Will Robertson
Ah... TeX4ht was the other LaTeX -> HTML compiler/convertor I was trying to recall when I posted this question. Thanks!
Brendan Kidwell
+4  A: 

I second the recommendation for DocBook. I would start with the following three resources:

  • DocBook How-to (docbook.org/docs/howto): Describes the latest version of DocBook
  • DocBook: The Definitive Guide (docbook.org/tdg5/en/html/docbook.html): Description of the standard. The print version is not up to date with the current version of the standard, but there is a draft version online that is close, though still being reviewed.
  • DocBook XSL: The Complete Guide (sagehill.net): This book describes the DocBook XSL stylesheets.

All three references are freely available online. DocBook XSL is also available in print and well worth it if you do much with the stylesheets.

Dick
+11  A: 

I'm currently playing around with MultiMarkdown, and I think it does what you need.

You've probably heard of Markdown? In fact, you've used it already for writing StackOverflow comments :)

MultiMarkdown is an extension of Markdown. It's designed for book and article production, so it is suited to both web and PDF/print. It can handle cross-references, tables of contents, indexes, etc. The generated PDFs (via LaTeX) include a table of contents, page numbers, cross-references, auto-references, other front matter, parts, chapters, sections, appendixes and an index.

Your writing workflow is something like this:

  • Write your content in plain text files, using MultiMarkdown syntax and your favourite text editor. Use SVN or Git for change tracking.
  • Run the multimarkdown2XHTML.pl Perl script to convert the files to XHTML. You can also convert to RTF, plain text or LaTeX (and then PDF).
  • Note: if you're on OS X, you can also use the Scrivener app as an authoring environment. It natively supports exporting to MultiMarkdown.
  • Use a LaTeX PDF converter to produce the final PDF from the LaTeX.

This might sound involved, but I can tweak a text file, and have a gorgeous generated PDF and HTML pages in about 10 seconds :)

You can also embed HTML in your MultiMarkdown files. Similarly, it's open to customisation.

One way is to modify the Perl scripts to support any additional features you need. Easier still, you can modify the XSLT transforms that are responsible for generating the XHTML, LaTeX or RTF files.

So far it's proving to be very flexible. You can download MultiMarkdown from the MultiMarkdown site. You'll need Perl too. And if you want PDF, you'll need to install LaTeX (I downloaded MacTeX on OS X). With all these things installed, you're good to go.

Tobin Harris
It's flexible, and is good to know about (so +1), but it's a rambling mess.
Charles Stewart
+4  A: 

Pandoc may do what you need as well.

Jared Updike
+1  A: 

A great, simple-to-use alternative that I have used in this arena is Innovasys HelpStudio. With it you author documents in a nice WYSIWYG environment or import Word documents, and can then output them as PDF, HTML and various Windows help formats. You can index and reference documents and use various templates for the output. It has served me well.

Steve Massing
+2  A: 

There's a whole blog devoted to XML, PDF, HTML, etc. in publishing. The Fastware Project is Scott Meyers' musing on how to handle all this.

John D. Cook
+2  A: 

You might also want to look into LyX. It's cross-platform and it provides many of the capabilities of LaTeX, but with a useful GUI. Exporting to PDF and HTML is as simple as selecting the options from the menus.

Sean
+5  A: 

This is a copy of this answer.

After many years of anguish and several false starts, I'm about to revisit this, and I'm going to give Sphinx a try. It can generate HTML or LaTeX from reStructuredText.

I'm hoping it will be a much "lighter" option than full DocBook, but with many of the advantages.
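
For a sense of how light the setup can be: a Sphinx project is driven by a `conf.py` file, which is ordinary Python. What follows is a minimal sketch, assuming the root document is `index.rst`; the title, author and output file name are placeholders invented for this example.

```python
# conf.py: minimal Sphinx configuration sketch (all names are placeholders).

project = "My Programming Book"  # hypothetical title
author = "A. N. Author"          # hypothetical author

# Root document, without the .rst suffix (newer Sphinx also accepts root_doc).
master_doc = "index"

# No extensions are needed for plain HTML and LaTeX output.
extensions = []

# One entry per LaTeX document to generate:
# (start document, output .tex file, title, author, document class)
latex_documents = [
    (master_doc, "book.tex", project, author, "manual"),
]
```

With this in place, `sphinx-build -b html . _build/html` builds the HTML edition and `sphinx-build -b latex . _build/latex` writes LaTeX sources that pdflatex can turn into a PDF.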

Brent.Longborough
I'm using Sphinx, not for a book yet but for other papers, and I love it and everyone who sees the PDF output says "Wow! How did you get Word to do that?" ;-)
jdkoftinoff
+1  A: 

I would consider DocBook, as recommended above, and the Oxygen editor for WYSIWYG editing to do tweaking (noting your comment on wanting your choice of editor for the main authoring).

I'm a big fan of Oxygen as a general XML editor.

Andy Dent
+2  A: 

Although slightly dated, the post on this topic by Martin Fowler is a good companion reference to go with the recommended product links above.

Chris Melinn
A: 

Two suggestions:

  1. AMRITA is a LaTeX-based document preparation system designed for literate programming and for producing JavaScript-capable HTML output. Its goals sound like a good fit for you, but I have no experience with it.
  2. Texinfo has become, since the merge of texi2html into makeinfo, not bad at generating HTML: you can use CSS to govern layout, and Texinfo has an @html directive that allows HTML to be embedded directly into the HTML output. You can coax the TeX output into making fairly attractive PDFs.
Charles Stewart