I need to convert a CHM file to another format, most likely PDF or HTML. I have tried chm2pdf and other converters, but they all do a horrific job of the conversion. Even a program like htmldoc doesn't do a very good job of converting to HTML.

Is there a way to just print each page of the CHM file, or take an image of it, and then save that to a PDF or some other format?

The main reason for this is that CHM books are filled with code, and the conversion tools don't format it correctly.

+1  A: 

Install something like Primo PDF, which sets up a virtual printer that writes a PDF file instead of printing. Then print the book chapter by chapter (as described here: http://www.helixoft.com/vsdocman-faqs/printing-chm-documentation.html).

naivists
+1  A: 

If you just need to read it on Linux, xchm (http://xchm.sourceforge.net/) provides decent native support for viewing .chm files.

mothis
There are many viewers out there: Firefox has an extension, and there are GNOCHM and KCHMViewer. Tools like CHMLIB have read support, and Free Pascal's libraries provide CHM write support on all the OSes and architectures it supports.
Marco van de Voort
+1  A: 

I'm afraid that will be a multi-step procedure...

  1. Extract the pages from the CHM file, e.g. using arCHMage.
  2. Use wkhtmltopdf on each page.
  3. Use something like pdfjoin (from pdfjam) to tape the documents together.

This complements the answer recommending a virtual PDF printer in that it's the more Linux-y command-line solution (all of the tools mentioned are available in Debian's and Ubuntu's package repositories). Pick your poison. ;)
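
The three steps above could be scripted roughly like this. It is only a sketch: the function names are made up for illustration, and the exact command-line forms (`archmage book.chm outdir`, `wkhtmltopdf in.html out.pdf`, `pdfjoin ... --outfile out.pdf`) are assumptions that should be checked against the versions you have installed.

```python
import subprocess
from pathlib import Path


def page_commands(html_files, pdf_dir):
    """Build one wkhtmltopdf command per extracted HTML page, in the given order."""
    cmds = []
    for i, html in enumerate(html_files):
        pdf = Path(pdf_dir) / f"{i:04d}.pdf"  # zero-padded so the PDFs sort in page order
        cmds.append(["wkhtmltopdf", str(html), str(pdf)])
    return cmds


def join_command(pdfs, out_pdf):
    """Build the pdfjoin command that concatenates the per-page PDFs."""
    return ["pdfjoin", *[str(p) for p in pdfs], "--outfile", str(out_pdf)]


def convert(chm, workdir, out_pdf):
    """Run all three steps. Needs archmage, wkhtmltopdf and pdfjoin on PATH."""
    workdir = Path(workdir)
    html_dir = workdir / "html"  # left for archmage to create
    pdf_dir = workdir / "pdf"
    pdf_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: extract the pages from the CHM file.
    subprocess.run(["archmage", str(chm), str(html_dir)], check=True)

    # Alphabetical order is a simplification; the real reading order is in the TOC.
    pages = sorted(p for p in html_dir.rglob("*")
                   if p.suffix.lower() in {".htm", ".html"})

    # Step 2: render each page to its own PDF.
    cmds = page_commands(pages, pdf_dir)
    for cmd in cmds:
        subprocess.run(cmd, check=True)

    # Step 3: tape the PDFs together.
    subprocess.run(join_command([c[-1] for c in cmds], out_pdf), check=True)
```

Calling `convert("book.chm", "work", "book.pdf")` would run the whole pipeline. The command builders are kept separate from the `subprocess` calls so the commands can be printed and inspected before anything is executed.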

Jan Krüger
+1  A: 

The problem is that the Windows CHM viewer is basically MSIE (Internet Explorer). The exact rendering is probably version-dependent (and for the average file you'd probably want MSIE 6).

In other words, to get a faithful reproduction, use some extractor (I use the one from CHMLIB or, lately, Free Pascal/Lazarus) to decompress the CHM (which is just an HTML archive with additional indexes), then fire up MSIE for each page and instrument it to print to e.g. a virtual PDF writer.

This way you have some chance of really capturing it the way IE renders it (and you have to hope it doesn't render differently to a printer than to the screen).

The TOC is in the .hhc files as XML-like markup (strictly speaking HTML: nested lists of sitemap objects), and you could transform one of them to the PDF bookmarks treeview.
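
As a rough illustration of reading the TOC, here is a sketch that pulls (depth, title, target) entries out of an .hhc file. It assumes the usual HTML Help layout, where each entry is an `<OBJECT type="text/sitemap">` holding `Name` and `Local` params inside nested `<UL>` lists. It uses Python's lenient HTML parser because these files are often not well-formed XML. The names `HhcParser` and `parse_hhc` are made up for this example.

```python
from html.parser import HTMLParser


class HhcParser(HTMLParser):
    """Collect (depth, name, local) tuples from an .hhc table of contents."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current <UL> nesting level
        self.entries = []     # collected TOC entries
        self._current = None  # params of the sitemap object being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            self.depth += 1
        elif tag == "object" and (attrs.get("type") or "").lower() == "text/sitemap":
            self._current = {}
        elif tag == "param" and self._current is not None:
            key = (attrs.get("name") or "").lower()
            self._current[key] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "ul":
            self.depth = max(0, self.depth - 1)
        elif tag == "object" and self._current is not None:
            if "name" in self._current:
                self.entries.append((self.depth,
                                     self._current["name"],
                                     self._current.get("local", "")))
            self._current = None


def parse_hhc(text):
    """Return the TOC entries found in the text of an .hhc file."""
    parser = HhcParser()
    parser.feed(text)
    parser.close()
    return parser.entries
```

The depth value is what a PDF bookmark tool would need to rebuild the tree; how the bookmarks actually get written into the PDF depends on the tool you pick and is not shown here.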

If you can somehow get the number of "pages" back from IE, you could probably even transform the index files into something you could add to the PDF, since you could then work out which page every topic starts on. But that is for the advanced class :-)
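
The page arithmetic itself is simple once the per-topic page counts are known. Below is a sketch, assuming you already have the printed page count for each topic in print order and a list of (keyword, topic file) pairs taken from the index (the .hhk file in HTML Help projects). Both function names and the sample data are hypothetical.

```python
from itertools import accumulate


def start_pages(page_counts, first_page=1):
    """Map each topic to the page it starts on.

    page_counts: list of (topic, number_of_pages) in print order.
    """
    counts = [n for _, n in page_counts]
    # Running total of pages printed before each topic: 0, n0, n0+n1, ...
    offsets = [0, *accumulate(counts)][:-1]
    return {topic: first_page + off
            for (topic, _), off in zip(page_counts, offsets)}


def index_with_pages(index_entries, pages):
    """Turn (keyword, topic) index pairs into sorted (keyword, page) pairs."""
    return sorted((kw, pages[topic])
                  for kw, topic in index_entries if topic in pages)


# Made-up example data.
pages = start_pages([("intro.htm", 2), ("setup.htm", 3), ("api.htm", 1)])
index = index_with_pages([("install", "setup.htm"), ("API", "api.htm")], pages)
```

Getting the page counts out of IE is the hard part and is not covered here; this only shows what to do with them afterwards.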

Marco van de Voort