I need a tool to automatically convert simple HTML into an image. I will control the HTML input, which will consist of simple text-formatting tags and possibly image links; I don't need to be able to render arbitrary HTML. Is there a simple way to do this?

I've looked at HTML layout engines like Gecko and WebKit, but frankly I'm overwhelmed by the number of options they have; I don't need a complete web browser! Is it possible to use these engines this way? Can someone steer me in the right direction?

Other possibilities, like browsershots, rely on screenshots of real browsers, but I'm going to run this application on a web server with potentially many users, so performance is important, and I'm afraid that kind of solution won't scale.

Ideas?

EDIT: Sorry, I forgot to mention that my server runs Linux, so Windows solutions won't help. :)

A: 

Perhaps you can convert the HTML to another format that is more readily convertible to an image? Searching Google, I found tools called html2ps and html2pdf. From PS it's just one step to EPS, which can already be rendered as an image. Or something like that.

Vilx-
This is a good suggestion. Ghostscript, for example, can convert a PostScript source file to various image formats, and that PostScript file can in turn be created by html2ps.
Jay
+1  A: 

Are you on Windows? If so, HTMLayout may help: it's a rendering engine with a simple API, and using it from C/C++ is a breeze, so getting HTML into a BMP wouldn't be hard.

http://www.terrainformatica.com/htmlayout/

It's free too.

Rob
+4  A: 

You may find this useful, if you are running on Linux and have the KDE libs available: khtml2png

khtml2png is a command-line program for creating screenshots of web pages. It uses libkhtml (the library used by Konqueror, the KDE web browser). khtml2png versions 2.0.5 through 2.5.0 use "convert" from the ImageMagick graphics conversion toolkit to write the output files in various image formats; 2.6.0 and future development will use the Qt library's built-in conversion.

Also, to follow up on what Vilx suggested, you could use html2ps to convert HTML to a ps file, then gs (Ghostscript) to turn the ps file into a png or jpg. See http://www.karakas-online.de/myLinuxTips/ps2png.html for one approach.
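A minimal sketch of that html2ps + Ghostscript pipeline in Python, assuming both `html2ps` and `gs` are installed and on the PATH. It relies on html2ps writing PostScript to stdout when no output file is given; the helper names (`html2ps_command`, `gs_png_command`, `html_to_png`) and the intermediate `.ps` file next to the output are illustrative choices, not part of either tool.

```python
import subprocess
from pathlib import Path


def html2ps_command(html_path: str) -> list[str]:
    # html2ps writes PostScript to stdout when no -o option is given.
    return ["html2ps", html_path]


def gs_png_command(ps_path: str, png_path: str, dpi: int = 96) -> list[str]:
    # -dSAFER restricts file access; -dBATCH/-dNOPAUSE make gs exit without prompting.
    # png16m is Ghostscript's 24-bit colour PNG device; -r sets the resolution in DPI.
    # -dFirstPage/-dLastPage render only page 1, so exactly one PNG is written.
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=png16m", f"-r{dpi}",
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={png_path}", ps_path,
    ]


def html_to_png(html_path: str, png_path: str, dpi: int = 96) -> None:
    # Hypothetical helper: HTML -> PS (via html2ps) -> PNG (via Ghostscript).
    ps_path = str(Path(png_path).with_suffix(".ps"))
    with open(ps_path, "wb") as ps_file:
        subprocess.run(html2ps_command(html_path), stdout=ps_file, check=True)
    subprocess.run(gs_png_command(ps_path, png_path, dpi), check=True)
```

For example, `html_to_png("snippet.html", "snippet.png", dpi=150)` would leave `snippet.ps` and `snippet.png` side by side. Running two separate processes per request keeps each step easy to test, and neither step needs an X server.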

Jay
A: 

You can use the PDFCreator application. It can print to many formats, including all kinds of images, and it includes an ActiveX/COM server that lets you automate the process fairly easily. You can convert pretty much anything you can print. One drawback of this method is that, because it uses the printing framework for the conversion, you can convert only one document at a time, so I don't know if it will be good enough for a website.

Alex Shnayder
Note that the submitter is running Linux, so a Windows (ActiveX/COM) solution is not useful in this case.
Jay
+4  A: 

Answering my own question: I found this useful tool, which uses WebKit to render a page and then captures the output as an image, or even as a PDF!

http://cutycapt.sourceforge.net/

The idea is similar to khtml2png, which Jay mentioned, but I liked this implementation better. Also, for future reference, running an X virtual framebuffer with Xvfb is not nearly as memory-intensive as I had feared.
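A minimal sketch of driving CutyCapt under Xvfb from Python, assuming `xvfb-run` and CutyCapt are installed. CutyCapt takes a URL, so a local HTML file is passed as a `file://` URI. The executable name is a parameter because some distributions package it as lowercase `cutycapt` (an assumption to check on your system); the helper names and the 1024x768x24 screen size are illustrative.

```python
import subprocess
from pathlib import Path


def cutycapt_command(html_path: str, out_path: str,
                     binary: str = "CutyCapt",
                     screen: str = "1024x768x24") -> list[str]:
    # CutyCapt expects a URL, so turn the local file into an absolute file:// URI.
    url = Path(html_path).resolve().as_uri()
    return [
        # xvfb-run starts a temporary Xvfb server just for this command;
        # --auto-servernum picks a free display number.
        "xvfb-run", "--auto-servernum",
        f"--server-args=-screen 0 {screen}",
        binary, f"--url={url}", f"--out={out_path}",
    ]


def render(html_path: str, out_path: str, binary: str = "CutyCapt") -> None:
    # Hypothetical helper: CutyCapt picks the output format from the --out extension.
    subprocess.run(cutycapt_command(html_path, out_path, binary), check=True)
```

Each call here starts and stops its own Xvfb server; on a busy server, keeping one long-running Xvfb and pointing CutyCapt at it via `DISPLAY` would avoid that per-request startup cost.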