You need to consider what you want to put in the PDF: Do you want an exact replica of the web page the user is viewing (too hard to be close to impossible without having the user's browser installed on your side and using its print output) or do you want the same information in a roughly similar layout?
An important issue is how you are generating the HTML: I did something similar once to generate PDF receipts for experiment participants (now, I just output HTML with print styles).
The HTML is generated using HTML::Template although Template.pm would be just as fine.
It is then trivial to write another template, one that generates a LATEX document which can be processed using pdflatex. If you save the data the time the snapshot is requested, you can add the snapshot to a queue that generates documents asynchronously so that requests do not tie up the web server.
Update: Looking at rrdcgi, I now realize that it already does use a template. That is perfect: Instead of putting HTML in the template, put LATEX code in the template and run rrdcgi
with the --filter
option to create a LATEX source file which you can run through pdflatex
. I guess the problem to solve there is to be able to use the exact same data that was used to generate the page the user is looking at.
If it is not possible to re-run rrdcgi
with the exact same data, consider adding some JavaScript that submits the HTML source of the page the user is reviewing (or some JSON representation thereof) to a CGI script that parses the HTML and outputs LATEX. Writing clean HTML in the original template and judicious use of class
and id
attributes would help there.
I do not have time to test any of these ideas right now, but I will take a look again within the next couple of days.