I need to make snapshots of web pages programmatically using PHP and embed them in an HTML e-mail.

I tried wget --page-requisites. It downloads everything all right, but it doesn't rewrite the HTML page's source code to point to the downloaded files rather than the online originals. Also, that HTML is of course a long way from displaying properly in an HTML e-mail.

I would like to know whether there are ready-made solutions for this. I would already be happy with a solution that takes an HTML snapshot and rewrites the HTML accordingly. Being able to e-mail it would be the icing on the cake.

I control the web pages being snapshotted, so I can adjust their content to optimize the results.

My server-side platform is PHP, but with very liberal settings: I can execute things like wget and Perl scripts from within PHP. However, I do not have root access and cannot install additional packages or programs.

The task is to make a snapshot of a product page each time somebody places an order, so there is documentation about what the page looked like at the time.

+1  A: 

What you are trying to do with wget is website mirroring. A simpler solution is httrack, a command-line tool that is very powerful and configurable. Try it! The httrack website presents a GUI, but you don't need it; everything can be done from the command line (or from PHP).

shad
Unfortunately, I cannot install additional programs on the server (it's a Linux web server). But if anybody could point me towards a standard Linux tool that can do the same as httrack, that might help me.
Pekka
Not even if you compile httrack in your own directory? Or you could simply deploy httrack with your PHP app (in a subdirectory).
shad
+2  A: 

wget has a -k (--convert-links) option, which rewrites both links and references to embedded content (like images) so that they point to the downloaded local copies. See e.g. "wget advanced use".
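As a rough sketch of how the two options fit together, the snippet below builds a wget command line that combines --page-requisites with --convert-links. Python is used only for illustration; the function names `build_wget_command` and `snapshot`, the example URL, and the target directory are assumptions, not something from the question. The same argument list could presumably be passed to wget from PHP via `exec()`.

```python
import subprocess


def build_wget_command(url, dest_dir):
    """Return the wget argument list for a single-page snapshot.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    return [
        "wget",
        "--page-requisites",                 # also fetch images, CSS, scripts
        "--convert-links",                   # rewrite references to the local copies
        f"--directory-prefix={dest_dir}",    # where the snapshot is stored
        url,
    ]


def snapshot(url, dest_dir):
    """Run wget; raises CalledProcessError if wget exits with an error."""
    subprocess.run(build_wget_command(url, dest_dir), check=True)


if __name__ == "__main__":
    # Example values are placeholders.
    print(" ".join(build_wget_command("http://example.com/product/42", "/tmp/snapshot")))
```

Keeping the command construction separate from the call that runs it makes it easy to log the exact command stored alongside each order.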

For the e-mail part of your question, I'm sure you can use one of the existing libraries. For example, PHP has a PEAR package (I do not remember the exact name) to handle HTML e-mails; I'm pretty sure both Perl and Python have something similar.
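To illustrate what such a library does, here is a minimal sketch using Python's standard `email` package. It assumes the snapshot HTML references its images with plain `src="..."` attributes and that the image bytes are already loaded; the function name `build_snapshot_email`, the `images` mapping, and the `snapshot.invalid` domain are made up for this example. Each image is attached as a related part and the HTML is rewritten to point at it through a `cid:` reference, which is how HTML e-mails usually embed images.

```python
from email.message import EmailMessage
from email.utils import make_msgid


def build_snapshot_email(sender, recipient, subject, html, images):
    """Build an HTML e-mail with inline images.

    images maps the src value used in the HTML to (raw bytes, image subtype),
    e.g. {"logo.png": (data, "png")}. Hypothetical helper for illustration.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    # Plain-text fallback for clients that do not render HTML.
    msg.set_content("This message contains an HTML snapshot of the product page.")

    # Give every image a Content-ID and point the HTML at it.
    cids = {}
    for src in images:
        cid = make_msgid(domain="snapshot.invalid")  # looks like "<...@snapshot.invalid>"
        cids[src] = cid
        html = html.replace(f'src="{src}"', f'src="cid:{cid[1:-1]}"')

    msg.add_alternative(html, subtype="html")
    html_part = msg.get_payload()[1]

    # Attach the images to the HTML part as related content.
    for src, (data, subtype) in images.items():
        html_part.add_related(data, maintype="image", subtype=subtype, cid=cids[src])
    return msg
```

The resulting message could then be handed to `smtplib.SMTP.send_message`. The simple string replacement only catches double-quoted `src` attributes, so stylesheets and other reference styles would need extra handling.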

chronos