Our web analytics package includes detailed information about user's activity within a page, and we show (click/scroll/interaction) visualizations in an overlay atop the web page. Currently this is an IFrame containing a live rendering of the page.
Since pages change over time, older data no longer corresponds to the current layout of the page. We would like to run a spider to occasionally take snapshots of the pages, allowing us to maintain a record of interactions with various versions of the page.
We have a working implementation of this (Linux), but the snapshot process is a hideous Python/JavaScript/HTML hack which opens a Firefox window, screenshotting and scrolling and merging and saving to a file. This requires us to install the X stack on our normally headless servers, and takes over a minute per page.
We would prefer a headless implementation with performance closer to that of the rendering time in a regular web browser, but haven't found anything.
There's some movement towards building something using Mozilla source as a starting point, but that seems like overkill to me, as well as a maintenance nightmare if we try to keep it up to date.
Suggestions?