I want to find an automated way to download an entire website page (not the entire site, just a single page) and all elements on the page, then sum the sizes of those files.

When I say files, I mean the total size of the HTML, CSS, images, local and remote JS files, and any CSS background images. Basically, the entire page weight for a given page.

I thought about using cURL, but I was not sure how to make it grab remote and local JS files, as well as images referenced in the CSS files.

A:

Try wget:

  • make it download all the files required to display the page with the -p (--page-requisites) option
  • download scripts and images local to the site, no further than 2 hops away (this should get local images and code), with -l 2 (--level=2)
  • and rewrite the downloaded files to link to your local copies instead of their original paths with -k (--convert-links):
    wget -p -l 2 -k http://full_url/to/page.html
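
Once the download finishes, the total page weight is just the sum of the sizes of the files wget saved. A minimal sketch of that step, assuming GNU find/awk and that wget wrote everything into a directory named after the host (full_url in this example URL):

    # Hypothetical follow-up: total the bytes of everything wget saved.
    # GNU find's -printf prints each file's size in bytes; awk adds them up.
    find full_url/ -type f -printf '%s\n' | awk '{total += $1} END {print total " bytes"}'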
honk
Close, but it does not seem to grab the remote JS files, such as Google Analytics.
meme
@meme Try adding `-r -l 2`. That's in the man page around line 1320.
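For reference, a rough sketch of the amended command (untested; -r enables recursive retrieval, which is what the -l depth limit applies to):

    wget -r -p -l 2 -k http://full_url/to/page.html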
honk
This only seems to work for links on the local site, not remote URLs.
honk
Confirmed: it works for the most part, but it will not grab the remote JS files (such as Google Analytics) or Flash files. It is close, though.
meme