wget

How can I create a site in php and have it generate a static version?

For a particular project I have, no server-side code is allowed. How can I create the site in PHP (with includes, conditionals, etc.) and then have that converted into a static HTML site that I can give to the client? Update: Thanks to everyone who suggested wget. That's what I used. I should have specified that I was on a PC, so I g...
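
A minimal sketch of the wget approach, assuming the PHP site is served locally at a hypothetical URL:

    # -m mirrors the running PHP site recursively, -k rewrites links to the
    # local copies, -E appends .html to pages served without an extension,
    # and -p grabs the images/CSS the pages need.
    wget -m -k -E -p http://localhost/mysite/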

Is there a curl/wget option that says not to save files upon HTTP errors?

I want to download a lot of URLs in a script, but I do not want to save the ones that lead to HTTP errors. As far as I can tell from the man pages, neither provides such functionality. Does anyone know of another downloader that does? ...
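
One option worth checking: curl's --fail flag exits non-zero on HTTP 4xx/5xx instead of saving the error page (the URL below is a placeholder):

    # Exit with code 22 on server errors rather than writing the error body;
    # removing the file on failure is a belt-and-braces extra.
    curl --fail -o page.html http://example.com/page || rm -f page.html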

Slow wget speeds when connecting to HTTPS pages

I'm using wget to connect to a secure site like this: wget -nc -i inputFile where inputFile consists of URLs like this: https://clientWebsite.com/TheirPageName.asp?orderValue=1.00&merchantID=36&programmeID=92&ref=foo&Ofaz=0 This page returns a small GIF file. For some reason, this is taking around 2.5 minutes. When...
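
A sketch of two common culprits worth ruling out (assumptions, not a confirmed diagnosis): IPv6 connection attempts timing out before falling back to IPv4, and slow certificate verification:

    # -4 forces IPv4; --no-check-certificate skips verification (testing only).
    wget -4 --no-check-certificate -nc -i inputFile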

Scrape multi-frame website

I'm auditing our existing web application, which makes heavy use of HTML frames. I would like to download all of the HTML in each frame; is there a way of doing this with wget or a little bit of scripting? ...
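
A sketch, assuming the app is reachable at a hypothetical URL: wget treats frame sources like other page requisites, so -p plus -k should fetch each frame's HTML and rewrite the frameset to point at the local copies:

    # -p pulls in everything the page needs (including frame sources);
    # -k converts the links so the frameset works locally.
    wget -p -k http://intranet.example.com/app/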

How to make Wget handle an HTTP 100-Continue response?

I am trying to POST an HTML document (contained in a file) to a URL using Wget like this: wget -O- --debug --header=Content-Type:text/html --post-file=index.html http://localhost/www/encoder.ashx The URL to which the HTML is being posted is a web application endpoint implemented using ASP.NET. The server replies with a 100 (Cont...
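
A sketch using curl as a swapped-in tool: curl handles the interim 100 response itself, and sending an empty Expect header suppresses the 100-continue handshake entirely:

    # --data-binary @file posts the file verbatim; -H 'Expect:' disables
    # the "Expect: 100-continue" request header.
    curl -H 'Expect:' -H 'Content-Type: text/html' \
         --data-binary @index.html http://localhost/www/encoder.ashx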

Using wget to recursively fetch a directory with arbitrary files in it

I have a web directory where I store some config files. I'd like to use wget to pull those files down and maintain their current structure. For instance, the remote directory looks like: http://mysite.com/configs/.vim/ .vim holds multiple files and directories. I want to replicate that on the client using wget. Can't seem to find the ...
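
A sketch of the usual incantation for this, assuming the server generates directory listings:

    # -r -np recurses without climbing above /configs/.vim/; -nH and
    # --cut-dirs=1 drop the hostname and the leading "configs" component;
    # -R skips the auto-generated listing pages.
    wget -r -np -nH --cut-dirs=1 -R "index.html*" http://mysite.com/configs/.vim/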

Parse HTTP response headers from wget

I'm trying to extract a line from wget's output but am having trouble with it. This is my wget call: $ wget -SO- -T 1 -t 1 http://myurl.com:15000/myhtml.html --18:24:12-- http://xxx.xxxx.xxxx:15000/myhtml.html => `-' Resolving xxx.xxxx.xxxx... xxx.xxxx.xxxx Connecting to xxx.xxxx.xxxx|xxx.xxxx.xxxx|:15000... connected. HTTP re...
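
One sketch: -S prints the response headers on stderr, so redirect stderr into the pipe and grep for the line you need:

    # Merge stderr into stdout, then pick out the status line.
    wget -SO- -T 1 -t 1 http://myurl.com:15000/myhtml.html 2>&1 | grep 'HTTP/'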

Programmatically log on to forum and then screenscrape

I'd like to log in to the Forums part of Community Server (e.g. http://tinyurl.com/cs-login) and then download a specific page and run a regex over it (to see if there are any posts awaiting moderation). If there are, I'd like to send an email. I'd like to do this from a Linux server. Currently I know how to download a page (using e.g. w...
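
A sketch with wget alone, where the form field names and the moderation-page URL are made-up placeholders you would read out of the login form's HTML:

    # Log in once, keeping the session cookie...
    wget -O /dev/null --save-cookies cookies.txt --keep-session-cookies \
         --post-data 'username=me&password=secret' http://tinyurl.com/cs-login
    # ...then fetch the page with that cookie and test the regex.
    wget -O moderation.html --load-cookies cookies.txt \
         http://forums.example.com/moderation
    grep -q 'awaiting moderation' moderation.html \
        && mail -s 'Posts awaiting moderation' [email protected] < /dev/null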

"wget --domains" not helping.. what am I doing wrong?

Hi there, I'm attempting to use wget to recursively grab only the .jpg files from a particular website, with a view to creating an amusing screensaver for myself. Not such a lofty goal really. The problem is that the pictures are hosted elsewhere (mfrost.typepad.com), not on the main domain of the website (www.cuteoverload.com). I hav...
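
A sketch of what usually fixes this: --domains only filters hosts once host-spanning is enabled, so -H has to be given as well:

    # -H allows crossing to other hosts; -D whitelists the two involved;
    # -A keeps only the .jpg files.
    wget -r -H -D www.cuteoverload.com,mfrost.typepad.com -A '*.jpg' \
         http://www.cuteoverload.com/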

How reliable would it be to download over 100,000 files via wget from a bash file over SSH?

I have a bash file that contains wget commands to download over 100,000 files totaling around 20 GB of data. The bash file looks something like: wget http://something.com/path/to/file.data wget http://something.com/path/to/file2.data wget http://something.com/path/to/file3.data wget http://something.com/path/to/file4.data And there ...
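
One hedged sketch: a single wget run over a URL list is easier to resume and to keep alive over SSH than 100,000 separate invocations:

    # -i reads URLs from a file, -c resumes partial files on re-run,
    # -o logs progress, and nohup survives the ssh session ending.
    nohup wget -c -i url-list.txt -o download.log &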

How do I completely mirror a web page?

I have several web pages on several different sites that I want to mirror completely. This means that I will need images, CSS, etc., and the links need to be converted. This functionality would be similar to using Firefox to "Save Page As" and selecting "Web Page, complete". I'd like to name the files and corresponding directories as s...
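
A sketch of the closest wget equivalent to "Web Page, complete" (the URL and output directory are placeholders):

    # -p fetches requisites, -k rewrites links, -E adds .html extensions,
    # -H lets requisites come from other hosts, -P sets the output directory.
    wget -p -k -E -H -P saved-page http://example.com/some/page.html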

How can I set a temp directory for uncompleted downloads in Wget?

I'm trying to mirror files on an FTP server. Those files can be very large, so downloads might be interrupted. I'd like to keep the original files while downloading partial files to a temporary folder and, once completed, overwrite the older local versions. Can I do this? How? Is there another easy-to-use (command-line) tool that I can use? ...
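
One workaround sketch (I don't believe wget has a built-in option for this): download into a staging directory, and only move a file over the live copy once wget reports success (paths below are placeholders):

    # -c resumes an interrupted partial file in the staging area.
    wget -c -P /tmp/staging ftp://ftp.example.com/pub/file.iso \
        && mv /tmp/staging/file.iso /data/mirror/file.iso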

Why isn't wget accepting my username/password?

I've tried both wget --user=myuser --password=mypassword myfile and wget --ftp-user=myuser --ftp-password=mypassword myfile but I keep getting the error HTTP request sent, awaiting response... 401 Authorization Required Authorization failed. I know the file is there, and I know the username/password are correct - I can ftp in wi...
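
A sketch of one common cause: the "HTTP request sent" line shows the transfer is HTTP, not FTP, so the --ftp-* options are ignored; and some servers expect credentials up front rather than after a challenge, which newer wget supports via --auth-no-challenge:

    wget --user=myuser --password=mypassword --auth-no-challenge \
         http://example.com/myfile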

wget using a username other than root

I am trying to get a jar file under the path /usr/test/, but I only have a user ID other than root. So after I issue wget ftp://mike:[email protected]:/usr/test/getme.txt the command returns: TYPE I ... done. ==> CWD 'mike.'/usr/test/... No such directory `usr/test/'. I believe the problem is after I ftp in as mike, by default I am in ...
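
A sketch of the usual fix: FTP URL paths are resolved relative to the login user's home directory, and a leading slash encoded as %2F asks for an absolute path instead:

    # %2F makes /usr/test absolute rather than relative to mike's home.
    wget 'ftp://mike:[email protected]/%2Fusr/test/getme.txt'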

wget errors break shell script - how to prevent that?

I have a huge file with lots of links to files of various types to download. Each line is one download command like: wget 'URL1' wget 'URL2' ... and there are thousands of those. Unfortunately some URLs look really ugly, like for example: http://www.cepa.org.gh/archives/research-working-papers/WTO4%20(1)-charles.doc It opens OK in a...
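
A sketch of a more robust driver loop: quote each URL so parentheses and other shell metacharacters are harmless, and log failures instead of letting them abort the run:

    while read -r url; do
        wget "$url" || echo "FAILED: $url" >> failures.log
    done < urls.txt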

How to download lots of web pages quickly in Ruby? Parallelizing downloads?

I need to scrape (using scrAPI) 400+ web pages in Ruby; my actual code is very sequential: data = urls.map {|url| scraper.scrape url } Actually the code is a bit different (exception handling and stuff). How can I make it faster? How can I parallelize the downloads? ...
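
A shell-level alternative sketch (not scrAPI itself): fan the URL list out across several wget processes with xargs -P:

    # -P 8 runs up to eight downloads at once, one URL per process.
    xargs -P 8 -n 1 wget -q < urls.txt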

Possible to assign a new IP address on every HTTP request?

Hi, is it possible for me to change or assign my server a new IP address every time it needs to make an HTTP request with commands such as wget? Thanks, all. Update: The reason for this is exactly what the Tor project is trying to achieve. I do not want to leave a trace of what requests my server makes, and I thought constantly changing my...
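
A sketch along those lines, assuming Tor is installed and running locally: torsocks routes a command's traffic through Tor, so requests leave from Tor's rotating exit nodes rather than your server's own IP:

    torsocks wget http://example.com/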

wget -k converts files differently on Windows and Linux

I've got GNU Wget 1.10.2 for Windows and Linux, and the -k option behaves differently on the two. -k, --convert-links make links in downloaded HTML point to local files. On Windows it produces: www.example.com/index.html www.example.com/index.html@page=about www.example.com/index.html@page=contact www.example.com/index.ht...
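
A sketch of a workaround: --restrict-file-names forces one platform's filename-escaping rules on the other, so both runs produce (and -k links to) the same local names:

    # Use Windows-safe names on Linux too (or =unix for the reverse).
    wget -r -k --restrict-file-names=windows http://www.example.com/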

Only create a file if HTTP status is 200 with wget?

I have been trying to figure out a way to make wget only create a file if the actual download response is valid, meaning no 404 or 500 status code, only 200. However, when using the -O option (to specify a filename) it will always create the file, with the content of the error page, and I haven't found a way to specify that it should igno...
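
One workaround sketch: wget exits non-zero on server errors even though -O has already created the file, so the file can simply be removed again on failure:

    wget -O output.html http://example.com/page || rm -f output.html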

How do web spiders differ from Wget's spider?

The following sentence in Wget's manual caught my eye: wget --spider --force-html -i bookmarks.html This feature needs much more work for Wget to get close to the functionality of real web spiders. I find the following lines of code relevant to the spider option in wget. src/ftp.c 780: /* If we're in spider mode, don't really retrie...