Hi all -

I have a script that looks up the GeoIP locations of various IPs. It runs daily, and I expect to have around 50,000 IPs to look up per run.

I have a GeoIP system set up - I'd just like to eliminate having to run wget 50,000 times per report.

What I was thinking is that there must be some way to have wget open a connection to the URL and then pass it the IPs, so it doesn't have to re-establish the connection each time.

Any help will be much appreciated.

A: 

If you give wget several addresses at once, and consecutive addresses belong to the same server with HTTP/1.1 (Connection: keep-alive) support, wget will re-use the already-established connection.

If there are too many addresses to list on the command line, you can write them to a file and use the -i/--input-file= option (and, per UNIX tradition, -i-/--input-file=- reads standard input).
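For the original poster's GeoIP case, here is a minimal sketch of that approach in Ruby (the endpoint URL, query format, and file names below are only placeholders for whatever your GeoIP service actually expects):

#!/usr/bin/ruby
# Build one lookup URL per IP and feed them all to a single wget process via
# "-i -", so wget can reuse keep-alive connections to the same server.
# "ips.txt" and the endpoint URL are hypothetical placeholders.
urls = File.readlines("ips.txt").map do |ip|
  "http://geoip.example.com/lookup?ip=#{ip.strip}"
end

# -O concatenates every response into one output file.
IO.popen("wget -q -O results.txt -i -", "w") do |wget|
  wget.puts(urls)
end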

There is, however, no way to preserve a connection across different wget invocations.

ephemient
A: 

You could also write a threaded Ruby script to run wget on multiple input files simultaneously to speed the process up. So if you have 5 files containing 10,000 addresses each, you could use this script:

#!/usr/bin/ruby

threads = []

# Start one wget process per input file, each in its own thread.
ARGV.each do |file|
  threads << Thread.new(file) do |filename|
    system("wget -i #{filename}")
  end
end

# Wait for every wget process to finish before exiting.
threads.each(&:join)

Each of these threads would use one connection to download all the addresses in its file. The following command then means only 5 connections to the server to fetch all 50,000 lookups.

./fetch.rb "list1.txt" "list2.txt" "list3.txt" "list4.txt" "list5.txt"
Druid
Umm, that's only 5 connections at a time, but you're establishing a total of 50,000 connections.
ephemient
A: 

You could also write a small program (in Java or C or whatever) that sends the list of IPs as a single POST request, and have the server return an object with the data about them. Shouldn't be too slow either.
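A minimal sketch of that idea in Ruby (assuming a hypothetical bulk-lookup endpoint that accepts a newline-separated list of IPs via POST and returns the results in the response body):

#!/usr/bin/ruby
require "net/http"
require "uri"

# Hypothetical bulk-lookup endpoint; adjust to whatever your GeoIP server exposes.
uri = URI("http://geoip.example.com/bulk-lookup")

ips = File.readlines("ips.txt").map(&:strip)

# One POST carries the whole list, so only a single connection is needed.
response = Net::HTTP.post(uri, ips.join("\n"), "Content-Type" => "text/plain")
puts response.body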

Shade