I need to upload a bunch of files in a directory to S3. Since more than 90% of the upload time is spent waiting for the HTTP request to finish, I want to run several uploads at once somehow.

Can Fibers help me with this at all? They are described as a way to solve this sort of problem, but I can't see any way to do other work while an HTTP call blocks.

Any way I can solve this problem without threads?

+1  A: 

You could use separate processes for this instead of threads:

#!/usr/bin/env ruby

$stderr.sync = true

# Number of children to use for uploading
MAX_CHILDREN = 5

# Hash of PIDs for children that are working along with which file
# they're working on.
@child_pids = {}

# Keep track of uploads that failed
@failed_files = []

# Get the list of files to upload as arguments to the program
@files = ARGV


### Wait for a child to finish, adding the file to the list of those
### that failed if the child indicates there was a problem.
def wait_for_child
    $stderr.puts "    waiting for a child to finish..."
    pid, status = Process.waitpid2( 0 )
    file = @child_pids.delete( pid )
    @failed_files << file unless status.success?
end


### Here's where you'd put the particulars of what gets uploaded and
### how. This version just sleeps for 10 microseconds per byte of the
### file to simulate the upload, then returns either +true+ or +false+
### based on a random factor.
def upload( file )
    bytes = File.size( file )
    sleep( bytes * 0.00001 )
    return rand( 100 ) > 5
end


### Start a child uploading the specified +file+.
def start_child( file )
    if pid = Process.fork
        $stderr.puts "%s: uploaded started by child %d" % [ file, pid ]
        @child_pids[ pid ] = file
    else
        if upload( file )
            $stderr.puts "%s: done." % [ file ]
            exit 0 # success
        else
            $stderr.puts "%s: failed." % [ file ]
            exit 255
        end
    end
end


until @files.empty?

    # If there are already the maximum number of children running, wait 
    # for one to finish
    wait_for_child() if @child_pids.length >= MAX_CHILDREN

    # Start a new child working on the next file
    start_child( @files.shift )

end


# Now we're just waiting on the final few uploads to finish
wait_for_child() until @child_pids.empty?

if @failed_files.empty?
    exit 0
else
    $stderr.puts "Some files failed to upload:",
        @failed_files.collect {|file| "  #{file}" }
    exit 255
end
mgranger
+2  A: 

I'm not up on Fibers in 1.9, but regular Threads from 1.8.6 can solve this problem. Try using a Queue: http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html

Looking at the example in the documentation, your consumer is the part that does the upload. It 'consumes' a URL and a file, and uploads the data. The producer is the part of your program that keeps working and finds new files to upload.
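
For instance, a minimal worker-pool sketch along those lines might look like this (upload_file and files_to_upload are hypothetical names; the nil sentinels tell each worker to shut down):

require 'thread'

UPLOADER_COUNT = 5
queue = Queue.new

# Consumers: each worker pops a file off the queue and uploads it,
# stopping when it pops the nil sentinel.
workers = Array.new(UPLOADER_COUNT) do
  Thread.new do
    while file = queue.pop
      upload_file(file)
    end
  end
end

# Producer: push every file, then one nil per worker so they all exit.
files_to_upload.each {|file| queue.push(file) }
UPLOADER_COUNT.times { queue.push(nil) }

workers.each {|t| t.join }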

If you want to upload multiple files at once, simply launch a new Thread for each file:

@all_threads ||= []  # remember every worker thread so we can join them later

t = Thread.new do
  upload_file(param1, param2)
end
@all_threads << t

Then, later on in your 'producer' code (which, remember, doesn't have to be in its own Thread, it could be the main program):

@all_threads.each do |t|
  t.join  # joining an already-finished thread just returns immediately
end
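
And if, like the fork-based answer, you want to report which uploads failed, each thread can return a [file, success] pair and Thread#value will join the thread and hand the pair back. A sketch, assuming files is your list of paths and upload_file returns true or false:

threads = files.map do |file|
  Thread.new(file) do |f|
    [f, upload_file(f)]  # returned to whoever calls Thread#value
  end
end

results = threads.map {|t| t.value }
failed  = results.reject {|file, ok| ok }.map {|file, ok| file }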

The Queue can be either an @instance_variable or a $global.

audiodude
Although my Fiber question seems to have gone unanswered
Sean Clark Hess