Hi, I am using the Amazon S3 service to upload several directories (and the files inside them) to different buckets (directory -> bucket). I am coding in Ruby with the library at http://amazon.rubyforge.org.

The files are small (about 20KB).

I'd like to upload the directories in parallel (using many threads), but I have to wrap the S3Object.store call in synchronize:

@mutex.synchronize do
  # File.open with a block closes the file once the upload finishes
  File.open(image_name, 'rb') do |file|
    S3Object.store(s3_obj_name, file, bucket_name)
  end
end

Without synchronize, I get a Net::HTTPBadResponse exception. But with synchronize, only one upload runs at a time, so I lose the benefit of running in parallel.
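For reference, here is a minimal sketch of the setup described above, assuming one thread per directory -> bucket pair and a shared Mutex around the upload call. `store_object` is a hypothetical stand-in for S3Object.store (it only records what would be uploaded), so the sketch runs without the S3 library. It shows why the lock defeats the threads: every upload has to pass through the same `mutex.synchronize`.

```ruby
# Hypothetical stand-in for S3Object.store(name, io, bucket): it just records
# the bucket, object name and size instead of talking to S3.
def store_object(name, io, bucket, log)
  log << [bucket, name, io.read.bytesize]
end

# Uploads every regular file in each directory to its mapped bucket, one
# thread per directory. The store call is serialized by a shared Mutex, which
# is what a non-thread-safe client forces you to do.
def upload_dirs(dir_to_bucket)
  mutex = Mutex.new
  log = []
  threads = dir_to_bucket.map do |dir, bucket|
    Thread.new do
      files = Dir.glob(File.join(dir, '*')).select { |p| File.file?(p) }
      files.each do |path|
        File.open(path, 'rb') do |io|
          # Only one thread at a time can be inside this block.
          mutex.synchronize { store_object(File.basename(path), io, bucket, log) }
        end
      end
    end
  end
  threads.each(&:join)
  log
end
```

Listing directories and opening files still happens concurrently, but with ~20 KB files almost all the time goes into the serialized network call.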

Do you have any ideas on how to get parallel uploading to work?

Thank you, Alessandro DS

A: 

It appears that the Ruby S3 library you're using isn't thread-safe: http://rubyforge.org/tracker/index.php?func=detail&aid=8162&group_id=2409&atid=9357

So your options include:

  • Write a patch for that library to make it thread-safe (I'm not a Ruby guy, so I'm not sure how difficult that would be)
  • Find another S3 Ruby library that is thread-safe (I googled it and didn't see anything obvious)
  • Write a short Ruby script that makes a single S3Object.store call, and launch it as a child process from your parent Ruby script (e.g. with system or Process.spawn; plain exec would replace the parent process). Each store() call then runs in a separate process, so the thread-safety issue won't bite you
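The third option can be sketched as follows. This variant uses Process.fork (Unix only) instead of a separate script, but the idea is the same: each upload runs in its own process, so no state in the S3 library is shared between uploads. `store_one` is a hypothetical stand-in for a single S3Object.store call; in a real program the child should open its own S3 connection rather than reuse one created in the parent before forking. `max_procs` is an assumed limit on concurrent children.

```ruby
# Hypothetical stand-in for one S3Object.store call. A real child would open
# its own S3 connection and upload `path`; here it only checks the file exists.
def store_one(name, path, bucket)
  File.file?(path)
end

# Runs each [name, path, bucket] job in its own forked child, at most
# max_procs at a time, and returns the jobs whose child exited with failure.
def upload_in_processes(jobs, max_procs: 4)
  failures = []
  jobs.each_slice(max_procs) do |batch|
    children = batch.map do |job|
      pid = Process.fork do
        ok = begin
          store_one(*job)
        rescue StandardError
          false
        end
        exit!(ok ? 0 : 1) # exit! skips at_exit handlers inherited from the parent
      end
      [pid, job]
    end
    # Wait for the whole batch and collect failed jobs by exit status.
    children.each do |pid, job|
      _, status = Process.wait2(pid)
      failures << job unless status.success?
    end
  end
  failures
end
```

Waiting batch by batch keeps the sketch short; a batch is only as fast as its slowest upload, so a rolling pool that starts a new child whenever one exits would keep more uploads in flight.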

Those options assume you want to stick with Ruby. Hope that helps.

James Cooper