I've tried numerous ways of downloading files with Ruby, specifically .zip and .tar.gz archives, and writing them to disk.

The downloaded file appears to be the same size as the reference, but the archive refuses to extract. Thanks!

What I'm attempting now is:

def download_request(url, filePath:path, progressIndicator:progressBar)
  file = File.open(path, "w+")
  begin
    Net::HTTP.get_response URI.parse(url) do |response|
      if response['Location'] != nil
        puts 'Direct to: ' + response['Location']
        return download_request(response['Location'], filePath:path, progressIndicator:progressBar)
      end

      # some stuff

      response.read_body do |segment|
        file.write(segment)
        # some progress stuff.
      end
    end
  ensure
    file.close
  end
end

download_request("http://github.com/jashkenas/coffee-script/tarball/master", filePath:"tarball.tar.gz", progressIndicator:nil)
+1  A: 

I'd recommend using open-uri in ruby's stdlib.

require 'open-uri'

# Write in binary mode so the archive bytes are stored untouched.
open(out_file, 'wb') do |out|
  out.write(open(url).read)
end

http://ruby-doc.org/stdlib/libdoc/open-uri/rdoc/classes/OpenURI/OpenRead.html#M000832

Make sure you look at the :progress_proc option to open as it looks like you want a progress hook.
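
A minimal sketch of how the progress hook could be wired up, assuming Ruby 2.5 or later, where the call is URI.open (on the older Rubies discussed in this thread it would be plain open). The names progress_hooks, download and report are made up for illustration; :content_length_proc and :progress_proc are the open-uri options.

```ruby
require 'open-uri'

# Builds the two open-uri hooks. `report` is a hypothetical callable that
# receives (bytes_transferred_so_far, total_bytes_or_nil).
def progress_hooks(report)
  total = nil
  {
    # Called once with the Content-Length header value (nil if absent).
    content_length_proc: ->(length) { total = length },
    # Called for each fragment with the accumulated transferred size.
    progress_proc: ->(transferred) { report.call(transferred, total) }
  }
end

# Downloads `url` to `out_path`, writing in binary mode.
def download(url, out_path, report)
  URI.open(url, progress_hooks(report)) do |remote|
    File.open(out_path, 'wb') { |out| IO.copy_stream(remote, out) }
  end
end

# Usage (needs network access, so it is left commented out):
# download(url, 'tarball.tar.gz',
#          ->(done, total) { puts "#{done}/#{total || '?'} bytes" })
```

Keeping the hooks in their own method means the progress logic can be exercised without touching the network.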

It's a nice simple solution, and I've tried open-uri, and I still get a corrupted file.
arbales
Strange. I have a system using this at the moment that downloads all of the National Library of Medicine zip files (in parallel using resque), made up of >700 files, each of which is tens of MB. I would fetch them with curl or wget from a shell to make sure there isn't a fundamental issue with the archives themselves.
I did. When I download them with curl, wget, or Safari, the archives expand just fine, but when I try anything in my app the files refuse to expand; they often create these empty .cpg or something files.
arbales
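
One way to take the curl comparison a step further is to compare the two files byte for byte, since matching sizes alone do not prove matching content. This is only a sketch; same_content? and first_difference are hypothetical helper names, and the two paths stand for a curl-downloaded reference and the file written by the app.

```ruby
require 'digest'

# True when both files have the same size and the same SHA-256 digest.
def same_content?(reference_path, candidate_path)
  File.size(reference_path) == File.size(candidate_path) &&
    Digest::SHA256.file(reference_path).hexdigest ==
      Digest::SHA256.file(candidate_path).hexdigest
end

# Returns the offset of the first differing byte, or nil if the files
# are identical. Reads both files fully into memory, so it is meant for
# modestly sized archives.
def first_difference(path_a, path_b)
  a = File.binread(path_a)
  b = File.binread(path_b)
  limit = [a.bytesize, b.bytesize].min
  offset = (0...limit).find { |i| a.getbyte(i) != b.getbyte(i) }
  return offset if offset
  a.bytesize == b.bytesize ? nil : limit
end
```

The offset of the first mismatch can hint at what went wrong: a difference at byte 0 suggests the whole body was transformed, while a later one points at individual bytes being altered.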
A: 

The last time I got corrupted files with Ruby was when I forgot to call file.binmode right after File.open. It took me hours to find out what was wrong. Does it help with your issue?
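
A small sketch of the binary-mode idea, together with a quick check that a saved file really starts with the gzip magic bytes (0x1f 0x8b). The helper names save_binary and gzip_file? are made up for illustration, and String#b assumes Ruby 2.0 or later.

```ruby
# The two bytes every gzip stream starts with, as a binary string.
GZIP_MAGIC = "\x1f\x8b".b

# Writes `data` in binary mode ('wb' has the same effect as calling
# binmode right after File.open).
def save_binary(path, data)
  File.open(path, 'wb') { |f| f.write(data) }
end

# True if the file begins with the gzip magic bytes.
def gzip_file?(path)
  File.binread(path, 2) == GZIP_MAGIC
end
```

If gzip_file? returns false for a freshly downloaded .tar.gz, the bytes on disk are not what the server sent (or the server sent something other than a gzip archive, such as an HTML redirect page).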

Marcel J.
Thanks, but the documentation states that this is only useful in MSDOS. I'm running Mac OS, so it shouldn't be a problem.
arbales
+2  A: 

I've successfully downloaded and extracted GZip files with this code:

require 'open-uri'
require 'zlib'

open('tarball.tar', 'wb') do |local_file|
  open('http://github.com/jashkenas/coffee-script/tarball/master/tarball.tar.gz') do |remote_file|
    local_file.write(Zlib::GzipReader.new(remote_file).read)
  end
end
Theo
The .tar.gz file apparently isn't working. "mrb[2877:a0f] not in gzip format"
arbales
I changed the code to include the URL you had in your example, and it runs fine in Ruby 1.8.7, 1.9.1, Rubinius 1.0.0 RC2 and JRuby 1.4.0. However, when run in MacRuby 0.5 it saves a garbled file. I don't know why, but it's definitely a bug in MacRuby.
Theo
I've also tried rewriting the code using HTTParty, with the same result (but HTTParty probably uses the same mechanisms as open-uri, so that's no surprise).
Theo
:-( I was afraid it was a MacRuby bug. Now I have to figure out some weird multi-step Cocoa way to download.
arbales
@theo, I can't vote up your answer for some reason, could you edit it so I can?
arbales
Not sure why you can't upvote it; I haven't done anything as far as I can see.
Theo
I tried to upvote after I already had, and it removed my vote. Now I can't re-upvote.
arbales
If you are using MacRuby, you can use NSTask to run curl and fetch the file as a workaround.
falconcreek
Example here - http://macruby.labs.oreilly.com/ch03.html#_tasks_subprocesses
falconcreek
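
For comparison, the same workaround can be sketched in plain Ruby by shelling out with Kernel#system instead of NSTask. This is not the MacRuby/Cocoa route from the linked chapter, just an illustration of the idea; curl_command and download_with_curl are hypothetical names, and it assumes a curl binary is on the PATH.

```ruby
# Builds the argument list for curl. Passing an array to system avoids
# shell quoting problems with unusual URLs or paths.
def curl_command(url, out_path)
  ['curl',
   '--location',    # follow redirects, as the GitHub tarball URL needs
   '--fail',        # exit non-zero on HTTP errors instead of saving the error page
   '--silent', '--show-error',
   '--output', out_path,
   url]
end

# Runs curl; returns true on success, false on a non-zero exit status,
# and nil if curl could not be started at all.
def download_with_curl(url, out_path)
  system(*curl_command(url, out_path))
end
```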