I'm using mechanize to scrape a website, which works nicely. However, you can't tell from a link such as http://somesite.com/images.php?get=123 what kind of file it points to. Is it possible to download only the headers?

I'm asking because I'd like to decide, based on the file type, whether to download the file. It would also help in choosing a filename when downloading.

It doesn't have to use mechanize, but is there a Rails way of doing this?

A: 

You can use curb to send a HEAD request and read the content type from the response:

ruby-1.8.7-p174 > require 'rubygems'
 => true 
ruby-1.8.7-p174 > require 'curb'
 => true  
ruby-1.8.7-p174 > c = Curl::Easy.http_head('https://encrypted.google.com/images/logos/ssl_logo_lg.gif'){|easy| easy.follow_location = true}
 => #<Curl::Easy https://encrypted.google.com/images/logos/ssl_logo>
ruby-1.8.7-p174 > c.perform
 => true
ruby-1.8.7-p174 > c.content_type
 => "image/gif" 
hellvinz
+1  A: 

Maybe Net::HTTP#head from the standard library? http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000682

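require 'net/http'

response = nil
# HEAD transfers only the response headers, not the body.
Net::HTTP.start('some.www.server', 80) {|http|
    response = http.head('/index.html')
}
# Header lookup is case-insensitive.
p response['content-type']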
Nakilon
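
Building on the Net::HTTP approach above, here is a minimal sketch of how the headers might be used to decide whether to download a file and what to call it. The helper names (`head`, `mime_type`, `extension_for`, `filename_for`) and the small MIME-to-extension table are hypothetical and only for illustration. The sketch also assumes the server answers HEAD requests and sends a sensible `Content-Type`, which not every server does.

```ruby
require 'net/http'
require 'uri'

# Issue a HEAD request and return the response (headers only, no body).
def head(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
end

# Strip parameters such as "; charset=utf-8" and normalise case.
def mime_type(content_type)
  content_type.to_s.split(';').first.to_s.strip.downcase
end

# Hypothetical lookup table; extend it with the types you care about.
def extension_for(mime)
  { 'image/gif' => '.gif', 'image/jpeg' => '.jpg', 'image/png' => '.png' }.fetch(mime, '')
end

# Prefer the filename from Content-Disposition, if the server sent one.
# Otherwise combine a fallback base name with a guessed extension.
def filename_for(content_type, content_disposition, fallback_base)
  if content_disposition.to_s =~ /filename="?([^";]+)"?/i
    File.basename($1.strip)
  else
    fallback_base + extension_for(mime_type(content_type))
  end
end
```

With these in place, something like `res = head('http://somesite.com/images.php?get=123')` followed by a check such as `mime_type(res['content-type']).start_with?('image/')` could decide whether to fetch the body, and `filename_for(res['content-type'], res['content-disposition'], 'image_123')` could suggest a name. Redirects are not followed here, so a 3xx response would need separate handling.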