How do I get Ruby's Net::HTTP module to cancel a request?

The call to http.finish below raises an error. I get the impression that the response object doesn't know that the connection was closed and still expects more data.

EDIT: I want to avoid making a HEAD request. So: make a GET request and, if the content-type isn't HTML, cancel the request.

Thanks!

Net::HTTP.start(uri.host, uri.port) do |http|
  http.request_get(uri.path) do |response|
    unless response['content-type'] =~ /html/i
      http.finish
    end
  end
end

/usr/lib/ruby/1.8/net/http.rb:2241:in `stream_check': attempt to read body out of block (IOError)
    from /usr/lib/ruby/1.8/net/http.rb:2171:in `read_body'
    from /usr/lib/ruby/1.8/net/http.rb:2198:in `body'
    from /usr/lib/ruby/1.8/net/http.rb:2137:in `reading_body'
    from /usr/lib/ruby/1.8/net/http.rb:1052:in `request'
    from /usr/lib/ruby/1.8/net/http.rb:948:in `request_get'
    from net.rb:9
    from /usr/lib/ruby/1.8/net/http.rb:543:in `start'
    from /usr/lib/ruby/1.8/net/http.rb:440:in `start'
    from net.rb:7
A:

Re-edited; the original answer is at the bottom.

I don't think you are idling in the first code snippet from your pastie. Try the following to see what I mean:


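require 'net/http'

h = Net::HTTP.new(uri.host, uri.port)
h.set_debug_output $stderr   # dump the raw socket traffic to stderr
h.start do |http|
  http.request_get(uri.path) do |response|
    # empty on purpose: the debug output still shows the body being read
  end
end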

What's happening is that once you issue a GET, Net::HTTP reads the entire document from the socket, whether or not you actually do anything with it. That follows from HTTP itself: on a persistent connection, the whole response has to be consumed before the socket can carry the next request.

If you don't call response.read_body inside the block, the body is never handed to your code, but request_get won't return until all of the data has been read off the socket: after your block finishes, Net::HTTP reads the rest of the body itself (reading_body calls body, as your stack trace shows). Your version with a break call jumps out before that final read, which is there to keep the client HTTP-compliant even when you decide not to consume the response. I edited your pastie to point out where the final read happens.

You just happen to be reading a massive ISO file, so it looks like you're idling.
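A minimal way to watch this without downloading an ISO is to point Net::HTTP at a throwaway local server. In the sketch below, the TCPServer setup, the port and the 200 000-byte payload are illustrative assumptions, not part of the question. The block never calls read_body, yet by the time request_get returns, the whole payload has been read off the socket and is available as response.body.

```ruby
require 'net/http'
require 'socket'

# Hypothetical stand-in for the big ISO download: a one-shot local server
# that answers a single request with PAYLOAD_SIZE bytes of non-HTML data.
PAYLOAD_SIZE = 200_000
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # Skip the request line and headers (everything up to the blank line).
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write("HTTP/1.1 200 OK\r\n" \
               "Content-Type: application/octet-stream\r\n" \
               "Content-Length: #{PAYLOAD_SIZE}\r\n" \
               "Connection: close\r\n\r\n" + 'x' * PAYLOAD_SIZE)
  client.close
  server.close
end

response = nil
Net::HTTP.start('127.0.0.1', port, nil) do |http|  # nil: ignore http_proxy env vars
  http.request_get('/') do |res|
    response = res
    # No res.read_body here, so none of the body is handed to this block...
  end
  # ...yet request_get only returned after Net::HTTP read the whole body.
end
server_thread.join

puts response.body.bytesize  # 200000
```

Swap the empty block for one that calls http.finish and the final read hits a closed socket instead, which is where the IOError in the question comes from.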

The short answer is that you should issue a HEAD request if you don't intend to read the entire document; that is what HEAD is defined for in the HTTP spec (the same headers as a GET, but no body).
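For the HEAD-first route, a sketch along these lines should work, assuming the server reports the same Content-Type for HEAD as it would for GET (most do, but nothing forces it). html? and fetch_if_html are illustrative names, not Net::HTTP API; the check reuses the question's loose /html/i match.

```ruby
require 'net/http'
require 'uri'

# True when the response headers advertise HTML. Same loose /html/i test
# as in the question, so e.g. application/xhtml+xml also counts.
def html?(response)
  response['content-type'].to_s.match?(/html/i)
end

# Sketch of the HEAD-first approach: a HEAD on the connection, then a GET
# only if the resource is HTML. Returns the body, or nil for non-HTML.
def fetch_if_html(uri)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    head = http.request_head(uri.request_uri)
    return nil unless html?(head)

    http.request_get(uri.request_uri).body
  end
end

# Usage (needs network access):
#   fetch_if_html(URI('http://example.com/'))  # HTML string, or nil
```

The price is one extra round trip per URL, which is exactly what the question wants to avoid, so this is the simple option rather than the cheap one.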

The complicated answer is that you can issue a partial GET by asking for a byte range (the Range header in the HTTP spec), but I'm not sure that Ruby's HTTP client library supports this mode of operation.
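Net::HTTP does let you set a Range header on a request yourself, so a partial GET can at least be asked for. A sketch, where ranged_get and the example URL are hypothetical: it requests only the first 1024 bytes. Servers are free to ignore Range; one that does will answer 200 and send the full body, so this only saves bandwidth where range requests are supported.

```ruby
require 'net/http'
require 'uri'

# Build a GET that asks only for the first `bytes` bytes of the resource,
# using an HTTP byte-range header of the form "Range: bytes=0-N".
def ranged_get(uri, bytes = 1024)
  req = Net::HTTP::Get.new(uri.request_uri)
  req['Range'] = "bytes=0-#{bytes - 1}"
  req
end

# Usage (needs network access; the server decides whether to honour Range):
#   uri = URI('http://example.com/big-file.iso')   # hypothetical URL
#   Net::HTTP.start(uri.host, uri.port) do |http|
#     res = http.request(ranged_get(uri))
#     res.code             # "206" if the range was honoured, "200" if ignored
#     res['content-type']
#   end
```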

By calling http.finish you close the TCP socket early. That stops the transfer, but once your block returns, Net::HTTP still tries to read the rest of the body, finds the socket closed, and raises an exception in the calling code, because you're "not supposed" to do this. You are welcome to call finish if you're willing to catch the exception, but you're not playing nice with HTTP.

original answer

You shouldn't be calling finish; the connection is closed when the start block exits (see the Net::HTTP.start documentation).

The exception is being raised from Net::HTTP's stream_check method (the top frame of your stack trace).

If you really want to force the socket to close early, just catch the IOError.

I just noticed that you're initializing response to the result of calling head, but then you're using it again as a block parameter.

Just check the content type of that HEAD response before you call request_get, and only make the GET if it's what you want.

klochner
Right, but I want to force-close the connection. I don't want to complete the request unless the content-type is as expected. I also want to avoid having to run a .head request on each url. So, .get, if html continue, else close connection.
Alexandre
That's what I want to avoid. I want to notify the http client that I don't want to keep on reading the response body.
Alexandre
It shouldn't complete the request unless you call response.read_body. You should eliminate the head call if you're not going to use it.
klochner
http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000694
klochner
Check this out: http://pastebin.org/85138 I don't see why the break would make any difference, but it does...
Alexandre
That's interesting . . . checking it out now.
klochner
I just edited the question after seeing your edit. The head call works, but that's what I want to avoid, making a head call.
Alexandre
I don't think it's idling - I'll edit my answer to make this cleaner.
klochner
and this may be useful: http://apocryph.org/2008/09/27/absolutely_bullshit_ruby_http_client_situation/
klochner
A: 

I haven't run this through a local proxy to make absolutely sure, but the speed tells me it doesn't read the body unless the content-type is HTML.

require 'net/http'

url = URI.parse('http://alicebobandmallory.com/')
body = ""
res = Net::HTTP.start(url.host, url.port) {|http|
  http.request_get(url.path) {|response|
    # leave request_get without reading the body unless it's HTML
    break unless response['content-type'] =~ /html/i
    response.read_body {|b|
      body << b
    }
  }
}
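The break approach can be checked offline. In this sketch serve_once is a hypothetical helper that answers a single request from a local TCPServer, and read_body_if_html is just the loop above wrapped in a method. For a non-HTML response, break leaves request_get before Net::HTTP reads the body, and Net::HTTP.start closes the socket on the way out, so nothing is read and no exception is raised.

```ruby
require 'net/http'
require 'socket'

# Hypothetical test helper (not part of Net::HTTP): answer exactly one request
# on a random local port with a canned response. Returns [port, thread].
def serve_once(content_type, body)
  server = TCPServer.new('127.0.0.1', 0)
  port = server.addr[1]
  thread = Thread.new do
    client = nil
    begin
      client = server.accept
      # Discard the request line and headers, up to the blank line.
      loop do
        line = client.gets
        break if line.nil? || line == "\r\n"
      end
      client.write("HTTP/1.1 200 OK\r\n" \
                   "Content-Type: #{content_type}\r\n" \
                   "Content-Length: #{body.bytesize}\r\n" \
                   "Connection: close\r\n\r\n" + body)
    rescue SystemCallError, IOError
      # The client hung up before reading everything; expected for non-HTML.
    ensure
      client&.close
      server.close
    end
  end
  [port, thread]
end

# The break-based loop from the answer above, wrapped in a method.
def read_body_if_html(host, port, path)
  body = String.new
  Net::HTTP.start(host, port, nil) do |http|  # nil: bypass any http_proxy env var
    http.request_get(path) do |response|
      # break leaves request_get before Net::HTTP reads the body;
      # Net::HTTP.start then closes the socket on the way out.
      break unless response['content-type'] =~ /html/i
      response.read_body { |chunk| body << chunk }
    end
  end
  body
end

port, t = serve_once('application/octet-stream', 'x' * 10_000)
skipped = read_body_if_html('127.0.0.1', port, '/')
t.join

port, t = serve_once('text/html', '<p>hi</p>')
page = read_body_if_html('127.0.0.1', port, '/')
t.join

puts skipped.bytesize  # 0
puts page              # <p>hi</p>
```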
Jonas Elfström
A: 

I ended up using this solution (catching the exception):

require 'net/http'

uri = URI.parse('http://mirror.globo.com/ubuntu/releases/6.06.2/ubuntu-6.06.2-server-amd64.iso')

begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.path) do |response|
      unless response['content-type'] =~ /html/i
        p response['content-type']
        p "didn't get html, stopping transfer"
        http.finish  # closes the socket; reading the body below then raises
        # break
      end
      response.read_body do |data|
        p 'receiving data'
      end
    end
    p 'transfer successful!'
  end
rescue
  # e.g. the IOError raised when the body is read from the closed socket
  p 'rescued it'
end

p 'broke out of net loop'

I also had a look at libcurl through curb (http://curb.rubyforge.org), but it relies on callbacks rather than blocks, and the callbacks don't pass in the Curl instance, so there's no way to kill the connection the way there is with Net::HTTP.

Alexandre