ansaurus

Question

Answer 1

+2 A:

Re-edited; the original answer is at the bottom.

I don't think you are idling in the first code snippet from your pastie. Try the following to see what I mean:


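require 'net/http'

# uri is the parsed URI from your pastie
h = Net::HTTP.new(uri.host, uri.port)
h.set_debug_output($stderr)  # log the raw socket traffic to stderr
h.start do |http|
  http.request_get(uri.path) do |response|
    # the body is deliberately not read here
  end
end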

What's happening is that by issuing a GET, your client is obligated to read the entire document from the socket, whether or not you actually do anything with it. This is just part of the HTTP spec.

If you don't call response.read_body, you prevent your code from reading the response into memory, but the block won't return until all data has been read off of the socket. Your block with a break call is breaking out before the final read that is intended to make your code HTTP compliant even though you decide not to read the response into memory. I edited your pastie to point out where the final read happens.

You just happen to be reading an ISO file that is massive, so it looks like you're idling.

The short answer is that you should issue a HEAD request if you don't intend to read the entire document, as specified in the HTTP spec.
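For what it's worth, a HEAD check could look roughly like the sketch below. The helper names html_content_type? and html_resource? are made up for illustration, and the code assumes a plain HTTP URL that does not redirect.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: true when a Content-Type header value mentions HTML.
# Accepts nil, since the header may be missing from the response.
def html_content_type?(value)
  value.to_s.match?(/html/i)
end

# Hypothetical helper: send a HEAD request (headers only, no body) and
# report whether the server says the resource is HTML.
def html_resource?(uri)
  Net::HTTP.start(uri.host, uri.port) do |http|
    response = http.head(uri.request_uri)
    html_content_type?(response['content-type'])
  end
end
```

You would call it as html_resource?(URI.parse('http://alicebobandmallory.com/')) and only issue the GET when it returns true. The cost is one extra round trip per URL.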

The complicated answer is that you can issue a partial GET if you send a byte range as specified here, but I'm not sure that the Ruby HTTP client library supports this mode of operation.
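On the partial GET: Net::HTTP request objects do let you set a Range header (range= is an alias for set_range in Net::HTTPHeader). A minimal sketch, assuming the server honours byte ranges; ranged_get and fetch_first_bytes are hypothetical names, not part of the library:

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: build a GET request that asks only for the first
# `count` bytes of the resource.
def ranged_get(uri, count)
  request = Net::HTTP::Get.new(uri.request_uri)
  request.range = 0..(count - 1)  # sets "Range: bytes=0-<count - 1>"
  request
end

# Hypothetical helper: send the ranged request and return the response.
# A server that supports ranges normally answers 206 Partial Content;
# one that ignores the header sends the whole document with 200.
def fetch_first_bytes(uri, count)
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(ranged_get(uri, count))
  end
end
```

Check the response code before trusting the result, since a server that ignores Range will still stream the full body.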

By calling http.finish you're closing the TCP socket early, which does the job as far as breaking you out of the code block, but raises an exception in the calling code because you're "not supposed" to do this. You are welcome to call finish if you're willing to catch the exception, but you're not playing nice with HTTP.

original answer

You shouldn't be calling finish; the connection will get closed when the block exits. Documentation here.

The exception is being thrown from this code

If you really want to force the socket to close early, just catch the IOError.

I just noticed that you're initializing response to the result of calling head, but then you're using it again as a block parameter.

Just check the content type first, and call request_get only if the content type is what you expect.

klochner 2010-02-01 21:40:33

Right, but I want to force-close the connection. I don't want to complete the request unless the content-type is as expected. I also want to avoid having to run a .head request on each url. So, .get, if html continue, else close connection.

Alexandre 2010-02-01 21:50:01

That's what I want to avoid. I want to notify the http client that I don't want to keep on reading the response body.

Alexandre 2010-02-01 21:57:59

It shouldn't complete the request unless you call response.read_body. You should eliminate the head call if you're not going to use it.

klochner 2010-02-01 21:59:12

http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/classes/Net/HTTP.html#M000694

klochner 2010-02-01 21:59:52

Check this out: http://pastebin.org/85138 I don't see why the break would make any difference, but it does...

Alexandre 2010-02-01 22:04:57

That's interesting . . . checking it out now.

klochner 2010-02-01 22:09:20

I just edited the question after seeing your edit. The head call works, but that's what I want to avoid, making a head call.

Alexandre 2010-02-01 22:24:26

I don't think it's idling - I'll edit my answer to make this cleaner.

klochner 2010-02-01 22:53:34

and this may be useful: http://apocryph.org/2008/09/27/absolutely_bullshit_ruby_http_client_situation/

klochner 2010-02-01 23:54:04

Answer 2

A:

I haven't run this through a local proxy to make absolutely sure, but the speed tells me it doesn't read the body unless the content-type is HTML.

require 'net/http'

url = URI.parse('http://alicebobandmallory.com/')
body = ""
Net::HTTP.start(url.host, url.port) do |http|
  http.request_get(url.path) do |response|
    # Leave the block before the body is read unless the response is HTML
    break unless response['content-type'] =~ /html/i
    response.read_body { |chunk| body << chunk }
  end
end

Jonas Elfström 2010-02-01 23:20:59

Answer 3

A:

I ended up using this solution (catching the exception):

require 'net/http'

uri = URI.parse('http://mirror.globo.com/ubuntu/releases/6.06.2/ubuntu-6.06.2-server-amd64.iso')

begin
  Net::HTTP.start(uri.host, uri.port) do |http|
    http.request_get(uri.path) do |response|
      unless response['content-type'] =~ /html/i
        p response['content-type']
        p "didn't get html, stopping transfer"
        # Close the socket early; the resulting exception is rescued below
        http.finish
      end
      response.read_body do |data|
        p 'receiving data'
      end
    end
    p 'transfer successful!'
  end
rescue
  p 'rescued it'
end

p 'broke out of net loop'

I also had a look at libcurl through curb (http://curb.rubyforge.org) but it relies on callbacks, not blocks, and the callbacks don't pass in the Curl instance so there's no way to kill the connection like there is with Net::HTTP.

Alexandre 2010-02-03 17:57:27


How to cancel Ruby Net::HTTP request?
