I maintain a website showing my university group's publications.

I have written a quick-and-dirty Ruby script to parse a CSV file containing this data (which I grab manually from the ISI Web of Science website) and present it in a nice HTML format.

There is no direct link to a PDF file in the CSV file. Instead, there is information I can use to go to doi.org, which forwards on to the real page (hosted by the journal) with a link to the PDF.

I want to be able to, for each publication in the CSV file, go to that web page, and grab the PDF.

I've never done this before. Fetching the page with wget in a terminal works fine, except that the PDF link on the journal's page is a relative URL, simply "/link info", missing the scheme and host.

Can anyone recommend a simple way of going about this, please?

+1  A: 

I'm not entirely clear what you're trying to do, but you can probably accomplish it with Mechanize or Watir. Mechanize parses and interacts with websites directly, but it doesn't support much in the way of JavaScript. Watir lets you drive an actual browser. Which is best for you depends on what you're actually doing.
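For illustration, here's a minimal Mechanize sketch (the DOI is a hypothetical placeholder, and the text used to spot the PDF link will vary by journal):

require 'mechanize'  # gem install mechanize

agent = Mechanize.new
# Mechanize follows the doi.org redirect automatically, so this request
# lands on the publisher's page.
page = agent.get('http://dx.doi.org/10.1000/182')  # hypothetical DOI
# Clicking a link resolves relative hrefs against the current page,
# which sidesteps the missing-host problem from the question.
pdf_link = page.links.find { |l| l.text =~ /pdf/i }
pdf_link.click.save('paper.pdf') if pdf_link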

Pesto
Using an actual GUI browser is really overkill for just downloading a file. You should also keep in mind that not every script runs on a system with a GUI.
johannes
@johannes: You might want to consider reading answers before commenting. I also recommended Mechanize, which *doesn't* need a graphical environment. But, as my answer mentions, it doesn't handle a good deal of JavaScript. If that were an issue, I provided an alternative solution.
Pesto
+1  A: 

I don't know about the Ruby side, but doi.org returns a redirect as an HTTP 302, along with a "Location:" header that contains the publisher's page URL. Then you'll have to scrape that page to find the PDF.
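A minimal sketch of that first step (the DOI is a hypothetical placeholder); URI.join is also how you'd resolve the relative links mentioned in the question:

require 'net/http'
require 'uri'

response = Net::HTTP.get_response(URI.parse('http://dx.doi.org/10.1000/182'))
if response.is_a?(Net::HTTPRedirection)
  publisher_url = response['location']
  # A relative href scraped from the publisher's page (hypothetical path here)
  # can be resolved against the page's own URL:
  puts URI.join(publisher_url, '/some/relative/paper.pdf')
end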

Jim Downing
A: 

Since you're already writing Ruby, this seems like a great fit for ScRUBYt.

hgimenez
+1  A: 

A simple solution would be to call wget from inside Ruby.

system("wget -O \"#{target}\" \"#{source\"")
  • system returns true or false depending on whether wget exited with status 0 or something else
  • be sure to properly escape target and source, or somebody might take over your system (see the shell-free variant below)
  • if you don't want wget's output in your terminal, append "> /dev/null 2> /dev/null" to the command string
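As hinted above, the multi-argument form of system bypasses the shell entirely, so no escaping is needed; wget's -q flag also silences its output:

# Arguments are passed straight to wget without shell interpretation,
# so target and source need no escaping.
system('wget', '-q', '-O', target, source)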

A cleaner solution would be to use Net::HTTP. The following example is taken from the Net::HTTP docs. Have a look at http://ruby-doc.org/stdlib/libdoc/net/http/rdoc/index.html for more info.

require 'net/http'
require 'uri'

def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess     then response
  # Redirects (like the 302 from doi.org) carry the target URL in the
  # Location header, so follow it with one fewer redirect allowed.
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end
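For the asker's case, a hedged usage sketch (hypothetical DOI; as noted in the other answers, the resulting page usually still has to be scraped for the actual PDF link):

response = fetch('http://dx.doi.org/10.1000/182')  # hypothetical DOI
# response.body holds the publisher's landing page HTML, which you
# would then search for the PDF link.
File.open('landing_page.html', 'wb') { |f| f.write(response.body) }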
johannes