I am running into an OpenURI::HTTPError: 403 Forbidden error when I try to open a URL that contains a comma (or other special characters, such as a period). I am able to open the same URL in a browser.

require 'open-uri'
url = "http://en.wikipedia.org/wiki/Thor_Industries,_Inc."
f = URI.open(url)
# throws OpenURI::HTTPError: 403 Forbidden error

How do I escape such a URL?

I have tried escaping the URL with CGI::escape and I get the same error.

f = open(CGI::escape(url))
A:

Typically, you would simply require the cgi library and then use CGI::escape(str) on the part of the URL that needs escaping.

require 'cgi'
require 'open-uri'
escaped_page = CGI::escape("Thor_Industries,_Inc.")
url = "http://en.wikipedia.org/wiki/#{escaped_page}"
f = URI.open(url)

However, this doesn't seem to work in your case; it still returns a 403. I'll leave it here for reference regardless.
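Comparing the two escaping approaches shows why escaping the whole URL (as tried in the question) cannot work, and why escaping only the title is not the fix either: CGI.escape encodes ':' and '/', but it leaves '.' alone, so for this title only the comma changes. This is a minimal sketch using the cgi standard library; the variable names are illustrative.

```ruby
require 'cgi'

url   = "http://en.wikipedia.org/wiki/Thor_Industries,_Inc."
title = "Thor_Industries,_Inc."

# Escaping the entire URL also encodes ':' and '/', so the result is no
# longer an absolute URL that open-uri can recognize.
whole = CGI.escape(url)

# Escaping just the page title keeps the scheme and path separators intact;
# only the comma changes (to %2C). The period is not encoded by CGI.escape.
partial = "http://en.wikipedia.org/wiki/#{CGI.escape(title)}"

puts whole    # http%3A%2F%2Fen.wikipedia.org%2Fwiki%2FThor_Industries%2C_Inc.
puts partial  # http://en.wikipedia.org/wiki/Thor_Industries%2C_Inc.
```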


Edit: Wikipedia is refusing your requests because it suspects that you are a bot. It seems that some content pages are served to you anyway, but pages whose titles don't match its "safe" pattern (e.g. titles containing dots or commas) are subject to screening. If you print the response body (I did this with Net::HTTP), you get the following message:

Scripts should use an informative User-Agent string with contact information, or they may be IP-blocked without notice.
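You do not strictly need Net::HTTP to see that message: open-uri raises OpenURI::HTTPError on a 403, but the exception keeps the response in its io attribute. A sketch, assuming Ruby 2.5+ for URI.open; fetch_or_explain is a hypothetical helper name, not part of open-uri.

```ruby
require 'open-uri'

# Hypothetical helper: return the page body on success, or the status line
# plus the server's explanation when open-uri raises OpenURI::HTTPError.
def fetch_or_explain(url, headers = {})
  URI.open(url, headers, &:read)
rescue OpenURI::HTTPError => e
  # e.io holds the error response; #status is e.g. ["403", "Forbidden"].
  "#{e.io.status.join(' ')}: #{e.io.read}"
end
```

Calling puts fetch_or_explain(url) with the URL from the question would then print the server's explanation instead of only raising the exception.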

Providing a user-agent string, however, solves the issue:

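require 'open-uri'
# URI.open needs Ruby 2.5+ (older Rubies used Kernel#open); the contact details are placeholders.
f = URI.open("http://en.wikipedia.org/wiki/Thor_Industries,_Inc.",
  "User-Agent" => "MyScript/1.0 (you@example.com) Ruby/#{RUBY_VERSION}")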
Matchu
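Since the answer inspected the response with Net::HTTP, here is a sketch of sending the same kind of User-Agent header that way. The wiki_get name and the agent string are hypothetical placeholders; substitute your own script name and contact information.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: GET a URL with an explicit User-Agent header and
# return the Net::HTTPResponse (body already read).
def wiki_get(url, user_agent)
  uri = URI(url)
  req = Net::HTTP::Get.new(uri)
  req["User-Agent"] = user_agent
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.request(req)
  end
end
```

Unlike open-uri, Net::HTTP does not raise on a 403, so you can check res.code and read res.body directly.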
Testing it, though, I'm also getting a 403. I'll keep working on it...
Matchu
I had tried `CGI::escape` with similar results. I forgot to mention it in my question.
KandadaBoggu
You might want to check what output you're getting. A few weeks ago, Wikipedia started giving me 403 errors on requests from Ruby because I wasn't supplying a User-Agent string, and the response body said so.
Matchu
@KandadaBoggu: Yep, that was it. Edits made.
Matchu
Brilliant! It works. Unrelated question: do you throttle your requests to Wikipedia to avoid rate limits?
KandadaBoggu
It was just a side project I worked on for a few hours; I never got to the point of hitting rate limits. If they do have that sort of system, though, you may need to add some throttling on your side.
Matchu
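For the throttling question above, a minimal client-side sketch: enforce a minimum delay between consecutive requests. Throttle is a hypothetical class, and the interval is an assumption you would choose yourself; this thread does not say what limits Wikipedia applies.

```ruby
# Hypothetical throttle: guarantees at least `interval` seconds between the
# start of consecutive calls. Uses a monotonic clock so system clock
# adjustments do not shorten or lengthen the delay.
class Throttle
  def initialize(interval)
    @interval = interval
    @last = nil
  end

  # Sleeps if the previous call was too recent, then yields and returns
  # the block's result.
  def call
    if @last
      elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - @last
      wait = @interval - elapsed
      sleep(wait) if wait > 0
    end
    @last = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  end
end
```

Wrapping each fetch as throttle.call { URI.open(url, "User-Agent" => agent) } spaces the requests out without changing the fetching code itself.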