This is a weird one that anyone can repro at home (I think). I am trying to write a simple service, hosted on EC2, that runs searches on Twitter. Twitter returns errors 100% of the time when the request is made from Ruby, but not in other languages, which would indicate it's not an IP-blocking issue. Here is an example:

admin@ec2-xx-101-152-xxx-production:~$ irb
irb(main):001:0> require 'net/http'
=> true
irb(main):002:0> res = Net::HTTP.post_form(URI.parse('http://search.twitter.com/search.json'), {'q' => 'twitter'})
=> #<Net::HTTPBadRequest 400 Bad Request readbody=true>
irb(main):003:0> exit
admin@ec2-xx-101-152-xxx-production:~$ curl http://search.twitter.com/search.json?q=twitter
{"results":[{"text":"&quot;Social Media and SE(Search Engine) come side by side to help promote your business and bran...<snip/>

As you can see, curl works and irb does not. When I run the same thing in irb on my local Windows box, it succeeds:

$ irb
irb(main):001:0> require 'net/http'
=> true
irb(main):002:0> res = Net::HTTP.post_form(URI.parse('http://search.twitter.com/search.json'), {'q' => 'twitter'})
=> #<Net::HTTPOK 200 OK readbody=true>

This is confusing. If there were some kind of core bug in Net::HTTP, I would expect it to show up on both Windows and Linux, and if I were being blocked by IP, then curl shouldn't work either. I also tried this on a fresh Amazon instance with a fresh IP address.

Anyone should be able to repro this 'cause I'm using the ec2onrails ami:

ec2-run-instances ami-5394733a -k testkeypair

Just ssh in after that and run those simple lines above. Anyone have ideas what's going on?

Thanks!

A: 

Twitter returns the HTTP 400 error when a single client exceeds the maximum number of requests per hour. I don't know how your EC2 instance is configured, so I don't know whether your requests are identified by a shared Amazon IP or a custom IP. In the first case, it's reasonable to think the limit would be reached very quickly.

More details are available in the Twitter API documentation.

To find out more about the reason for the error response, read the response body or headers. You should find an error message and some X-RateLimit Twitter headers.

require 'net/http'
response = Net::HTTP.post_form(URI.parse('http://search.twitter.com/search.json'), {'q' => 'twitter'})

# Net::HTTPResponse has no #headers method; #to_hash returns all response headers
p response.to_hash
p response.body
Simone Carletti
Right, this was the first thing I considered, but if I were hitting the max, then the curl request sent immediately afterward should also report a failure. I am using an instance associated with a static IP, and I checked that requests are indeed coming from the allocated static IP. The response body adds little information: "<!DOCTYPE HTML PUBLIC \"-//IETF//DTD HTML 2.0//EN\">\n<html><head>\n<title>400 Bad Request</title>\n</head><body>\n<h1>Bad Request</h1>\n<p>Your browser sent a request that this server could not understand.<br />\n</p>\n</body></html>"
esilver
And PHP also fails with a 400 error. It would make sense if this were a simple blocked-IP issue, but the fact remains that curl from the command line works. I'm wondering if curl is adding some header (a User-Agent, perhaps?) that Twitter likes to see. I'm about to just write code to manually call curl from my Ruby script.
esilver
It is the blank User-Agent string. It had nothing to do with request limits being hit. I solved this issue with one line of Ruby: Twitter::Search.default_options = {:headers => {'User-Agent' => 'YOUR_USER_AGENT_STRING'}} I also filed an issue with jnunemaker's twitter gem on GitHub.
esilver
A: 

Check the Twitter API changelog. They are blocking requests from EC2 that don't have a User-Agent header in the HTTP request because people are using EC2 to find terms to spam.

Twitter recommends setting the User-Agent to your domain name, so they can check out sites that are causing problems and get in touch with you.
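
For anyone using plain Net::HTTP rather than the twitter gem, here is a minimal sketch of setting the header by hand. The helper name build_search_request and the 'example.com' value are placeholders made up for illustration; substitute your own domain. The request is only built here, not sent, so you can check the header before going to the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds a GET request for the search endpoint from
# this thread and attaches an explicit User-Agent header.
def build_search_request(query, user_agent)
  uri = URI('http://search.twitter.com/search.json')
  uri.query = URI.encode_www_form('q' => query)

  request = Net::HTTP::Get.new(uri.request_uri)
  request['User-Agent'] = user_agent   # header names are case-insensitive
  [uri, request]
end

# 'example.com' is a placeholder domain, not a real recommendation.
uri, request = build_search_request('twitter', 'example.com')
puts request['User-Agent']

# To actually send it (requires network access):
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# p response.code
```

Building the request object separately from sending it makes it easy to confirm what will go over the wire, which is the thing that differed between curl and Net::HTTP in the question.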

Luke Francl
A: 

Thanks for the info. Putting my domain in the User-Agent header fixed the same problem for me. I'm running http://LocalChirps.com on EC2 servers.

CURL Code snippet (PHP):



$twitter_api_url = 'http://search.twitter.com/search.atom?rpp='.$count.'&page='.$page;
$ch = curl_init($twitter_api_url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_USERAGENT, 'LocalChirps.com');
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$twitter_data = curl_exec($ch);
$httpcode = curl_getinfo($ch, CURLINFO_HTTP_CODE);
curl_close($ch);
if ($httpcode != 200) {
    //echo 'error calling twitter';
    return;
}

Frank