views: 69
answers: 1

I'm using urllib2's urlopen function to try and get a JSON result from the StackOverflow api.

The code I'm using:

>>> import urllib2
>>> conn = urllib2.urlopen("http://api.stackoverflow.com/0.8/users/")
>>> conn.readline()

The result I'm getting:

'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ\...

I'm fairly new to urllib, but this doesn't seem like the result I should be getting. I've tried it in other places and I get what I expect (the same as visiting the address with a browser gives me: a JSON object).

Using urlopen on other sites (e.g. "http://google.com") works fine, and gives me actual html. I've also tried using urllib and it gives the same result.

I'm pretty stuck, not even knowing where to look to solve this problem. Any ideas?

+5  A: 

That almost looks like something you would be feeding to pickle. Maybe something in the User-Agent string or Accept header that urllib2 is sending is causing StackOverflow to send something other than JSON.

One telltale is to look at conn.headers.headers to see what the Content-Type header says.

And this question, Odd String Format Result from API Call, may have your answer. Basically, you might have to run your result through a gzip decompressor.
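As a sketch of that decompression step (not code from the linked answer — the helper name and the simulated payload are mine for illustration), you can check the Content-Encoding header and gunzip the body only when the server says it is gzipped. With urllib2 you would read the real header via `conn.info().getheader('Content-Encoding')`:

```python
import gzip
import io
import json

def maybe_gunzip(raw, content_encoding):
    # Decompress only if the server declared gzip in Content-Encoding.
    if content_encoding == 'gzip':
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw

# Simulate a gzip-encoded API response so the sketch is self-contained.
payload = json.dumps({'users': []}).encode('utf-8')
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode='wb') as gz:
    gz.write(payload)

body = maybe_gunzip(buf.getvalue(), 'gzip')
data = json.loads(body.decode('utf-8'))
```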

Double checking with this code:

>>> req = urllib2.Request("http://api.stackoverflow.com/0.8/users/",
...                       headers={'Accept-Encoding': 'gzip, identity'})
>>> conn = urllib2.urlopen(req)
>>> val = conn.read()
>>> conn.close()
>>> val[0:25]
'\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00\xed\xbd\x07`\x1cI\x96%&/m\xca{\x7fJ'

Yes, you are definitely getting gzip-encoded data back.
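Those first two bytes, `\x1f\x8b`, are the gzip magic number, so you can also sniff the body itself rather than trusting the headers. A minimal check (the sample bytes below are the start of the output shown above):

```python
GZIP_MAGIC = '\x1f\x8b'  # first two bytes of every gzip stream

def looks_gzipped(body):
    # True if the body starts with the gzip magic number.
    return body[:2] == GZIP_MAGIC

val = '\x1f\x8b\x08\x00\x00\x00\x00\x00\x04\x00'
```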

Since you seem to be getting different results on different machines with the same version of Python, and since the urllib2 API in general requires you to do something special to request gzip-encoded data, my guess is that you have a transparent proxy in there someplace.

I saw a presentation by the EFF at CodeCon in 2009. They were doing end-to-end connectivity testing to discover dirty ISP tricks of various kinds. One of the things they discovered was that a surprising number of consumer-level NAT routers add random HTTP headers or do transparent proxying. You might have some piece of equipment on your network that's adding or modifying the Accept-Encoding header in order to make your connection seem faster.

Omnifarious
Hmm, that makes sense. Any idea why this would be different on different computers (running the same version of Python)?
Edan Maor
@Edan Maor: I have no idea. It seems odd to me.
Omnifarious
Yep, I just checked on my own system and that was definitely the problem (I used the guide at http://diveintopython.org/http_web_services/gzip_compression.html to decompress the result). Still no idea why this only happens for me, though, since it works fine for other developers here, and apparently for the author of the wrapper as well.
Edan Maor
@Edan Maor: I have a guess about what's happening, and it's an ugly and enlightening one. I updated my answer with it.
Omnifarious
@Omnifarious: I think your guess is right. My internet connection has always had "weird" problems (e.g. files which friends can download but which I can't... happened once every few months). I'm guessing this is a problem with my home router. Time to replace it. Anyway, thanks for all the help, I'm working on updating the wrapper to deal with this properly.
Edan Maor