I'm perplexed as to why I'm not able to download the entire contents of some JSON responses from FriendFeed using urllib2.

>>> import urllib2
>>> stream = urllib2.urlopen('http://friendfeed.com/api/room/the-life-scientists/profile?format=json')
>>> stream.headers['content-length']
'168928'
>>> data = stream.read()
>>> len(data)
61058
>>> # We can see here that I did not retrieve the full JSON
... # given that the stream doesn't end with a closing }
... 
>>> data[-40:]
'ce2-003048343a40","name":"Vincent Racani'

How can I retrieve the full response with urllib2?

+1  A: 

Keep calling stream.read() until it's done...

while True:
    data = stream.read()
    if not data: break    # an empty string means the stream is exhausted
    # ... do stuff with data
John Weldon
`read()` is exhaustive. Repeat calls to it return an empty string.
gotgenes
Yes, and an empty string evaluates to false...
John Weldon
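
To illustrate the point made in the comments above, here is a minimal sketch that uses an in-memory stream instead of a real HTTP response. The `io.BytesIO` object is only a stand-in for whatever `urlopen` returns, so treat it as an assumption for illustration. A `read()` with no size argument consumes everything up to end of stream, so a second call returns an empty result, which is falsy:

```python
import io

# Hypothetical stand-in for the urlopen() response: an in-memory byte stream.
stream = io.BytesIO(b'{"name": "example"}')

first = stream.read()    # no size argument: read until end of stream
second = stream.read()   # nothing is left, so this comes back empty

print(repr(first))
print(repr(second))
print(bool(second))      # an empty result is falsy
```
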
A: 
`readlines()` also works.

inspectorG4dget
It doesn't for me. `data = ''.join(stream.readlines()); print len(data); print(data[-40:])` gives identical results.
gotgenes
stream.readlines() returns a list of all the lines. But I just realized that you are using the urllib2 module. My answer was based on the urllib module, which I have been using for longer. I just double-checked stream.readlines() from the urllib module, and it works properly.
inspectorG4dget
+1  A: 

Best way to get all of the data:

fp = urllib2.urlopen("http://www.example.com/index.cfm")

response = ""
while True:
    data = fp.read()
    if not data:         # an empty string is falsy, so this also covers data == ""
        break
    response += data

print response

The reason is that .read() isn't guaranteed to return the entire response, given the nature of sockets. I thought this was discussed in the documentation (maybe urllib) but I cannot find it.
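
The loop above can also be sketched as a small helper that works on any file-like object. This is only a sketch under assumptions: `read_all` and `chunk_size` are hypothetical names, reading in fixed-size chunks is a choice made here rather than something the answer specifies, and `io.BytesIO` stands in for the response object so the example runs without a network connection.

```python
import io


def read_all(fp, chunk_size=8192):
    """Read from a file-like object until read() returns an empty result."""
    chunks = []
    while True:
        data = fp.read(chunk_size)
        if not data:          # an empty result means end of stream
            break
        chunks.append(data)
    # Joining once avoids repeatedly copying a growing string.
    return b"".join(chunks)


# Hypothetical stand-in for urllib2.urlopen(...): 20000 bytes held in memory.
fake_response = io.BytesIO(b"x" * 20000)
body = read_all(fake_response)
print(len(body))
```

With a real response, `fp` would be the object returned by `urllib2.urlopen(url)`. A loop like this can only collect what the server actually sends; it cannot recover bytes that never arrive.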

Jed Smith
I couldn't get this example to work with the example URL given in the question, http://friendfeed.com/api/room/the-life-scientists/profile?format=json. The response is still incomplete. As I mentioned to John Weldon, repeat calls to `read()` only return empty strings, and `read()` seems exhaustive.
gotgenes
I only get 51.21 KB (52441 bytes) in my browser. The site is broken.
Jed Smith
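
One way to detect this kind of truncation is to compare the number of bytes received against the advertised content length, as the question does by hand. The following is a minimal sketch: `check_length` is a hypothetical helper, and the numbers are the ones reported in the question (a `content-length` header of `'168928'` and 61058 bytes read).

```python
def check_length(declared, received):
    """Return the number of missing bytes (0 if the body looks complete).

    declared: the content-length header value as a string, or None if absent.
    received: the number of bytes actually read.
    """
    if declared is None:
        return 0              # no header, so there is nothing to compare against
    return max(int(declared) - received, 0)


# Values taken from the interactive session in the question.
missing = check_length('168928', 61058)
print(missing)   # 107870
```

In the question's session, the declared value would come from `stream.headers['content-length']` and the received value from `len(data)`.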