import urllib2
url = "http://example.com/file.xml"
data = urllib2.urlopen(url)
data.read()
The question is, when exactly is the file downloaded from the internet: when I call urlopen(), or when I call .read()? On my network interface I see high traffic both times.
Without looking at the code, I'd expect that the following happens:
urlopen() opens the connection and sends the query. Then the server starts feeding the reply. At this point, the data accumulates in buffers until they are full and the operating system tells the server to hold on for a while.
data.read() empties the buffer, so the operating system tells the server to go on, and the rest of the reply gets downloaded.
Naturally, if the reply is short enough, or if the .read() happens quickly enough, then the buffers do not have time to fill up and the download happens in one go.
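One way to check this split between opening and reading is to time both calls against a throwaway local server. The following is only a sketch under some assumptions: it uses Python 3's urllib.request as a stand-in for urllib2, and the names serve_once, timed_fetch, BODY and BODY_DELAY are made up for the illustration. The server sends the response headers immediately and holds the body back for a moment.

```python
import socket
import threading
import time
import urllib.request

BODY = b"x" * 1000
BODY_DELAY = 0.5  # seconds the server waits before sending the body


def serve_once(listener):
    """Accept one connection, send the headers at once, then the body after a pause."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(65536)  # read (and ignore) the HTTP request
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: %d\r\n\r\n" % len(BODY))
        time.sleep(BODY_DELAY)
        conn.sendall(BODY)


def timed_fetch():
    """Return (seconds until the open call returned, seconds until read() returned, body)."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    url = "http://127.0.0.1:%d/" % listener.getsockname()[1]
    threading.Thread(target=serve_once, args=(listener,), daemon=True).start()

    # Same machinery as urlopen(), but with proxy settings ignored so the
    # request really goes to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    response = opener.open(url)
    opened_after = time.monotonic() - start
    body = response.read()
    read_after = time.monotonic() - start
    listener.close()
    return opened_after, read_after, body


if __name__ == "__main__":
    opened_after, read_after, body = timed_fetch()
    print("open returned after %.3f s" % opened_after)
    print("read returned after %.3f s (%d bytes)" % (read_after, len(body)))
```

With this setup the open call should come back as soon as the headers are in, while read() has to wait for the delayed body. Against a real server the picture is blurrier, because the operating system keeps receiving data into its buffers between the two calls.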
I agree with ddaa. However, if you want to understand this sort of thing, you can set up a dummy server using something like nc (on *nix) and then open the URL in the interactive Python interpreter.
In one terminal, run nc -l 1234, which will open a socket and listen for connections on port 1234 of the local machine. nc will accept an incoming connection and display whatever it reads from the socket. Anything you type into nc will be sent over the socket to the remote connection, in this case Python's urlopen().
Run Python in another terminal and enter your code, i.e.
data = urllib2.urlopen('http://127.0.0.1:1234')
data.read()
The call to urlopen() will establish the connection to the server, send the request and then block waiting for a response. You will see that nc prints the HTTP request into its terminal.
Now type something into the terminal that is running nc. The call to urlopen() will still block until you press ENTER in nc, that is, until it receives a newline character. So urlopen() will not return until it has read at least one newline character. (For those concerned about possible buffering by nc, this is not an issue. urlopen() will block until it sees the first newline character.)
So it should be noted that urlopen() will block until the first newline character is received, after which data can be read from the connection. In practice, the status line and headers of an HTTP response are short, so urlopen() should return quite quickly.
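If nc is not available, the same experiment can be approximated in Python alone. This is a rough sketch, not an exact replica: it assumes Python 3, where urllib.request replaces urllib2 and, as far as I know, the underlying http.client insists on a well-formed status line and reads through the end of the headers, so the fake server sends a minimal valid response instead of arbitrary typed text. The names silent_then_reply, open_against_silent_server, REPLY_DELAY and captured are invented for the example.

```python
import socket
import threading
import time
import urllib.request

REPLY_DELAY = 0.5  # seconds the fake server stays silent before replying
captured = {}      # filled in by the server thread


def silent_then_reply(listener):
    """Play the role of nc: record the request, stay quiet, then answer."""
    conn, _ = listener.accept()
    with conn:
        captured["request"] = conn.recv(65536)  # what nc would print
        time.sleep(REPLY_DELAY)                 # nobody has "typed" anything yet
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello\n")


def open_against_silent_server():
    """Return (seconds the open call blocked, response body)."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    url = "http://127.0.0.1:%d/" % listener.getsockname()[1]
    threading.Thread(target=silent_then_reply, args=(listener,), daemon=True).start()

    # Equivalent to urlopen(), minus any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    start = time.monotonic()
    response = opener.open(url)  # blocks while the server is silent
    blocked_for = time.monotonic() - start
    body = response.read()
    listener.close()
    return blocked_for, body


if __name__ == "__main__":
    blocked_for, body = open_against_silent_server()
    print(captured["request"].decode("latin-1"))
    print("open blocked for %.3f s, body: %r" % (blocked_for, body))
```

The printed request is what nc would have shown in its terminal, and the blocking time should track REPLY_DELAY, since the open call cannot return before the server says anything.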