This is a follow-up questions on a previous one.
Consider this code, which is less toyish than the one in the previous question (but still much simpler than my real one)
import sys
data=[]
for line in open(sys.argv[1]):
data.append(line[-1])
print data[-1]
Now, I was expecting a longer run time (my benchmark file is 65150224 lines long), possibly much longer. This was not the case, it runs in ~ 2 minutes on the same hw as before!
Is it data.append() very lightweight? I don't believe so, thus I wrote this fake code to test it:
data=[]
counter=0
string="a\n"
for counter in xrange(65150224):
data.append(string[-1])
print data[-1]
This runs in 1.5 to 3 minutes (there is strong variability among runs)
Why don't I get 3.5 to 5 minutes in the former program? Obviously data.append() is happening in parallel with the IO.
This is good news!
But how does it work? Is it a documented feature? Is there any requirement on my code that I should follow to make it works as much as possible (besides load-balancing IO and memory/CPU activities)? Or is it just plain buffering/caching in action?
Again, I tagged "linux" this question, because I'm interested only in linux-specific answers. Feel free to give OS-agnostic, or even other-OS answers, if you think it's worth doing.