Is there a built-in method to do it? If not, how can I do this without too much overhead?
Seek to a random position, read a line and discard it, then read another line. The distribution of lines won't be uniform (a line that follows a long line is more likely to be picked), but that doesn't always matter.
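A minimal sketch of that idea, assuming a non-empty file that is opened in binary mode so that byte offsets are safe to seek to. The name approx_random_line and the wrap-around to the first line when the seek lands in the last line are my own choices, not part of the answer above:

```python
import os
import random

def approx_random_line(path):
    """Return a roughly random line; lines after long lines are favoured."""
    size = os.path.getsize(path)          # assumes size > 0
    with open(path, "rb") as f:
        f.seek(random.randrange(size))    # jump to a random byte offset
        f.readline()                      # discard the (probably partial) line we landed in
        line = f.readline()               # the next complete line
        if not line:                      # we landed in the last line: wrap to the first
            f.seek(0)
            line = f.readline()
    return line.decode().rstrip("\r\n")
```

This reads at most two or three lines regardless of file size, which is the whole appeal; the price is the skewed distribution mentioned above.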
Not built-in, but Algorithm R (3.4.2) (Waterman's "Reservoir Algorithm") from Knuth's "The Art of Computer Programming" is good (in a very simplified version):
import random

def random_line(afile):
    line = next(afile)  # start with the first line as the current pick
    for num, aline in enumerate(afile):
        # keep the current pick unless randrange returns 0
        if random.randrange(num + 2):
            continue
        line = aline
    return line
The num + 2 produces the sequence 2, 3, 4... The randrange will therefore be 0 with a probability of 1.0/(num + 2), and that's the probability with which we must replace the currently selected line (the special case of sample size 1 of the referenced algorithm; see Knuth's book for the proof of correctness, and of course we're also in the case of a small-enough "reservoir" to fit in memory ;-)), and exactly the probability with which we do so.
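If you'd rather see the uniformity than take it on trust, here is a rough empirical check. It is only a sketch: the four-line sample text, the trial count and the fixed seed are arbitrary choices of mine, and the function is restated (with the test written as == 0) so the snippet runs on its own:

```python
import io
import random
from collections import Counter

def random_line(afile):
    line = next(afile)
    for num, aline in enumerate(afile):
        # replace the current pick with probability 1/(num + 2)
        if random.randrange(num + 2) == 0:
            line = aline
    return line

random.seed(0)                      # fixed seed so the run is repeatable
text = "a\nb\nc\nd\n"               # hypothetical four-line "file"
trials = 40000
counts = Counter(random_line(io.StringIO(text)) for _ in range(trials))

for line, n in sorted(counts.items()):
    print(repr(line), n / trials)   # each frequency should be close to 0.25
```

With four lines, every line should come out close to one time in four, including the first and the last.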
It depends on what you mean by "too much" overhead. If storing the whole file in memory is possible, then something like
import random
with open("file") as f:  # close the file once the lines are read
    random_line = random.choice(f.readlines())
would do the trick.
import random

with open('file.txt') as f:
    lines = f.read().splitlines()  # all lines, without trailing newlines
myline = random.choice(lines)
print(myline)
For a very long file: seek to a random place in the file based on its length, then find the two newline characters after that position (or a newline and the end of the file); the text between them is the line. If the seek ended up inside the last line, try again 100 characters earlier, or from the beginning of the file if the original seek position was less than 100.
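Something along these lines, as a sketch only. The function name random_line_seek and the backoff parameter are made up for illustration, the file is assumed to be non-empty, and I read in binary mode so offsets are plain bytes. One interpretation of mine: when the retry reaches offset 0, the first line is returned as-is instead of being skipped, otherwise it could never be chosen:

```python
import os
import random

def random_line_seek(path, backoff=100):
    """Pick a line by seeking; retry further back if we land in the last line."""
    size = os.path.getsize(path)              # assumes size > 0
    with open(path, "rb") as f:
        pos = random.randrange(size)
        while True:
            f.seek(pos)
            if pos > 0:
                f.readline()                  # skip to just past the first newline
            line = f.readline()               # up to the second newline, or end of file
            if line or pos == 0:
                return line.decode().rstrip("\r\n")
            pos = max(0, pos - backoff)       # landed in the last line: back off and retry
```

Like any seek-based approach, this is not uniform over lines, but it never reads more than a few lines per attempt.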