I have 2 simple questions about python:
1.How to get number of lines of a file in python?
2.How to locate the position in a file object to the last line easily?
I have 2 simple questions about python:
1.How to get number of lines of a file in python?
2.How to locate the position in a file object to the last line easily?
lines are just data delimited by the newline char '\n'
.
1) Since lines are variable length, you have to read the entire file to know where the newline chars are, so you can count how many lines:
count = 0
for line in open('myfile'):
count += 1
print count, line # it will be the last line
2) reading a chunk from the end of the file is the fastest method to find the last newline char.
def seek_newline_backwards(file_obj, eol_char='\n', buffer_size=200):
if not file_obj.tell(): return # already in beginning of file
# All lines end with \n, including the last one, so assuming we are just
# after one end of line char
file_obj.seek(-1, os.SEEK_CUR)
while file_obj.tell():
ammount = min(buffer_size, file_obj.tell())
file_obj.seek(-ammount, os.SEEK_CUR)
data = file_obj.read(ammount)
eol_pos = data.rfind(eol_char)
if eol_pos != -1:
file_obj.seek(eol_pos - len(data) + 1, os.SEEK_CUR)
break
file_obj.seek(-len(data), os.SEEK_CUR)
You can use that like this:
f = open('some_file.txt')
f.seek(0, os.SEEK_END)
seek_newline_backwards(f)
print f.tell(), repr(f.readline())
The only way to count lines [that I know of] is to read all lines, like this:
count = 0
for line in open("file.txt"): count = count + 1
After the loop, count
will have the number of lines read.
Answer to the first question (beware of poor performance on large files when using this method):
f = open("myfile.txt").readlines()
print len(f) - 1
Answer to the second question:
f = open("myfile.txt").read()
print f.rfind("\n")
P.S. Yes I do understand that this only suits for small files and simple programs. I think I will not delete this answer however useless for real use-cases it may seem.
For small files that fit memory,
how about using str.count()
for getting the number of lines of a file:
line_count = open("myfile.txt").read().count('\n')
Let's not forget
f = open("myfile.txt")
lines = f.readlines()
numlines = len(lines)
lastline = lines[-1]
NOTE: this reads the whole file in memory as a list. Keep that in mind in the case that the file is very large.
The easiest way is simply to read the file into memory. eg:
f = open('filename.txt')
lines = f.readlines()
num_lines = len(lines)
last_line = lines[-1]
However for big files, this may use up a lot of memory, as the whole file is loaded into RAM. An alternative is to iterate through the file line by line. eg:
f = open('filename.txt')
num_lines = sum(1 for line in f)
This is more efficient, since it won't load the entire file into memory, but only look at a line at a time. If you want the last line as well, you can keep track of the lines as you iterate and get both answers by:
f = open('filename.txt')
count=0
last_line = None
for line in f:
num_lines += 1
last_line = line
print "There were %d lines. The last was: %s" % (num_lines, last_line)
One final possible improvement if you need only the last line, is to start at the end of the file, and seek backwards until you find a newline character. Here's a question which has some code doing this. If you need both the linecount as well though, theres no alternative except to iterate through all lines in the file however.
I'd like too add to the other solutions that some of them (those who look for \n
) will not work with files with OS 9-style line endings (\r
only), and that they may contain an extra blank line at the end because lots of text editors append it for some curious reasons, so you might or might not want to add a check for it.
For the first question there're already a few good ones, I'll suggest @Brian's one as the best (most pythonic, line ending character proof and memory efficient):
f = open('filename.txt')
num_lines = sum(1 for line in f)
For the second one, I like @nosklo's one, but modified to be more general should be:
import os
f = open('myfile')
to = f.seek(0, os.SEEK_END)
found = -1
while found == -1 and to > 0:
fro = max(0, to-1024)
f.seek(fro)
chunk = f.read(to-fro)
found = chunk.rfind("\n")
to -= 1024
if found != -1:
found += fro
It seachs in chunks of 1Kb from the end of the file, until it finds a newline character or the file ends. At the end of the code, found is the index of the last newline character.