ansaurus

Question

Answer 1

+8 A:

lines are just data delimited by the newline char '\n'.

1) Since lines are variable length, you have to read the entire file to know where the newline chars are, so you can count how many lines:

count = 0
for line in open('myfile'):
    count += 1
print count, line # it will be the last line

2) reading a chunk from the end of the file is the fastest method to find the last newline char.

import os

def seek_newline_backwards(file_obj, eol_char='\n', buffer_size=200):
    if not file_obj.tell(): return # already at the beginning of the file
    # All lines end with \n, including the last one, so we assume we are
    # just after an end-of-line char and step back over it
    file_obj.seek(-1, os.SEEK_CUR)
    while file_obj.tell():
        amount = min(buffer_size, file_obj.tell())
        file_obj.seek(-amount, os.SEEK_CUR)
        data = file_obj.read(amount)
        eol_pos = data.rfind(eol_char)
        if eol_pos != -1:
            # move to just after the newline that was found
            file_obj.seek(eol_pos - len(data) + 1, os.SEEK_CUR)
            break
        # nothing found in this chunk: go back to its start and try again
        file_obj.seek(-len(data), os.SEEK_CUR)

You can use that like this:

f = open('some_file.txt')
f.seek(0, os.SEEK_END)
seek_newline_backwards(f)
print f.tell(), repr(f.readline())

nosklo 2009-05-30 15:11:00

uh... but what if the last line is more than 200 chars from EOF?

Triptych 2009-05-30 16:03:24

sometimes, lines are instead delimited by \r; you might want to take that into account.

Michael Borgwardt 2009-05-30 16:06:20

@Michael Borgwardt: Good point, modified the code to take that into account, now the char used is a parameter to the function.

nosklo 2009-05-30 17:50:46

What if the file is 4GB and consists of a single line?

Ayman Hourieh 2009-05-30 22:27:38

Obviously Ayman is half-joking, but if one cares about file- and available-RAM- size, then the next step is to worry about corner cases like the one Ayman described.

ΤΖΩΤΖΙΟΥ 2009-05-31 01:14:36

@ΤΖΩΤΖΙΟΥ - It was an honest question. I wanted to point out that this solution is also vulnerable to memory exhaustion. If you are concerned about large files, you should also be concerned about files with very long lines.

Ayman Hourieh 2009-05-31 08:22:09
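The concern about very long lines can be sidestepped when only the count is needed. What follows is a minimal sketch, not taken from any of the answers, assuming Python 3 and a file whose lines end in '\n'. The name count_newlines and the default chunk_size are made up for illustration. It reads fixed-size binary chunks, so memory use stays bounded even if the whole file is one line.

```python
def count_newlines(path, chunk_size=64 * 1024):
    # Count newline bytes by reading the file in fixed-size binary chunks.
    # Memory use is bounded by chunk_size, regardless of line length.
    count = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            count += chunk.count(b"\n")
    return count
```

Note that this counts newline characters, so a final line that lacks a trailing newline is not included in the result.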

Answer 2

+1 A:

The only way to count lines [that I know of] is to read all lines, like this:

count = 0
for line in open("file.txt"): count = count + 1

After the loop, count will have the number of lines read.

grawity 2009-05-30 15:13:26

Answer 3

A:

Answer to the first question (beware of poor performance on large files when using this method):

f = open("myfile.txt").readlines()
print len(f) - 1

Answer to the second question:

f = open("myfile.txt").read()
print f.rfind("\n")

P.S. Yes, I do understand that this is only suitable for small files and simple programs. I think I will not delete this answer, however useless it may seem for real use cases.

David Parunakian 2009-05-30 15:14:07

that reads the entire file to the memory at once.

nosklo 2009-05-30 15:16:48

I know, I have specifically edited the answer to mention that.

David Parunakian 2009-05-30 15:19:32

that also reads the entire file into a string, and then creates a list of split strings, spending at least 2 times the file size in memory. I'm not sure why one would use this method.

nosklo 2009-05-30 15:27:06

you should at least use .readlines()

nosklo 2009-05-30 15:27:20

Answer 4

+2 A:

For small files that fit memory, how about using str.count() for getting the number of lines of a file:

line_count = open("myfile.txt").read().count('\n')

gimel 2009-05-30 15:20:49

that will read the entire file to memory at once, so I guess a for loop is better.

nosklo 2009-05-30 15:21:55

Man, it's 2009. Don't be tied up by old-fashioned limits.

Charlie Martin 2009-05-30 15:26:07

@Charlie Martin: I have to deal with text files easily up to 4GB. And it is not tying me up, it is just better to read one line at a time instead of the entire file, even if it *fits* in memory. The OP is a beginner and should learn good practices that work regardless of the file size.

nosklo 2009-05-30 15:33:07

The answer clearly says "for small files that fit in memory" -- and besides, when is the last time you've had a myfile.txt that couldn't fit in memory? :-)

Martin Geisler 2009-05-30 20:05:10

The answer specifies that it's for "small files that fit memory", so I think that the answer is acceptable.

ΤΖΩΤΖΙΟΥ 2009-05-31 01:00:39

Answer 5

+7 A:

Let's not forget

f = open("myfile.txt")
lines = f.readlines()

numlines = len(lines)
lastline = lines[-1]

NOTE: this reads the whole file in memory as a list. Keep that in mind in the case that the file is very large.

Charlie Martin 2009-05-30 15:21:11

that also reads the entire file to the memory at once.

nosklo 2009-05-30 15:22:19

Yes, and? Back when I was writing business apps in 8K of memory, I might have cared.

Charlie Martin 2009-05-30 15:25:07

@Charlie Martin: 1) What if the file is 4GB? 2) What if I am already running another app that's using my memory, and I have only a few MB available? Should I hit virtual memory (swap)? Really?

nosklo 2009-05-30 15:36:09

@nosklo: Then you would change your algorithm. What is your point? There is no 'one size fits all' best solution for every problem on the planet. +1 for simplicity and explicitness.

Nick Presta 2009-05-30 15:50:54

@Charlie - I have to agree with nosklo here. Assuming the file will always fit in memory is the sort of lazy programming that can easily lead to vulnerabilities and instability.

Triptych 2009-05-30 16:02:24

@Nick Presta: Well, in the Zen of Python we have: "There should be one-- and preferably only one --obvious way to do it". In this case, that fits, since doing a straightforward loop is *simpler*.

nosklo 2009-05-30 17:07:25

@nosklo, "premature optimization is the root of all evil." I mean, what if the file is encrypted? What if it's binary data? As to whether then I'd hit virtual memory, well, yeah, that's what it's for. Oddly, in general the swapper is more efficient than file I/O.

Charlie Martin 2009-05-30 18:18:34

nosklo preaches caution and I agree. How big is the file? How much RAM does the OP —and any other viewer of this question— have available? We can't know the answer to these questions, so why risk it? In any case, this answer should make clear that this reads the whole file into memory, like nosklo suggested.

ΤΖΩΤΖΙΟΥ 2009-05-31 01:03:36

Answer 6

+5 A:

The easiest way is simply to read the file into memory. eg:

f = open('filename.txt')
lines = f.readlines()
num_lines = len(lines)
last_line = lines[-1]

However for big files, this may use up a lot of memory, as the whole file is loaded into RAM. An alternative is to iterate through the file line by line. eg:

f = open('filename.txt')
num_lines = sum(1 for line in f)

This is more efficient, since it won't load the entire file into memory, but only look at a line at a time. If you want the last line as well, you can keep track of the lines as you iterate and get both answers by:

f = open('filename.txt')
num_lines = 0
last_line = None
for line in f:
    num_lines += 1
    last_line = line
print "There were %d lines.  The last was: %s" % (num_lines, last_line)

One final possible improvement, if you need only the last line, is to start at the end of the file and seek backwards until you find a newline character. Here's a question which has some code doing this. If you need the line count as well, though, there's no alternative but to iterate through all the lines in the file.

Brian 2009-05-30 15:42:56

how is reading the entire file easiest? your second solution looks much easier

nosklo 2009-05-30 15:44:00

easy does not mean fast or efficient :-p

fortran 2009-05-30 17:50:50

Answer 7

+2 A:

I'd like to add to the other solutions that some of them (those that look for \n) will not work with files with OS 9-style line endings (\r only), and that files may contain an extra blank line at the end because lots of text editors append one for some curious reason, so you might or might not want to add a check for it.

Etienne Perot 2009-05-30 15:56:49

right. using a for won't have this problem since python's readline() already deals with that.

nosklo 2009-05-30 17:09:46

FYI - OS-X uses a single '\n' http://en.wikipedia.org/wiki/Newline

JimB 2009-05-30 18:02:13

Right, um, OS 9 and lower then. I never knew Apple had changed its mind, good thing they did~

Etienne Perot 2009-05-31 14:47:59
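To make the line-ending point concrete, here is a minimal sketch, assuming Python 3, where text mode with the default newline=None translates '\r\n' and a lone '\r' to '\n' while reading. The function name count_lines_and_last is hypothetical. Whether to treat a trailing blank line as a real line is left to the caller.

```python
def count_lines_and_last(path):
    # Text mode with the default newline=None applies universal newlines:
    # '\n', '\r\n' and '\r' all terminate a line and are returned as '\n'.
    count = 0
    last_line = None
    with open(path, "r") as f:
        for line in f:
            count += 1
            last_line = line
    return count, last_line
```

A caller who wants to ignore an editor-appended blank line at the end could check whether last_line.strip() is empty and adjust the count accordingly.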

Answer 8

A:

For the first question there are already a few good answers; I'd suggest @Brian's as the best (most Pythonic, line-ending-proof and memory-efficient):

f = open('filename.txt')
num_lines = sum(1 for line in f)

For the second one, I like @nosklo's, but modified to be more general it would be:

import os
f = open('myfile')
f.seek(0, os.SEEK_END)
to = f.tell()  # in Python 2, file.seek() returns None, so ask tell() for the size
found = -1
while found == -1 and to > 0:
  fro = max(0, to-1024)
  f.seek(fro)
  chunk = f.read(to-fro)
  found = chunk.rfind("\n")
  to -= 1024

if found != -1:
  found += fro

It searches in chunks of 1 KB from the end of the file, until it finds a newline character or reaches the beginning of the file. At the end of the code, found is the index of the last newline character (or -1 if there is none).

fortran 2009-05-30 18:04:33
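The backward-seeking snippets in this thread were written for Python 2. As a hedged sketch of the same idea for Python 3 (where seeking to arbitrary offsets is only reliable in binary mode), the following reads chunks from the end until it finds the newline that precedes the last line. The name last_line and the chunk_size default are assumptions made for illustration, and '\n' line endings are assumed.

```python
import os

def last_line(path, chunk_size=1024):
    # Return the last line of the file as bytes, without its trailing b"\n".
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        end = f.tell()
        # Skip a single trailing newline so it is not mistaken for the
        # separator in front of the last line.
        if end > 0:
            f.seek(end - 1)
            if f.read(1) == b"\n":
                end -= 1
        pos = end
        start = 0  # if no newline is found, the line starts at offset 0
        while pos > 0:
            read_from = max(0, pos - chunk_size)
            f.seek(read_from)
            chunk = f.read(pos - read_from)
            idx = chunk.rfind(b"\n")
            if idx != -1:
                start = read_from + idx + 1
                break
            pos = read_from
        f.seek(start)
        return f.read(end - start)
```

The search itself uses bounded memory, but the returned line is read in one piece, so a file consisting of a single huge line would still be loaded whole. With '\r\n' endings the result would keep a trailing b"\r".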


Two simple questions about python
