views:

261

answers:

2

Suppose you open a file, and do an seek() somewhere in the file, how do you know the current file line ?

(I personally solved with an ad-hoc file class that maps the seek position to the line after scanning the file, but I wanted to see other hints and to add this question to stackoverflow, as I was not able to find the problem anywhere on google)

+2  A: 

Here's how I would approach the problem, using as much laziness as possible:

from random import randint
from itertools import takewhile, islice

file = "/etc/passwd"
f = open(file, "r")

f.seek(randint(10,250))
pos = f.tell()

print "pos=%d" % pos

def countbytes(iterable):
    bytes = 0
    for item in iterable:
        bytes += len(item)
        yield bytes

print 1+len(list(takewhile(lambda x: x <= pos, countbytes(open(file, "r")))))

For a slightly less readable but much more lazy approach, use enumerate and dropwhile:

from random import randint
from itertools import islice, dropwhile

file = "/etc/passwd"
f = open(file, "r")

f.seek(randint(10,250))
pos = f.tell()

print "pos=%d" % pos

def countbytes(iterable):
    bytes = 0
    for item in iterable:
        bytes += len(item)
        yield bytes

print list(
        islice(
            dropwhile(lambda x: x[1] <= pos, enumerate(countbytes(open(file, "r"))))
            , 1))[0][0]+1
Mark Rushakoff
cool use of itertools... :)
Stefano Borini
+1  A: 

When you use seek(), python gets to use pointer offsets to jump to the desired position in the file. But in order to know the current line number, you have to examine each character up to that position. So you might as well abandon seek() in favor of read():

Replace

f = open(filename, "r")
f.seek(55)

with

f = open(filename, "r")
line=f.read(55).count('\n')+1
print(line)

Perhaps you do not wish to use f.read(num) since this may require a lot of memory if num is very large. In that case, you could use a generator like this:

import itertools
import operator
line_number=reduce(operator.add,( f.read(1)=='\n' for _ in itertools.repeat(None,num)))
pos=f.tell()

This is equivalent to f.seek(num) with the added benefit of giving you line_number.

unutbu