We have a large raw data file that we would like to trim to a specified size. I am experienced in .NET C#, but I would like to do this in Python, both to simplify things and out of interest.

How would I go about getting the first N lines of a text file in Python? Will the OS being used have any effect on the implementation?

Thanks :)

+10  A: 
with open("datafile") as myfile:
    head = [myfile.next() for x in xrange(N)]
print head

Here's another way

from itertools import islice
with open("datafile") as myfile:
    head = list(islice(myfile, N))
print head
gnibbler
Thanks, that is very helpful indeed. What is the difference between the two (in terms of performance, required libraries, compatibility, etc.)?
Russell
I expect the performance to be similar, with the first perhaps slightly faster. But the first one raises StopIteration if the file has fewer than N lines. You are best off measuring the performance against some typical data you will be using it with.
gnibbler
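To make the short-file difference concrete, here is a minimal sketch. It assumes Python 3 (so next(myfile) and range stand in for myfile.next() and xrange), and the three-line temporary file is a made-up sample, not the poster's data.

```python
import os
import tempfile
from itertools import islice

# Hypothetical sample file with only 3 lines, created just for this demo.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("a\nb\nc\n")

N = 5  # deliberately ask for more lines than the file contains

# Approach 1: calling next() N times raises StopIteration on a short file.
with open(path) as myfile:
    try:
        head_next = [next(myfile) for _ in range(N)]
    except StopIteration:
        head_next = None  # the partially built list is lost

# Approach 2: islice simply stops when the file runs out.
with open(path) as myfile:
    head_islice = list(islice(myfile, N))

os.remove(path)
print(head_next)
print(head_islice)
```

So if the input may be shorter than N lines, the islice version needs no extra error handling.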
The with statement works out of the box on Python 2.6, and requires "from __future__ import with_statement" on 2.5. For 2.4 or earlier, you'd need to rewrite the code with a try...finally block. Stylistically, I prefer the first option, although as mentioned the second is more robust for short files.
Alasdair
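For interpreters without the with statement, the rewrite could look roughly like the sketch below, which closes the file in a finally clause. The function name head_try_finally and the sample file are illustrative assumptions. The code is written to run on Python 3 as well, so it avoids xrange and the print statement.

```python
import os
import tempfile

def head_try_finally(file_name, n):
    """Return up to n lines, closing the file without a with statement."""
    result = []
    f = open(file_name)
    try:
        for line in f:
            if len(result) >= n:
                break
            result.append(line)
    finally:
        f.close()  # runs whether or not reading raised an error
    return result

# Hypothetical sample file used only to exercise the function.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("one\ntwo\nthree\n")

first_two = head_try_finally(demo_path, 2)
all_lines = head_try_finally(demo_path, 10)  # short file: no error
os.remove(demo_path)
print(first_two)
print(all_lines)
```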
islice is probably faster as it is implemented in C.
chrispy
@chrispy, I just tried it out, and the second one was faster for the file I was using once N grew above 20 or so.
gnibbler
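If you want to repeat that measurement on your own data, a rough timeit sketch follows. It assumes Python 3; the 1000-line temporary file, N = 100 and the repeat count of 200 are arbitrary placeholders, so the timings only mean something once you swap in a typical file of your own.

```python
import os
import tempfile
import timeit
from itertools import islice

def head_next(path, n):
    # First approach: call next() on the file object n times.
    with open(path) as f:
        return [next(f) for _ in range(n)]

def head_islice(path, n):
    # Second approach: let islice take the first n lines.
    with open(path) as f:
        return list(islice(f, n))

# Hypothetical sample data; replace with a file typical of your workload.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.writelines("line %d\n" % i for i in range(1000))

N = 100
same = head_next(path, N) == head_islice(path, N)  # sanity check first
t_next = timeit.timeit(lambda: head_next(path, N), number=200)
t_islice = timeit.timeit(lambda: head_islice(path, N), number=200)
os.remove(path)

print("next():  %.4f s" % t_next)
print("islice:  %.4f s" % t_islice)
```

Both timings include opening the file, so for small N the difference between the two approaches may be lost in that overhead.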
+1  A: 

The file object does not expose a dedicated method for reading a given number of lines.

I guess the easiest way would be the following:

lines = []
with open(file_name) as f:
    lines.extend(f.readline() for i in xrange(N))
artdanil
This is what I had actually intended, though I thought of adding each line to a list. Thank you.
artdanil
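One thing to be aware of with readline(): at end of file it returns an empty string rather than raising, so on a file shorter than N lines the list gets padded with '' entries. Below is a small sketch of that behaviour and of a variant that stops early. It assumes Python 3, and head_readline and the sample file are names made up for illustration.

```python
import os
import tempfile

def head_readline(file_name, n):
    """Return up to n lines, stopping at end of file."""
    lines = []
    with open(file_name) as f:
        for _ in range(n):
            line = f.readline()
            if not line:  # '' means end of file; a blank line is '\n'
                break
            lines.append(line)
    return lines

# Hypothetical 3-line sample file (the middle line is blank).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("x\n\ny\n")

# Calling readline() a fixed number of times pads with '' past the end.
with open(path) as f:
    padded = [f.readline() for _ in range(5)]

trimmed = head_readline(path, 5)
os.remove(path)
print(padded)
print(trimmed)
```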
A: 

If you want something that obviously works (without looking up esoteric stuff in manuals), needs no imports or try/except, and runs on a fair range of Python 2.x versions (2.2 to 2.6):

def headn(file_name, n):
    """Like the *nix head -N command"""
    result = []
    nlines = 0
    assert n >= 1
    for line in open(file_name):
        result.append(line)
        nlines += 1
        if nlines >= n:
            break
    return result

if __name__ == "__main__":
    import sys
    rval = headn(sys.argv[1], int(sys.argv[2]))
    print rval
    print len(rval)
John Machin
+1  A: 
N=10
f=open("file")
for i in range(N):
    line=f.next().strip()
    print line
f.close()
ghostdog74
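Since the original goal was to trim the file, here is a hedged sketch that writes the first N lines to a new file instead of printing them. It assumes Python 3; trim_file and the file names are hypothetical. It opens both files in binary mode so the kept lines are copied byte for byte, which sidesteps the platform-specific newline translation that text mode performs.

```python
import os
import shutil
import tempfile
from itertools import islice

def trim_file(src, dst, n):
    """Copy at most the first n lines of src to dst; return lines written."""
    count = 0
    # Binary mode: lines are split on b"\n" and written back unchanged.
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        for line in islice(fin, n):
            fout.write(line)
            count += 1
    return count

# Hypothetical sample data in a throwaway directory.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "raw.txt")
dst = os.path.join(workdir, "trimmed.txt")
with open(src, "wb") as f:
    f.write(b"1\n2\n3\n4\n5\n")

written = trim_file(src, dst, 3)
with open(dst, "rb") as f:
    kept = f.read()
written_short = trim_file(src, dst, 10)  # fewer lines than asked: no error

shutil.rmtree(workdir)
print(written, kept, written_short)
```

Because it streams line by line, this never holds more than one line in memory, which matters for a large raw data file.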