I have a text file that contains a timestamp on each line. My goal is to find the time range. The times are in order, so the first line will be the earliest time and the last line will be the latest. I only need the very first and very last line. What would be the most efficient way to get these lines in Python?

Note: These files are relatively long, about 1-2 million lines each, and I have to do this for several hundred files.

+3  A: 

See the [docs for the io module](http://docs.python.org/library/io.html).

with open(fname, 'rb') as fh:
    first = next(fh).decode()           # read the first line

    fh.seek(-1024, 2)                   # jump back 1024 bytes from the end (whence=2 is SEEK_END)
    last = fh.readlines()[-1].decode()  # last complete line in the final chunk

The tunable value here is 1024: it represents the average line length. I chose 1024 only as an example. If you have an estimate of the average line length, you could just use that value times 2.

Since you have no idea whatsoever about the possible upper bound for the line length, the obvious solution would be to loop over the file:

for line in fh:
    pass        # exhaust the file; the loop variable keeps its last value
last = line

You don't need to bother with the binary flag in this case; you could just use open(fname).

ETA: Since you have many files to work on, you could create a sample of a couple dozen files using random.sample and run this code on them to determine the length of the last line, with an a priori large value for the position shift (say, 1 MB). This will help you estimate the value for the full run. A sketch of that idea follows.
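
A minimal sketch of that sampling idea (the function name, the sample size, and the 1 MB probe are illustrative assumptions, not part of the answer):

import random

def estimate_last_line_length(filenames, sample_size=24, probe=1024 * 1024):
    # Sample a couple dozen files and measure their last-line lengths,
    # probing at most the final 1 MB of each file.
    longest = 0
    for fname in random.sample(filenames, min(sample_size, len(filenames))):
        with open(fname, 'rb') as fh:
            fh.seek(0, 2)                  # whence=2 (SEEK_END): find the file size
            size = fh.tell()
            fh.seek(-min(probe, size), 2)  # read at most the last 1 MB
            last = fh.read().splitlines()[-1]
            longest = max(longest, len(last))
    return longest

The result times 2 would then be a reasonable position shift for the full run.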

SilentGhost
As long as the lines aren't longer than 1024 characters.
FogleBird
There is no guarantee that the lines aren't longer than 1024 characters, there may be some other junk besides the timestamps on the line.
pasbino
@pasbino: do you have *some* upper bound?
SilentGhost
@pasbino: You can still use a similar approach in a loop until you find a full line.
FogleBird
Unfortunately, I have not seen every line of these files. From a quick glance, some of these lines seem extremely long. I don't think I can safely estimate an upper bound.
pasbino
@pasbino: 1 MB? There is always the possibility of checking for an EOL character in the cut-off chunk and cutting further back.
SilentGhost
Hmmmm, that's what I feared. Currently I loop over the whole file and it takes a while. But I guess without an upper bound on line length there is no faster way.
pasbino
The files are about 150 MB in size.
pasbino
@pasbino: 1 MB was an example of the length of the last string.
SilentGhost
Uhmm, so it's different depending on the file. I just checked a few and it seems like they're only a few kilobytes.
pasbino
@pasbino: see my edit.
SilentGhost
Sorry for so many questions, but how does the seek work in your first example? Does it set the file's current position to 1024 bytes from the end of the file?
pasbino
@pasbino: yes. The [docs](http://docs.python.org/library/io.html?highlight=seek#io.IOBase.seek) have more information.
SilentGhost
A: 

Can you use Unix commands? I think head -1 and tail -n 1 are probably the most efficient methods. Alternatively, you could use a simple fid.readline() to get the first line and fid.readlines()[-1] to get the last, but that may take too much memory.
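
A minimal sketch of shelling out to those commands with the subprocess module (assuming a Unix-like system where head and tail are on the PATH; the function name is illustrative):

import subprocess

def first_last(fname):
    # head/tail do the seeking for us; each call returns the raw line bytes
    first = subprocess.check_output(['head', '-1', fname])
    last = subprocess.check_output(['tail', '-n', '1', fname])
    return first, last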

beitar
Hmm, would creating a subprocess to execute these commands be the most efficient way then?
pasbino
If you do have Unix, then `os.popen("tail -n 1 %s" % filename).read()` gets the last line nicely.
Michael Dunn
A: 

Getting the first line is trivially easy. For the last line, presuming you know an approximate upper bound on the line length, os.lseek some amount back from SEEK_END, find the second-to-last line ending, and then readline() the last line.
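
A hedged sketch of that approach (BOUND is an assumed upper bound on line length; the name and value are illustrative, and the file must be at least BOUND bytes long):

import os

BOUND = 4096  # assumed upper bound on line length

def last_line(fname):
    fd = os.open(fname, os.O_RDONLY)
    try:
        os.lseek(fd, -BOUND, os.SEEK_END)  # jump BOUND bytes back from the end
        tail = os.read(fd, BOUND)
        # everything after the second-to-last newline is the last line
        # (stop one byte short of the end to skip a trailing '\n')
        return tail[tail.rindex(b'\n', 0, -1) + 1:]
    finally:
        os.close(fd)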

msw
I do not have an approximate upper bound on line length
pasbino
+1  A: 

Here's a modified version of SilentGhost's answer that will do what you want.

with open(fname, 'rb') as fh:
    first = next(fh)          # read the first line
    offs = -100               # initial guess at the size of the last line
    while True:
        fh.seek(offs, 2)      # seek offs bytes back from the end (whence=2)
        lines = fh.readlines()
        if len(lines) > 1:
            # the chunk spans at least one newline, so lines[-1] is complete
            last = lines[-1]
            break
        offs *= 2             # window too small: double it and retry
    print(first)
    print(last)

No need for an upper bound on line length here.

m01