I'm using a for loop to read a file, but I only want to read specific lines, say line #26 and #30. Is there any built-in feature to achieve this?

Thanks

A: 

File objects have a .readlines() method which will give you a list of the contents of the file, one line per list item. After that, you can just use normal list slicing techniques.

http://docs.python.org/library/stdtypes.html#file.readlines
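A quick sketch of that approach (the file name is a stand-in, and the example writes its own throwaway file so it is self-contained):

```python
# Sketch of the readlines-plus-slicing idea described above.
# "example.txt" is just a stand-in name; we create a small sample file first.
with open("example.txt", "w") as f:
    f.write("".join("line %d\n" % n for n in range(1, 41)))

with open("example.txt") as f:
    lines = f.readlines()

line_26 = lines[25]  # zero-based: index 25 holds the 26th line
line_30 = lines[29]  # index 29 holds the 30th line
```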

Josh Wright
+1  A: 

How about this:

>>> with open('a', 'r') as fin: lines = fin.readlines()
>>> for i, line in enumerate(lines):
...     if i > 29: break
...     if i == 25: dox()   # 26th line (enumerate is zero-based)
...     if i == 29: doy()   # 30th line
Hamish Grubijan
True, this is less efficient than the one by Alok, but mine uses a with statement ;)
Hamish Grubijan
+9  A: 

The quick answer:

f=open('filename')
lines=f.readlines()
print lines[25]   # 26th line; the list is zero-indexed
print lines[29]   # 30th line

or:

lines=[25, 29]   # zero-based indices of lines 26 and 30
i=0
f=open('filename')
for line in f:
    if i in lines:
        print line
    i+=1

There is a more elegant solution for extracting many lines: linecache (courtesy of "python: how to jump to a particular line in a huge text file?", a previous stackoverflow.com question).

Quoting the python documentation linked above:

>>> import linecache
>>> linecache.getline('/etc/passwd', 4)
'sys:x:3:3:sys:/dev:/bin/sh\n'

Change the 4 to your desired line number, and you're done. Note that linecache uses one-based line numbers, so 4 returns the fourth line of the file.

If the file might be very large, and cause problems when read into memory, it might be a good idea to take @Alok's advice and use enumerate().

To Conclude:

  • Use fileobject.readlines() or for line in fileobject as a quick solution for small files.
  • Use linecache for a more elegant solution, which will be quite fast for reading many lines, possibly repeatedly.
  • Take @Alok's advice and use enumerate() for files which might be very large and won't fit into memory. Note that this method might be slow because the file is read sequentially.
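For completeness (this is not one of the approaches above), the standard-library itertools.islice can also pull out one line sequentially without loading the whole file; get_line below is a hypothetical helper name, and the example writes its own sample file:

```python
from itertools import islice

# Throwaway sample file so the sketch is self-contained.
with open("example.txt", "w") as f:
    f.write("".join("line %d\n" % n for n in range(1, 41)))

def get_line(path, lineno):
    """Return the one-based line `lineno`, or None if the file is shorter."""
    with open(path) as f:
        # islice skips lineno-1 lines and yields at most one
        return next(islice(f, lineno - 1, lineno), None)

line_26 = get_line("example.txt", 26)
```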
Adam Matan
Nice. I just looked at the source of `linecache` module, and looks like it reads the whole file in memory. So, if random access is more important than size optimization, `linecache` is the best method.
Alok
Thanks for the options! :)
Nimbuz
Your solution has an off by one error, btw :-)
Alok
Thanks, corrected it.
Adam Matan
+1  A: 

If you don't mind importing a module, fileinput does exactly what you need (that is, you can read the line number of the current line via fileinput.lineno()).
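A sketch of that approach (the file name is a stand-in, and the example writes its own sample file; note that fileinput.lineno() is one-based):

```python
import fileinput

# Throwaway sample file so the sketch is self-contained.
with open("example.txt", "w") as f:
    f.write("".join("line %d\n" % n for n in range(1, 41)))

wanted = {26, 30}
found = []
for line in fileinput.input(["example.txt"]):
    if fileinput.lineno() in wanted:  # fileinput line numbers are one-based
        found.append(line.rstrip())
    if fileinput.lineno() >= max(wanted):
        break  # no need to read past the last wanted line
fileinput.close()
```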

ennuikiller
+11  A: 

If the file to read is big, and you don't want to read the whole file in memory at once:

fp = open("file")
for i, line in enumerate(fp):
    if i == 25:
        pass  # 26th line: handle it here
    elif i == 29:
        pass  # 30th line: handle it here
    elif i > 29:
        break
fp.close()

Note that i == n-1 for the nth line.

Alok
+1 Better solution than mine if the entire file isn't loaded into memory as in `linecache`. Are you sure that `enumerate(fp)` doesn't do that?
Adam Matan
Haha. We both did +1 to each other's solutions. I like SO precisely because of things like this: learning from each other. :-)
Alok
`enumerate(x)` uses `x.next`, so it doesn't need the entire file in memory.
Alok
My small beef with this is that A) You want to use with instead of the open / close pair and thus keep the body short, B) But the body is not that short. Sounds like a trade-off between speed/space and being Pythonic. I am not sure what the best solution would be.
Hamish Grubijan
Great. I like SO for that precise reason too. I'll add a link to your answer into mine.
Adam Matan
with is overrated, python got along fine for over 13 years without it
Dan D
A: 

You can do a seek() call which positions your read head to a specified byte within the file. This won't help you unless you know exactly how many bytes (characters) are written in the file before the line you want to read. Perhaps your file is strictly formatted (each line is X number of bytes?) or, you could count the number of characters yourself (remember to include invisible characters like line breaks) if you really want the speed boost.

Otherwise, you do have to read every line prior to the line you desire, as per one of the many solutions already proposed here.
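As an illustration of the fixed-width case described above (the record size and file name are made up for the example): if every record is exactly 10 bytes, the byte offset of line n is simply 10 * (n - 1), so seek() can jump straight there.

```python
RECORD = 10  # bytes per line: "line " + 4 digits + "\n"

# Build a strictly formatted throwaway file, in binary mode so the
# byte arithmetic is exact on every platform.
with open("fixed.txt", "wb") as f:
    for n in range(1, 41):
        f.write(b"line %04d\n" % n)

with open("fixed.txt", "rb") as f:
    f.seek(RECORD * 25)  # jump straight to the start of the 26th line
    line_26 = f.readline().decode("ascii")
```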

Roman Stolper
+1  A: 
def getitems(iterable, items):
  items = list(items) # get a list from any iterable and make our own copy
                      # since we modify it
  if items:
    items.sort()
    for n, v in enumerate(iterable):
      if n == items[0]:
        yield v
        items.pop(0)
        if not items:
          break

print list(getitems(open("/usr/share/dict/words"), [25, 29]))
# ['Abelson\n', 'Abernathy\n']
# note that index 25 is the 26th item
Roger Pate
Roger, my favorite guy! This could benefit from a with statement.
Hamish Grubijan
A: 
f = open(filename, 'r')
totalLines = len(f.readlines())
f.close()
f = open(filename, 'r')

lineno = 1
while lineno <= totalLines:
    line = f.readline()

    if lineno == 26:
        doLine26Command(line)

    elif lineno == 30:
        doLine30Command(line)

    lineno += 1
f.close()
inspectorG4dget
this is as unpythonic as it gets.
SilentGhost
Gives the wrong result, as you can't use readlines and readline like that (they each change the current read position).
Roger Pate
I'm sorry for having overlooked a HUGE error in my first code. The error has been corrected and the current code should work as expected. Thanks for pointing out my error, Roger Pate.
inspectorG4dget
A: 

I prefer this approach because it's more general-purpose, i.e. you can use it on a file, on the result of f.readlines(), on a StringIO object, whatever:

def read_specific_lines(file, lines_to_read):
    """file is any iterable; lines_to_read is an iterable containing int values"""
    lines = set(lines_to_read)
    last = max(lines)
    for n, line in enumerate(file):
        if n + 1 in lines:
            yield line
        if n + 1 > last:
            return

>>> with open(r'c:\temp\words.txt') as f:
...     [s for s in read_specific_lines(f, [1, 2, 3, 1000])]
['A\n', 'a\n', 'aa\n', 'accordant\n']
Robert Rossney
+6  A: 

A fast and compact approach could be:

def picklines(thefile, whatlines):
  return [x for i, x in enumerate(thefile) if i in whatlines]

This accepts any open file-like object thefile (leaving up to the caller whether it should be opened from a disk file, or via e.g. a socket or other file-like stream) and a set of zero-based line indices whatlines, and returns a list, with low memory footprint and reasonable speed. If the number of lines to be returned is huge, you might prefer a generator:

def yieldlines(thefile, whatlines):
  return (x for i, x in enumerate(thefile) if i in whatlines)

which is basically only good for looping upon -- note that the only difference is the use of round parentheses rather than square brackets in the return statement, making a generator expression rather than a list comprehension.

Further note that despite the mention of "lines" and "file" these functions are much, much more general -- they'll work on any iterable, be it an open file or any other, returning a list (or generator) of items based on their progressive item-numbers. So, I'd suggest using more appropriately general names;-).
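To illustrate that generality, here is the same picklines function applied to a plain list rather than a file:

```python
# picklines as defined in the answer above, applied to a plain list to
# show it works on any iterable, not just open files.
def picklines(thefile, whatlines):
    return [x for i, x in enumerate(thefile) if i in whatlines]

words = ["alpha", "beta", "gamma", "delta", "epsilon"]
picked = picklines(words, {1, 3})  # zero-based indices
```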

Alex Martelli
IMO, `for i, x in enumerate(thefile):` `if i in whatlines:` `yield x` (across three lines) reads clearer than returning a generator expression.
ephemient
@ephemient, I disagree -- the genexp reads smoothly and perfectly.
Alex Martelli
A: 

@OP, you can use enumerate

for n, line in enumerate(open("file")):
    if n+1 in [26,30]: # or n in [25,29]
        print line.rstrip()
ghostdog74
+1  A: 

Here's my little 2 cents, for what it's worth ;)

def indexLines(filename, lines=[2,4,6,8,10,12,3,5,7,1]):
    fp   = open(filename, "r")
    src  = fp.readlines()
    data = [(index, line) for index, line in enumerate(src) if index in lines]
    fp.close()
    return data


# Usage below
filename = "C:\\Your\\Path\\And\\Filename.txt"
for line in indexLines(filename): # using default list, specify your own list of lines otherwise
    print "Line: %s\nData: %s\n" % (line[0], line[1])
AWainb
A: 

if you want line 7

line = open("file.txt", "r").readlines()[6]  # index 6 is the 7th line; lists are zero-based
MadSc13ntist