import os

for filename in os.listdir("."):
    for line in open(filename):
        if "foo" in line:
            print(line, end="")

So this is a simple Python equivalent of cat filename | grep foo. However, I would like the equivalent of cat filename | grep -B 5 -C 5 foo, i.e. each match printed together with the 5 lines before and after it. How should the above code be modified?

+7  A: 

Simplest way is:

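import os

for filename in os.listdir("."):
    lines = open(filename).readlines()
    for i, line in enumerate(lines):
        if "foo" in line:
            # Clamp the start at 0: a negative start would count from the end of the list.
            for x in lines[max(0, i - 5) : i + 6]:
                print(x, end="")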

Add line numbers, breaks between blocks, etc., to taste ;-).
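As one way to flesh out the "line numbers and breaks between blocks" idea, here is a minimal sketch. The helper name grep_context, its parameters, and the "--" separator (borrowed from grep's output) are assumptions for illustration, not part of the original answer. It works on a list of lines already in memory, merges overlapping context blocks so no line is emitted twice, and returns the output lines instead of printing them.

```python
def grep_context(lines, needle, before=5, after=5):
    """Return numbered output lines for each match plus its context.

    Hypothetical helper: overlapping or adjacent context blocks are merged,
    and separate blocks are divided by a "--" line.
    """
    out = []
    last_printed = -1  # index of the last line already emitted
    for i, line in enumerate(lines):
        if needle not in line:
            continue
        # Never go below 0 and never re-emit a line from the previous block.
        start = max(0, i - before, last_printed + 1)
        end = min(len(lines), i + after + 1)
        # A gap since the previous block means a new block starts here.
        if out and start > last_printed + 1:
            out.append("--")
        for j in range(start, end):
            out.append("%d:%s" % (j + 1, lines[j].rstrip("\n")))
        last_printed = max(last_printed, end - 1)
    return out
```

To use it in the loop above, pass it the result of open(filename).readlines() and print each returned string.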

In the extremely unlikely case that you have to deal with absolutely humongous text files (ones over 200-300 times larger than the King James Bible, for example, which is about 4.3 MB in its entirety as a text file), I recommend a generator producing a sliding window (a "FIFO" of lines). For simplicity, the version below only searches lines that have at least 5 lines before and 5 lines after them, so it skips the first and last few lines of the file. Covering those requires a couple of special-case loops in addition, which is why I'm returning the index as well: it's not always 5 in those extra loops!-)

import collections

def sliding_windows(it):
  it = iter(it)  # make sure both loops below consume the same iterator
  fifo = collections.deque()
  # prime the FIFO with the first 10 lines
  for i, line in enumerate(it):
    fifo.append(line)
    if i == 9: break
  # keep yielding 11-line sliding windows; index 5 is the middle line
  for line in it:
    fifo.append(line)
    yield fifo, 5
    fifo.popleft()

# filename comes from the loop over os.listdir(".") shown earlier
for w, i in sliding_windows(open(filename)):
  if "foo" in w[i]:
    for line in w: print(line, end="")

I think I'll leave the special-case loops (and worries about files of very few lines;-) as exercises, since the whole thing is so incredibly hypothetical anyway.

Just a few hints: the closing "special-case loop" is really simple. Just repeatedly drop the first line, without appending, obviously, as there's nothing more to append. The index should still always be 5, and you're done when you've just yielded a window where 5 is the last index (i.e., the last line of the file). The starting case is a tad subtler, as you don't want to yield until you've read the first 6 lines, and at that point the index will be 0 (the first line of the file)...

Finally, for extra credit, consider how to make this work on very short files, too!-)
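For readers who would rather see the exercise worked out, here is one possible sketch that follows the hints above. It is not the author's own solution: the name context_windows and the before/after parameters are assumptions for illustration. It folds the starting case, the steady state, and the closing case into a single loop, and also copes with files shorter than the window.

```python
import collections
import itertools

def context_windows(it, before=5, after=5):
    """Yield (window, index) for every line; window[index] is the current line.

    Hypothetical generalization of sliding_windows. The same deque object is
    yielded each time, so copy it if you need to keep a window around.
    """
    it = iter(it)
    end = object()  # sentinel marking an exhausted iterator
    # Start with the first line plus up to `after` lines of lookahead.
    fifo = collections.deque(itertools.islice(it, after + 1))
    index = 0
    while index < len(fifo):
        yield fifo, index
        nxt = next(it, end)
        if nxt is not end:
            fifo.append(nxt)  # extend the lookahead while lines remain
        if index < before:
            index += 1        # starting case: the window is still growing
        else:
            fifo.popleft()    # steady state and closing case: slide forward
```

Since it yields the same kind of (window, index) pairs, it can stand in for sliding_windows in the loop above without other changes.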

Alex Martelli
+1  A: 

Although I like the simplicity of Alex's answer, it would require lots of memory when grepping large files. How about this algorithm?

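import os
from collections import deque

for filename in (f for f in os.listdir(".") if os.path.isfile(f)):
    prevLines = deque(maxlen=5)  # up to 5 not-yet-printed lines before the current one
    followCount = 0              # lines of trailing context still to print
    with open(filename) as f:
        for line in f:
            if "foo" in line:
                # Print the buffered context, then the match itself.
                for prevLine in prevLines:
                    print(prevLine.rstrip("\n"))
                prevLines.clear()  # these lines must not be printed again
                print(line.rstrip("\n"))
                followCount = 5
            elif followCount > 0:
                print(line.rstrip("\n"))
                followCount -= 1
            else:
                prevLines.append(line)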
Greg
It's unusual to deal with text files that are so huge (GBs?) as to give serious memory problems in an age where a cheap-ish laptop typically comes with 2 or 3 GB of RAM (the whole King James Bible is about 4.4 MB in size as a text file, for example). For such extremely peculiar needs, I'd suggest a more regular approach, simpler and clearer, based on separating out the looping part. Let me edit my answer to show what I mean.
Alex Martelli
I'm not sure why anybody would grep for "foo" in the King James Bible. Still, I do think it's reasonable to solve this problem iteratively. I like your sliding_windows function.
Greg