In Python 2.6 or better:
def doit(inf, ouf, thestring, separator='SEPARATOR\n'):
    thestring += '\n'
    for line in inf:
        # here we're always at the start-of-block separator
        assert line == separator
        blockid = next(inf)
        if blockid == thestring:
            # found block of interest, use enumerate to count its lines
            for c, line in enumerate(inf):
                if line == separator: break
            assert line == separator
            # emit results and terminate function
            ouf.writelines((separator, thestring, '(%d)' % c, separator))
            inf.close()
            ouf.close()
            return
        # non-interesting block, just skip it
        for line in inf:
            if line == separator: break
In older Python versions you can do almost the same, but change the line `blockid = next(inf)` to `blockid = inf.next()`.
The assumptions here are that the input and output files are opened by the caller (which also passes in the interesting value of `thestring` and, optionally, `separator`), but that it's this function's job to close them (e.g. for maximum ease of use as a pipeline filter, with `sys.stdin` as `inf` and `sys.stdout` as `ouf`). That's easy to tweak if needed, of course.
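As one possible tweak, here is a minimal sketch in which the caller keeps ownership of the input file and the function just returns the count instead of writing it out. The name `count_block`, the `None` return for a missing block, and the sample layout (every block wrapped in its own opening and closing separator line) are assumptions made for illustration, not part of the function above.

```python
import io

def count_block(inf, thestring, separator='SEPARATOR\n'):
    """Return the line count of the block whose id is thestring, else None.

    Hypothetical variant: it never closes inf; the caller does that.
    """
    target = thestring + '\n'
    for line in inf:
        # always at a start-of-block separator here
        assert line == separator
        blockid = next(inf)
        count = 0
        for line in inf:
            # scan up to (and consume) the end-of-block separator
            if line == separator:
                break
            count += 1
        if blockid == target:
            return count
    return None

# Assumed layout: each block has an opening and a closing separator line.
sample = (
    'SEPARATOR\nSTRING1\na\nb\nSEPARATOR\n'
    'SEPARATOR\nSTRING2\nx\ny\nz\nSEPARATOR\n'
)

with io.StringIO(sample) as inf:
    result = count_block(inf, 'STRING2')
    still_open = not inf.closed   # the function left the file open

print(result, still_open)
```

With this split, the `with` statement in the caller is what closes the file, so the same function works equally well on a file the caller wants to keep reading afterwards.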
Removing the `assert`s will speed it up microscopically, but I like their "sanity checking" role (and they may also help in understanding the logic of the code flow).
Key to this approach is that a file is an iterator (of lines), and an iterator can be advanced from multiple places: we can have multiple `for` statements, or specific "advance the iterator" calls such as `next(inf)`, and they all cooperate properly because each one picks up exactly where the previous one left off.
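A small sketch of that cooperation, using `io.StringIO` as a stand-in for a real file (the sample contents and the `seen` list are made up for the demonstration):

```python
import io

# Stand-in for an open text file; it iterates line by line the same way.
inf = io.StringIO("SEPARATOR\nalpha\none\ntwo\nSEPARATOR\n")

seen = []
for line in inf:                      # outer loop takes the first line
    seen.append(("outer", line.strip()))
    blockid = next(inf)               # explicit advance takes the second line
    seen.append(("next", blockid.strip()))
    for inner in inf:                 # inner loop resumes at the third line
        seen.append(("inner", inner.strip()))
        if inner == "SEPARATOR\n":
            break
# Back in the outer loop the iterator is exhausted, so it simply stops.

for who, text in seen:
    print(who, text)
```

Every line is consumed exactly once, no matter which statement asked for it, since all three share the single position held by `inf`.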