I would first read all of the allegedly-big file in memory as a list of lines:
with open('socalledbig.txt', 'rt') as f:
lines = f.readlines()
should take little more than 4MB -- tiny even by the standard of today's phones, much less ordinary computers.
Then, perform whatever processing you need to determine the beginning and ending of each group of lines you want to write out to a smaller files (I'm not sure by your question's text whether such groups can overlap or leave gaps, so I'm offering the most general solution where they're fully allowed to -- this will also cover more constrained use cases, with no real performance penalty, though code might be a tad simpler if the constraints were very rigid).
Say that you put these numbers in lists starts
(index from 0 of first line to write, included), ends
(index from 0 of first line to NOT write -- may legitimately and innocuosly be len(lines)
or more), names
(filenames to which you want to write), all lists having the same length of course.
Then, lastly:
assert len(starts) == len(ends) == len(names)
for s, e, n in zip(starts, ends, names):
with open(n, 'wt') as f:
f.writelines(lines[s:e])
...and that's all you need to do!
Edit: the OP seems to be confused by the concept of having these lists, so let me try to give an example: each block written out to a file starts at a line containing 'begin'
(included) and ends at the first immediately succeeding line containing 'end'
(also included), and the names of the files to be written are to be result0.txt
, result1.txt
, and so on.
It's an error if the number of "closing ends" differ from that of "opening begins" (and remember, the first immediately succeeding "end" terminates all pending "begins"); no line is allowed to contain both 'begin' and 'end'.
A very arbitrary set of conditions, to be sure, but then, the OP leaves us totally in the dark about the actual specifics of the problem, so what else can we do but guess most wildly?-)
outfile = 0
starts = []
ends = []
names = []
for i, line in enumerate(lines):
if 'begin' in line:
if 'end' in line:
raise ValueError('Both begin and end: %r' % line)
starts.append(i)
names.append('result%d.txt' % outfile)
outfile += 1
elif 'end' in line:
ends.append(i + 1) # remember ends are EXCLUDED, hence the +1
That's it -- the assert
about the three lists having identical lengths will take care of checking that the constraints are respected.
As the constraints and specs are changed, so of course will this snippet of code change accordingly -- as long as it fills the three equal-length lists starts
, ends
, and names
, exactly how it does so matters not in the least to the rest of the code.