ansaurus

Question

Break a text file into chunks based on line like the string split operation?

Answer 1

+1 A:

If you can afford to keep the sections in memory while you work with them, something like this should work:

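subFileBlocks = []

with open('myReportFile.txt') as fh:
  for line in fh:
    # A 'BOBO' line starts a new block; so does the very first line,
    # which avoids an IndexError if text precedes the first marker.
    if line.startswith('BOBO') or not subFileBlocks:
      subFileBlocks.append(line)
    else:
      subFileBlocks[-1] += line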

After the loop, subFileBlocks should contain your sections as strings.

g.d.d.c 2010-09-03 21:29:43

you don't have to do `for line in fh.readlines()`. `for line in fh` suffices.

aaronasterling 2010-09-04 01:40:22

and will actually keep the whole file out of memory at any given time.

aaronasterling 2010-09-04 01:54:53

@aaronasterling - `for line in fh` may keep the file out of memory (or at least only load a line at a time), but my approach reads it into a list that will still exist once the file handle is closed. That was where my "if you don't mind it being in memory" comment came from. Thanks for the optimization though!

g.d.d.c 2010-09-04 05:33:29
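
The memory point raised in these comments can be taken one step further: if each section is only needed once, a generator can hand back one section at a time instead of building the whole list. A minimal sketch, assuming sections start at lines beginning with 'BOBO' as in the answer; the name `iter_blocks` and its `marker` parameter are hypothetical:

```python
def iter_blocks(lines, marker='BOBO'):
    """Yield each section as one string; a section starts at a marker line."""
    block = []
    for line in lines:
        # A marker line closes the section collected so far.
        if line.startswith(marker) and block:
            yield ''.join(block)
            block = []
        block.append(line)
    if block:
        # Emit the final section, which has no marker after it.
        yield ''.join(block)
```

Passing the open file handle, as in `for block in iter_blocks(fh)`, should keep only the current section in memory at any time.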

Answer 2

A:

Perhaps use itertools.groupby:

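import itertools

def bobo(x):
    # Key function: bump the counter on each 'BOBO:' line so that every
    # line of a stanza gets the same key.
    if x.startswith('BOBO:'):
        bobo.count += 1
    return bobo.count
bobo.count = 0

with open('a') as f:
    for key, grp in itertools.groupby(f, bobo):
        print(key, list(grp))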

yields (under Python 2, where print(key, list(grp)) prints a tuple):

(1, ['BOBO:12341234123412341234\n', '1234123412341234123412341\n', '123412341234\n'])
(2, ['BOBO:12349087609812340-98\n', '43690871234509875\n', '45\n', '\n'])
(3, ['BOBO:32498714235908713248\n', '0987235\n'])

Since you say you don't want physical files, the whole file must be able to fit in memory. In that case, to create file-like objects, use the cStringIO module:

import cStringIO
with open('a') as f:
    file_handles=[]
    for key,grp in itertools.groupby(f,bobo):
        file_handles.append(cStringIO.StringIO(''.join(grp)))

file_handles will be a list of file-like objects, one for each "BOBO:" stanza.
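
Note that cStringIO exists only in Python 2; in Python 3 the equivalent class is io.StringIO. A minimal sketch of the same idea for Python 3, assuming the same 'BOBO:' marker. The function name `stanza_handles` is hypothetical, and keeping the counter in a closure rather than on a function attribute is just one of several ways to hold that state:

```python
import io
import itertools

def stanza_handles(lines, marker='BOBO:'):
    """Return a list of io.StringIO objects, one per stanza."""
    count = 0

    def key(line):
        # Every marker line bumps the counter, so all lines of a stanza share a key.
        nonlocal count
        if line.startswith(marker):
            count += 1
        return count

    return [io.StringIO(''.join(grp))
            for _, grp in itertools.groupby(lines, key)]
```

Each returned handle supports read(), readline(), and iteration, like a file opened for reading.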

unutbu 2010-09-03 21:36:10
