I have text report files that I need to split into pieces, the way split() breaks a string up into an array.

So the file is like:

BOBO:12341234123412341234
1234123412341234123412341
123412341234
BOBO:12349087609812340-98
43690871234509875
45

BOBO:32498714235908713248
0987235

I want to create 3 sub-files out of that, splitting on lines that begin with "BOBO:" (i.e. lines matching ^BOBO:). I don't really want 3 physical files; I'd prefer 3 different file pointers.

+1  A: 

If you can keep the sections in memory while you work with them, something like this probably works:

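subFileBlocks = []

with open('myReportFile.txt') as fh:
  for line in fh:
    if line.startswith('BOBO:'):
      # A marker line starts a new block.
      subFileBlocks.append(line)
    elif subFileBlocks:
      # Any other line belongs to the most recent block.
      subFileBlocks[-1] += line
    # Lines before the first 'BOBO:' marker are ignored.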

When the loop finishes, subFileBlocks should contain your sections as strings.
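
If holding every section at once is a concern, the same idea can be written as a generator that hands back one section at a time. This is only a sketch: the name iter_blocks and its marker parameter are made up for illustration, and it assumes that lines appearing before the first marker should be skipped.

```python
import io

def iter_blocks(lines, marker='BOBO:'):
    """Yield one string per stanza; lines before the first marker are skipped."""
    block = []
    for line in lines:
        # A marker line closes the block collected so far, if there is one.
        if line.startswith(marker) and block:
            yield ''.join(block)
            block = []
        # Collect the line once a stanza has started.
        if line.startswith(marker) or block:
            block.append(line)
    # Emit the final stanza, which has no marker after it.
    if block:
        yield ''.join(block)

# Stand-in for an open report file (hypothetical sample data).
sample = io.StringIO(
    "header line\n"
    "BOBO:1\n"
    "a\n"
    "BOBO:2\n"
    "b\n"
)
blocks = list(iter_blocks(sample))
```

Passing an open file handle instead of the StringIO sample should work the same way, since both are iterated line by line; only the section currently being built is held in memory until it is yielded.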

g.d.d.c
you don't have to do `for line in fh.readlines()`. `for line in fh` suffices.
aaronasterling
and will actually keep the whole file out of memory at any given time.
aaronasterling
@aaronasterling - `for line in fh` may keep the file out of memory (or at least only load a line at a time), but my approach reads it into a list that will still exist when the file handle is done with. That was where my "if you don't mind it being in memory" comment came from. Thanks for the optimization though!
g.d.d.c
A: 

Perhaps use itertools.groupby:

import itertools

def bobo(x):
    # Bump the counter on each 'BOBO:' line so every stanza gets its own key.
    if x.startswith('BOBO:'):
        bobo.count += 1
    return bobo.count
bobo.count = 0

with open('a') as f:
    for key,grp in itertools.groupby(f,bobo):
        print(key,list(grp))

yields (under Python 2, where print shows the arguments as a tuple):

(1, ['BOBO:12341234123412341234\n', '1234123412341234123412341\n', '123412341234\n'])
(2, ['BOBO:12349087609812340-98\n', '43690871234509875\n', '45\n', '\n'])
(3, ['BOBO:32498714235908713248\n', '0987235\n'])

Since you say you don't want physical files, the whole file must be able to fit in memory. In that case, to create file-like objects, use the cStringIO module (Python 2 only; in Python 3 the equivalent is io.StringIO):

import cStringIO
with open('a') as f:
    file_handles=[]
    for key,grp in itertools.groupby(f,bobo):
        file_handles.append(cStringIO.StringIO(''.join(grp)))

file_handles will be a list of file-like objects, one for each "BOBO:" stanza. Each one supports the usual read(), readline() and iteration.
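
For Python 3, where cStringIO no longer exists, a rough equivalent might look like the following. It is a sketch rather than a drop-in replacement: make_key and split_to_handles are hypothetical names, and a closure stands in for the function-attribute counter so that each call starts counting from zero.

```python
import io
import itertools

def make_key(marker='BOBO:'):
    """Return a key function whose value increases at every marker line."""
    count = 0
    def key(line):
        nonlocal count
        if line.startswith(marker):
            count += 1
        return count
    return key

def split_to_handles(lines, marker='BOBO:'):
    """Return one in-memory file-like object per stanza.

    Lines before the first marker share key 0 and end up in a group of their own.
    """
    return [io.StringIO(''.join(grp))
            for _, grp in itertools.groupby(lines, make_key(marker))]

# Stand-in for an open report file (hypothetical sample data).
sample = io.StringIO("BOBO:1\na\n\nBOBO:2\nb\n")
handles = split_to_handles(sample)
first_line = handles[1].readline()
```

As in the output above, a blank line between stanzas stays attached to the stanza that precedes it.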

unutbu