How do I check for EOF in Python? I found a bug in my code where the last block of text after the separator isn't added to the return list. Or is there a better way to write this function?

Here's my code:

import StringIO

def get_text_blocks(filename):
    text_blocks = []
    text_block = StringIO.StringIO()
    with open(filename, 'r') as f:
        for line in f:
            text_block.write(line)
            print line
            if line.startswith('-- -'):
                text_blocks.append(text_block.getvalue())
                text_block.close()
                text_block = StringIO.StringIO()
    # Bug: whatever is left in text_block after the last separator is never appended
    return text_blocks
A:

The end-of-file condition holds as soon as the for statement terminates, so the simplest minimal fix is to append the last block right after the loop (you can extract text_block.getvalue() at the end and check that it's not empty before appending it).

Alex Martelli
Thanks Alex! My dirty solution was to add text_blocks.append(text_block.getvalue()) and text_block.close() below the for block. It works but it's not DRY :/
ajushi
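
One way to keep that fix DRY is to move the append-and-reset step into a small local helper and call it both inside the loop and once more after it, since the loop ending is the EOF point. This is a sketch assuming Python 3 (it uses io.StringIO and nonlocal); the names split_blocks and flush are hypothetical, and the function takes any iterable of lines (such as an open file) rather than a filename so it is easy to try:

```python
import io


def split_blocks(lines, sep='-- -'):
    """Split lines into blocks; each block ends with its separator line."""
    text_blocks = []
    text_block = io.StringIO()

    def flush():
        # Hypothetical helper: save the buffered text (if any) and reset the buffer.
        nonlocal text_block
        text = text_block.getvalue()
        if text:
            text_blocks.append(text)
        text_block = io.StringIO()

    for line in lines:
        text_block.write(line)
        if line.startswith(sep):
            flush()
    flush()  # the loop has ended, so we are at EOF: emit the final block
    return text_blocks


print(split_blocks(['a\n', '-- -1\n', 'b\n', 'c\n']))
# ['a\n-- -1\n', 'b\nc\n']
```

With a real file you would call it as `with open(filename) as f: blocks = split_blocks(f)`.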
A: 

Why do you need StringIO here?

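def get_text_blocks(filename):
    # Start with an empty block so lines before the first separator have somewhere to go
    text_blocks = [""]
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('-- -'):
                # A separator line starts a new block
                text_blocks.append(line)
            else:
                text_blocks[-1] += line
    return text_blocks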

EDIT: Fixed the function. The other suggestions might be better; I just wanted to write a function similar to the original one.

EDIT: The first version assumed the file starts with "-- -". Seeding the list with an empty string "fixes" the IndexError, or you could use this one:

def get_text_blocks(filename):
    text_blocks = []
    with open(filename, 'r') as f:
        for line in f:
            if line.startswith('-- -'):
                # A separator line starts a new block
                text_blocks.append(line)
            elif text_blocks:
                # Lines before the first separator are skipped
                text_blocks[-1] += line
    return text_blocks

But both versions look a bit ugly to me; the regex version is much cleaner.

Maiku Mori
That still misses the last block.
Mark Byers
Could you please provide test input data?
Maiku Mori
@Maiku the test input data is a SQL dump from phpMyAdmin. I need to split the text into blocks separated by lines that start with -- -...
ajushi
Yeah, I get it now; I misunderstood the task.
Maiku Mori
Now I get 'IndexError: list index out of range'
Mark Byers
Ehh ... I should just go sleep instead of surfing SO =(
Maiku Mori
@Maiku don't give up, you can do it (maybe tomorrow) :)
ajushi
It should be working =)
Maiku Mori
A:

You might find it easier to solve this using itertools.groupby.

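import itertools

def get_text_blocks(filename):
    with open(filename, 'r') as f:
        # Group consecutive lines by whether they are separator lines
        groups = itertools.groupby(f, lambda line: line.startswith('-- -'))
        # Join each run of non-separator lines into one block (separators are dropped)
        return [''.join(lines) for is_separator, lines in groups if not is_separator]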
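
To see what groupby is doing, here is a sketch that applies the same key function to an in-memory list of lines (the names is_separator and blocks_from_lines are hypothetical). groupby yields alternating runs of separator and non-separator lines, each run as a lazy iterator; only the non-separator runs are kept, so unlike the code in the question the separator lines themselves are discarded:

```python
import itertools


def is_separator(line):
    return line.startswith('-- -')


def blocks_from_lines(lines):
    # Keep only the runs of consecutive non-separator lines, joined into strings.
    return [''.join(group)
            for is_sep, group in itertools.groupby(lines, is_separator)
            if not is_sep]


lines = ['a\n', 'b\n', '-- -x\n', 'c\n', '-- -y\n', '-- -z\n', 'd\n']
for key, group in itertools.groupby(lines, is_separator):
    print(key, list(group))  # list() materializes the lazy run
# False ['a\n', 'b\n']
# True ['-- -x\n']
# False ['c\n']
# True ['-- -y\n', '-- -z\n']
# False ['d\n']
print(blocks_from_lines(lines))  # ['a\nb\n', 'c\n', 'd\n']
```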

Another alternative is to use a regular expression to match the separators:

import re

def get_text_blocks(filename):
    # With re.M, ^ matches at the start of every line, not just the start of the string
    separator = re.compile('^-- -.*', re.M)
    with open(filename, 'r') as f:
        return separator.split(f.read())
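
A quick check of the regex version on a string (a sketch, not part of the original answer). Because . does not match a newline, the separator text is removed but its trailing newline stays at the start of the following block, and a separator on the very first line produces a leading empty string. The separator_nl variant is an assumption of mine that also consumes the newline and filters out empty pieces:

```python
import re

separator = re.compile(r'^-- -.*', re.M)

text = '-- -header\na\nb\n-- -x\nc\n'
print(separator.split(text))  # ['', '\na\nb\n', '\nc\n']

# Variant (assumption): also consume the separator's newline and drop empty pieces.
separator_nl = re.compile(r'^-- -.*\n?', re.M)
print([b for b in separator_nl.split(text) if b])  # ['a\nb\n', 'c\n']
```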
Mark Byers
Interesting answers Mark. I didn't know about itertools, thanks.
ajushi
+1 for the regex version; the itertools version is slightly cryptic.
Maiku Mori
I tried the itertools version in the interactive interpreter and it returns an empty string. lines seems to be an itertools._grouper object.
ajushi
It's unlikely to return an empty string. It always returns a list. You must have a copy/paste error.
Mark Byers
Sorry my bad, an empty list I mean.
ajushi
Well all I can say is that it works here for the files I tested it on. Maybe you gave it an empty file, or a file where every line was a separator? I can't really explain it without more details. You can just use the regex method (the second alternative) if you can't get the first working (though I suspect that whatever you are doing wrong with the first method will also cause problems with the second).
Mark Byers
You're right: the file object was empty because I had already iterated through it. My bad again. Anyway, thank you for itertools :)
ajushi
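
That pitfall is easy to reproduce: iterating over a file object consumes it, so a second pass sees nothing until you rewind. A sketch using io.StringIO as a stand-in for a real file (it iterates line by line the same way), with f.seek(0) as the usual remedy:

```python
import io
import itertools

f = io.StringIO('a\n-- -x\nb\n')

first_pass = list(f)   # reads every line; the position is now at EOF
second_pass = list(f)  # nothing left to read
print(len(first_pass), len(second_pass))  # 3 0

f.seek(0)  # rewind to the beginning before iterating again
groups = itertools.groupby(f, lambda line: line.startswith('-- -'))
blocks = [''.join(lines) for is_sep, lines in groups if not is_sep]
print(blocks)  # ['a\n', 'b\n']
```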
A: 

This is the standard problem with emitting buffers.

You don't need to detect EOF; the loop ending is EOF. Just write out the last buffer after the loop.

import StringIO

def get_text_blocks(filename):
    text_blocks = []
    text_block = StringIO.StringIO()
    with open(filename, 'r') as f:
        for line in f:
            text_block.write(line)
            if line.startswith('-- -'):
                text_blocks.append(text_block.getvalue())
                text_block.close()
                text_block = StringIO.StringIO()
        ### At this moment, you are at EOF
        last_block = text_block.getvalue()
        if last_block:  # StringIO has no len(), so check the buffered text instead
            text_blocks.append(last_block)
        ### Now your final block (if any) is appended.
    return text_blocks
S.Lott
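
The same "write the last buffer" pattern also fits a generator, which yields blocks one at a time instead of building the whole list first. This is a sketch, not part of the answer above: it assumes Python 3, uses a plain list of lines as the buffer in place of StringIO, and the name iter_text_blocks is hypothetical:

```python
def iter_text_blocks(lines, sep='-- -'):
    """Yield blocks of text; each block ends with its separator line."""
    block = []
    for line in lines:
        block.append(line)
        if line.startswith(sep):
            yield ''.join(block)
            block = []
    # The loop has ended, so we are at EOF: emit the final block, if any.
    if block:
        yield ''.join(block)


print(list(iter_text_blocks(['a\n', '-- -1\n', 'b\n'])))
# ['a\n-- -1\n', 'b\n']
```

With a file, `with open(filename) as f: for block in iter_text_blocks(f): ...` processes each block as soon as it is complete.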