tags:

views:

805

answers:

6

Disclaimer: I'm fairly new to python!

If I want all the lines of a file until (edit: and including) the line containing some string stopterm, is there a way of using the list syntax for it? I was hoping there would be something like:

usefullines = [line for line in file until stopterm in line]

For now, I've got

usefullines = []
for line in file:
    usefullines.append(line)
    if stopterm in line:
        break

It's not the end of the world, but since there rest of Python syntax is so straightforward, I was hoping for a 1 thought->1 Python line mapping.

+1  A: 

Forget this

Leaving the answer, but marking it community. See Stewen Huwig's answer for the correct way to do this.


Well, [x for x in enumerable] will run until enumerable doesn't produce data any more, the if-part will simply allow you to filter along the way.

What you can do is add a function, and filter your enumerable through it:

def enum_until(source, until_criteria):
    for k in source:
        if until_criteria(k):
            break;
        yield k;

def enum_while(source, while_criteria):
    for k in source:
        if not while_criteria(k):
            break;
        yield k;

l1 = [k for k in enum_until(xrange(1, 100000), lambda y: y == 100)];
l2 = [k for k in enum_while(xrange(1, 100000), lambda y: y < 100)];
print l1;
print l2;

Of course, it doesn't look as nice as what you wanted...

Lasse V. Karlsen
That's a lot of work reimplementing the itertools module in the standard library...
Steven Huwig
Sure is! NIH in the works!
Lasse V. Karlsen
I bet you've had to do this for JavaScript, right? I know I have, in the cases where third-party libraries are not permitted...
Steven Huwig
+4  A: 

" I was hoping for a 1 thought->1 Python line mapping." Wouldn't we all love a programming language that somehow mirrored our natural language?

You can achieve that, you just need to define your unique thoughts once. Then you have the 1:1 mapping you were hoping for.

def usefulLines( aFile ):
    for line in aFile:
        yield line
        if line == stopterm:
            break

Is pretty much it.

for line in usefulLines( aFile ):
    # process a line, knowing it occurs BEFORE stopterm.

There are more general approaches. The lassevk answers with enum_while and enum_until are generalizations of this simple design pattern.

S.Lott
+10  A: 
from itertools import takewhile
usefullines = takewhile(lambda x: not re.search(stopterm, x), lines)

from itertools import takewhile
usefullines = takewhile(lambda x: stopterm not in x, lines)

Here's a way that keeps the stopterm line:

def useful_lines(lines, stopterm):
    for line in lines:
        if stopterm in line:
            yield line
            break
        yield line

usefullines = useful_lines(lines, stopterm)
# or...
for line in useful_lines(lines, stopterm):
    # ... do stuff
    pass
Steven Huwig
You can use x.find(stopterm) instead if this is just matching a string
Mapad
or indeed, stopterm (not) in x as the original question has it.
Steven Huwig
Wow, didn't know this one. Of course it exists, it's Python. Silly me. +1
e-satis
itertools, operator, and (C)StringIO are the unsung modules of the standard library... everyone should learn them in my opinion. :)
Steven Huwig
Wow - I'm just gonna go and delete my re example - thanks for showing me this!
Patrick Harrington
V. useful, but I'd like the line that includes stopterm, which is admittedly ambiguous in the list syntax version.Can I retrieve lines.next() after using takewhile?
Phil H
Yes, you can use lines.next() after using takewhile. I'll update my example.
Steven Huwig
On second thought you lose the stopterm entirely using takewhile. You'll probably want to write a generator function like S. Lott's answer, but one that returns the stopterm line as well.How irritating of Python. :-|
Steven Huwig
An addition is if you what to start when then stat when you can do this: linesbetween = takewhile(lambda x: x!=stop_at, list(dropwhile(lambda y: y!=start_at, lines)))
Vincent
+1  A: 

I think it's fine to keep it that way. Sophisticated one-liner are not really pythonic, and since Guido had to put a limit somewhere, I guess this is it...

e-satis
+2  A: 

That itertools solution is neat. I have earlier been amazed by itertools.groupby, one handy tool.

But still i was just tinkering if I could do this without itertools. So here it is (There is one assumption and one drawback though: the file is not huge and its goes for one extra complete iteration over the lines, respectively.)

I created a sample file named "try":

hello
world
happy
day
bye

once you read the file and have the lines in a variable name lines:

lines=open('./try').readlines()

then

    print [each for each in lines if lines.index(each)<=[lines.index(line) for line in lines if 'happy' in line][0]]

gives the result:

['hello\n', 'world\n', 'happy\n']

and

print [each for each in lines if lines.index(each)<=[lines.index(line) for line in lines if 'day' in line][0]]

gives the result:

['hello\n', 'world\n', 'happy\n', 'day\n']

So you got the last line - the stop term line also included.

JV
A: 

I'd go with Steven Huwig's or S.Lott's solutions for real usage, but as a slightly hacky solution, here's one way to obtain this behaviour:

def stop(): raise StopIteration()

usefullines = list(stop() if stopterm in line else line for line in file)

It's slightly abusing the fact that anything that raises StopIteration will abort the current iteration (here the generator expression) and uglier to read than your desired syntax, but will work.

Brian