I'm trying to join a set of sentences contained in a list. I have a function which determines whether a sentence is worth saving. However, in order to keep the context of the sentence, I also need to keep the sentence before and after it. In the edge cases, where it's either the first or the last sentence, I'll just keep the sentence and its only neighbor.

An example is best:

    ex_paragraph = ['The quick brown fox jumps over the fence.', 
                   'Where there is another red fox.', 
                   'They run off together.', 
                   'They live happily ever after.']
    t1 = lambda x: x.startswith('Where')
    t2 = lambda x: x.startswith('The ')

The result for t1 should be:

['The quick brown fox jumps over the fence. Where there is another red fox. They run off together.']

The result for t2 should be:

['The quick brown fox jumps over the fence. Where there is another red fox.']

My solution is:

from itertools import izip, count

def YieldContext(sent_list, cont_fun):
    def JoinSent(sent_list, ind):
        if ind == 0:
            return ' '.join(sent_list[ind:ind+2])
        elif ind == len(sent_list)-1:
            return ' '.join(sent_list[ind-1:ind+1])
        else:
            return ' '.join(sent_list[ind-1:ind+2])

    for sent, sentnum in izip(sent_list, count(0)):
        if cont_fun(sent):
            yield JoinSent(sent_list, sentnum)

Does anyone know a "cleaner" or more Pythonic way to do something like this? The if-elif-else seems a little forced.

Thanks,

Will

PS. I'm obviously doing this with a more complicated "context-function" but this is just for a simple example.

+1  A: 

I might do something like this:

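from itertools import izip, tee
# Pad both ends so the first and last sentences also get a three-item window.
prev, this, next = tee([''] + ex_paragraph + [''], 3)
this.next()   # 'this' starts one item ahead of 'prev'
next.next()   # 'next' starts two items ahead of 'prev'
next.next()
# Test the middle sentence of each window (ctx[1]), not the iterator itself.
[' '.join(ctx).strip() for ctx in izip(prev, this, next) if cont_fun(ctx[1])]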

where cont_fun is one of t1 or t2.
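For anyone on Python 3, where `izip` and the `.next()` method are gone, the same idea can be sketched roughly as below. This is only an illustration: the name `yield_context` and the use of `None` as padding are my own choices, not something from the question. Since it only relies on `chain`, `tee` and `zip`, it should also work on a lazily loaded iterable rather than a list.

```python
from itertools import chain, tee

def yield_context(sentences, cont_fun):
    """Lazily yield each matching sentence joined with its neighbours."""
    # Pad with None (not '') so padding is easy to tell apart from real text.
    padded = chain([None], sentences, [None])
    prev, this, nxt = tee(padded, 3)
    next(this, None)  # 'this' now starts at the first real sentence
    next(nxt, None)   # 'nxt' starts one element further on
    next(nxt, None)
    for window in zip(prev, this, nxt):
        if cont_fun(window[1]):  # test the middle sentence of the window
            yield ' '.join(s for s in window if s is not None)

ex_paragraph = ['The quick brown fox jumps over the fence.',
                'Where there is another red fox.',
                'They run off together.',
                'They live happily ever after.']

# Passing an iterator shows that no indexing or len() is needed.
results = list(yield_context(iter(ex_paragraph),
                             lambda s: s.startswith('Where')))
```

Here the predicate only ever sees real sentences, because the padding never lands in the middle slot of a window.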

David Zaslavsky
That's cool, I was wondering if there was an `itertools`-based solution. If my paragraphs were very long this would be nice, since I'm lazy-loading them from a file.
JudoWill
+1  A: 

Well, I'd probably just write the ifs on one line:

def join_sent(sent_list, ind):
    # Keep only the neighbouring indices that actually exist in the list.
    items = [sent_list[i] for i in (ind-1, ind, ind+1) if 0 <= i < len(sent_list)]
    return ' '.join(items)

But it's probably disputable if this is readable.

P.S.: What would certainly be more Pythonic is to use PEP 8-style names, i.e. yield_context. Or hey, even yieldContext would be feasible in some libraries... but YieldContext?

Almad
Wow, that's a pretty impressive list comprehension. I can understand it now, but I'm not sure I'd be able to understand it in three months if I ever have to refactor it.
JudoWill
Depends. I find them readable and actually like them more than a lot of functools/itertools things, but I fully accept that they can get ugly and don't look readable to a lot of people.
Almad
+4  A: 

Adding an empty string to the beginning and end of the list is a good idea, but there is no need for a fancy list comprehension or the like. You can build a generator very easily:

def yieldContext(l, func):
    l = [''] + l + ['']
    for i, s in enumerate(l):
        if func(s):
            yield ' '.join(l[i-1:i+2]).strip()

gives:

>>> print list(yieldContext(ex_paragraph, t1))
['The quick brown fox jumps over the fence. Where there is another red fox. They run off together.']

>>> print list(yieldContext(ex_paragraph, t2))
['The quick brown fox jumps over the fence. Where there is another red fox.']

(If you really want to create a list, there is not much of a difference. It depends mostly on how many sentences you have and what you want to do with the "context")

def yieldContext(l, func):
    l = [''] + l + ['']
    return [' '.join(l[i-1:i+2]).strip() for i, s in enumerate(l) if func(s)]
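
One caveat with the padded versions above: the predicate is also called on the two `''` padding entries, which is harmless for t1 and t2 but would produce spurious results for a predicate that accepts an empty string. A possible variation, written for Python 3 and using the hypothetical name `yield_context`, enumerates only the real sentences and uses the padded copy purely for slicing:

```python
def yield_context(sentences, func):
    """Yield each matching sentence joined with its neighbours."""
    padded = [''] + list(sentences) + ['']
    # start=1 makes i the position of s inside the padded list,
    # so func is never called on the '' padding entries.
    for i, s in enumerate(sentences, start=1):
        if func(s):
            yield ' '.join(padded[i-1:i+2]).strip()

ex_paragraph = ['The quick brown fox jumps over the fence.',
                'Where there is another red fox.',
                'They run off together.',
                'They live happily ever after.']

# A predicate that accepts everything, including '', yields exactly
# one context per real sentence.
all_contexts = list(yield_context(ex_paragraph, lambda s: True))
```

This assumes `sentences` is a list (or at least can be iterated twice), since it is copied into `padded` and then enumerated again.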
Felix Kling
I thought of that, although I got the impression JudoWill was just going to use the generator to make a list...
David Zaslavsky
@David Zaslavsky: Well, it depends on the data he has. And there is not much of a difference anyway...
Felix Kling
nice and readable ... exactly what I was looking for!
JudoWill
A: 
return ' '.join(sent_list[max(0,ind-1):min(len(sent_list),ind+2)])
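
As a rough sketch of how that one-liner could sit inside a helper (the name `join_sent` is borrowed from the other answers, and this assumes Python 3 and a plain list): only the lower bound really needs clamping, because a negative start would count from the end of the list, whereas a slice end past the end of the list is silently truncated.

```python
def join_sent(sent_list, ind):
    """Join the sentence at ind with whichever neighbours exist."""
    # max() guards the lower bound: a start of -1 would mean "last item".
    # The upper bound needs no guard, since slicing stops at the list end.
    start = max(0, ind - 1)
    return ' '.join(sent_list[start:ind + 2])

sentences = ['a.', 'b.', 'c.', 'd.']
first = join_sent(sentences, 0)
middle = join_sent(sentences, 1)
last = join_sent(sentences, 3)
```

Keeping the `min(len(sent_list), ind+2)` from the one-liner is equally correct; it just spells out the upper clamp explicitly.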
Steven D. Majewski