ansaurus

Question

Split a list of dates by another list of dates

Answer 1

+1 A:

Python's bisect module will find the correct index for you, and from that index you can deduce the number of items before and after.

If I understand correctly, that would be O(len(dates) * log(len(seen))).
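A minimal sketch of that idea, assuming `seen` is sorted in ascending order and that an item equal to a date counts as "at or after" it (both are assumptions). The function name `count_before_after` and the integer sample data are made up for illustration.

```python
import bisect


def count_before_after(seen, dates):
    """For each date, return (date, items before it, items at or after it).

    Assumes `seen` is sorted in ascending order.
    """
    results = []
    for date in dates:
        # bisect_left returns the index of the first element >= date
        idx = bisect.bisect_left(seen, date)
        results.append((date, idx, len(seen) - idx))
    return results


# integers standing in for dates
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]
for date, before, after in count_before_after(seen, dates):
    print(date, before, after)
```

Each lookup is a binary search, so `dates` does not need to be sorted for this to work; only `seen` does.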

Edit 1

It should be possible to do this in one pass, just as SilentGhost demonstrates. However, itertools.groupby works well with sorted data, so it should be able to do something here, perhaps like this (this is more than O(n) but could be improved):

import itertools

# numbers are easier to make up now
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]

def finddate(s, dates):
    """Find the first date in @dates larger than s"""
    for date in dates:
        if s < date:
            break
    return date


for date, group in itertools.groupby(seen, key=lambda s: finddate(s, dates)):
    print(date, list(group))
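One possible way to improve on the linear scan in `finddate` is to use a binary search as the grouping key. This is only a sketch: `group_by_next_date` is a made-up name, both lists are assumed to be sorted, and items that are not smaller than any date are grouped under `None` here, whereas `finddate` above falls back to the last date.

```python
import bisect
import itertools


def group_by_next_date(seen, dates):
    """Group sorted `seen` by the first entry of sorted `dates` larger than each item."""

    def next_date(s):
        # bisect_right returns the index of the first date strictly greater than s
        i = bisect.bisect_right(dates, s)
        return dates[i] if i < len(dates) else None

    return [
        (date, list(group))
        for date, group in itertools.groupby(seen, key=next_date)
    ]


seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]
for date, group in group_by_next_date(seen, dates):
    print(date, group)
```

With this key the cost is roughly O(len(seen) * log(len(dates))) rather than a full scan of `dates` per item.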

kaizer.se 2009-09-02 16:53:04

Answer 2

+1 A:

This generator traverses the list only once:

def get_alive(seen, dates):
    c = len(seen)
    for date in dates:
        for s in seen[-c:]:
            if s >= date:      # replaced your > for >= as it seems to make more sense
                yield c
                break
            else:
                c -= 1
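For comparison, a minimal sketch of the same counting idea that advances an index into `seen` instead of slicing `seen[-c:]`, so no sublist is built per date. The name `get_alive_indexed` is made up for this illustration, both inputs are assumed to be sorted, and unlike the generator above it also yields 0 for dates that no remaining item reaches.

```python
def get_alive_indexed(seen, dates):
    """Yield, for each date, how many items of sorted `seen` are >= that date."""
    i = 0  # index of the first item not yet known to be < the current date
    n = len(seen)
    for date in dates:
        # skip items that fall before this date; i never moves backwards
        while i < n and seen[i] < date:
            i += 1
        yield n - i


seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]
print(list(get_alive_indexed(seen, dates)))
```

Because `i` only ever increases, the total work is proportional to len(seen) + len(dates).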

SilentGhost 2009-09-02 16:59:53

but for every date, `seen[-c:]` makes a copy of the remaining list

THC4k 2009-09-02 17:07:08

So? Did you time it and find that it's slower?

SilentGhost 2009-09-02 17:17:09

The point being that I have timed it and it seems to be the fastest solution, but I'm eager to hear otherwise.

SilentGhost 2009-09-02 17:26:18

@SilentGhost, x is undefined.

Nadia Alramli 2009-09-02 17:30:48

Thanks, Nadia, fixed.

SilentGhost 2009-09-02 17:33:01

+1 I think your solution is the better one

Nadia Alramli 2009-09-02 17:45:17

Answer 3

A:

I took SilentGhost's generator solution a bit further using explicit iterators. This is the linear-time solution I was thinking of.

def splitter(items, breaks):
    """Assumes `items` and `breaks` are sorted and non-empty."""
    c = len(items)

    items = iter(items)
    item = next(items)
    breaks = iter(breaks)
    breaker = next(breaks)

    while True:
        if breaker > item:
            for it in items:
                c -= 1
                if it >= breaker:
                    item = it
                    # do not yield here: the else branch below yields c for
                    # this breaker on the next pass, so yielding now would
                    # report the same count twice
                    break
            else:  # no item left that is >= the current breaker
                yield 0  # 0 items left for the current breaker
                # and 0 items left for all other breaks, since they are > the current
                for _ in breaks:
                    yield 0
                break  # and done
        else:
            yield c
            for br in breaks:
                if br > item:
                    breaker = br
                    break
                yield c
            else:
                # there is no break > any item in the list
                break

THC4k 2009-09-02 18:45:35

@SilentGhost's solution was really neat and short. I don't think the performance gain, if any, is big enough to prefer a less readable solution.

Nadia Alramli 2009-09-02 18:48:59

The problem with @SilentGhost's solution is that it is O(n**2) too. It's easy to see with `seen = range(0, n)` and `dates = [-1] * n`: c never changes, so it makes n copies of the n items in seen.

THC4k 2009-09-02 19:04:31
