views:

134

answers:

3

I have a number of nodes in a network. The nodes send status information every hour to indicate that they are alive. So i have a list of Nodes and the time when they were last alive. I want to graph the number of alive nodes over the time.

The list of nodes is sorted by the time they were last alive but i cant figure out a nice way to count how many are alive at a each date.

from datetime import datetime, timedelta

seen = [ n.last_seen for n in c.nodes ] # a list of datetimes
seen.sort()
start = seen[0]
end = seen[-1]

diff = end - start
num_points = 100

step = diff / num_points

num = len( c.nodes )

dates = [ start + i * step for i in range( num_points ) ]

What i want is basically

alive = [ len([ s for s in seen if s > date]) for date in dates ]

but thats not really efficient. The solution should use the fact that the seen list is sorted and not loop over the whole list for every date.

+1  A: 

The python bisect module will find the correct index for you, and you can deduct the number of items before and after.

If I'm understanding right, that would be O(dates) * O(log(seen))


Edit 1

It should be possible to do in one pass, just like SilentGhost demonstrates. However,itertools.groupby works fine with sorted data, it should be able to do something here, perhaps like this (this is more than O(n) but could be improved):

import itertools

# numbers are easier to make up now
seen = [-1, 10, 12, 15, 20, 75]
dates = [5, 15, 25, 50, 100]

def finddate(s, dates):
    """Find the first date in @dates larger than s"""
    for date in dates:
        if s < date:
            break
    return date


for date, group in itertools.groupby(seen, key=lambda s: finddate(s, dates)):
    print date, list(group)
kaizer.se
+1  A: 

this generator traverses the list only once:

def get_alive(seen, dates):
    c = len(seen)
    for date in dates:
     for s in seen[-c:]:
      if s >= date:      # replaced your > for >= as it seems to make more sense
       yield c
       break
      else:
       c -= 1
SilentGhost
but for every date, `seen[-c:]` makes a copy of the remaining list
THC4k
so? did you timed it and find out that it's slower?
SilentGhost
point being that I have timed it and it seems to be the fastest solution, but I'm eager to here the opposite.
SilentGhost
@SilentGhost, x is undefined.
Nadia Alramli
Thanks, Nadia, fixed.
SilentGhost
+1 I think you solution is the better one
Nadia Alramli
A: 

I took SilentGhosts generator solution a bit further using explicit iterators. This is the linear time solution i was thinking of.

def splitter( items, breaks ):
    """ assuming `items` and `breaks` are sorted """
    c = len( items )

    items = iter(items)
    item = items.next()
    breaks = iter(breaks)
    breaker = breaks.next()

    while True:
        if breaker > item:
            for it in items:
                c -= 1
                if it >= breaker:
                    item = it
                    yield c
                    break
            else:# no item left that is > the current breaker
                yield 0 # 0 items left for the current breaker
                # and 0 items left for all other breaks, since they are > the current
                for _ in breaks:
                    yield 0 
                break # and done
        else:
            yield c
            for br in breaks:
                if br > item:
                    breaker = br
                    break
                yield c
            else:
                # there is no break > any item in the list
                break
THC4k
@SilentGhosts's solution was really neat and short I don't think the performance gain if any is big enough to prefer a less readable solution.
Nadia Alramli
The problem with @SilentGhosts's is that it is O(n**2) too. Its easy to see with `seen = range( 0,n )` and `dates = [-1]` .. c never changes and therefore it makes n copies of the n items in seen
THC4k