ansaurus

Question

find length of sequences of identical values in a numpy array

Answer 1

A:

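durations = []
counter   = 0

for flag in b:  # renamed from `bool` to avoid shadowing the built-in
    if flag:
        counter += 1
    elif counter > 0:
        durations.append(counter)
        counter = 0

# Flush a run that extends to the end of the array
if counter > 0:
    durations.append(counter)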

John Kugelman 2009-07-01 00:31:02

Sure, this is more concise, but just as inefficient; what I want to do is move the loop down to the C layer by using some clever combination of numpy calls...

Gyom 2009-07-01 01:06:14

Check my edited answer: I now offer one such "clever combination" (always trying hard not to be TOO clever though;-) -- but do measure the speed of that one AND the itertools.groupby-based solution, and let us know which one is faster (and by how much) in examples realistic-for-you!

Alex Martelli 2009-07-01 01:44:13

Answer 2

+2 A:

While not numpy primitives, itertools functions are often very fast, so do give this one a try (and measure times for various solutions including this one, of course):

import itertools

def runs_of_ones(bits):
  for bit, group in itertools.groupby(bits):
    if bit: yield sum(group)

If you do need the values in a list, you can just use list(runs_of_ones(bits)), of course; but a list comprehension might be marginally faster still:

def runs_of_ones_list(bits):
  return [sum(g) for b, g in itertools.groupby(bits) if b]

Moving to "numpy-native" possibilities, what about:

import numpy

def runs_of_ones_array(bits):
  # make sure all runs of ones are well-bounded
  bounded = numpy.hstack(([0], bits, [0]))
  # get 1 at run starts and -1 at run ends
  difs = numpy.diff(bounded)
  run_starts, = numpy.where(difs > 0)
  run_ends, = numpy.where(difs < 0)
  return run_ends - run_starts

Again: be sure to benchmark solutions against each other in realistic-for-you examples!
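As a rough illustration of such a benchmark, here is a minimal sketch using the standard timeit module. The sample data (100,000 random bits with a fixed seed) and the repeat count are assumptions chosen for illustration, not figures from this thread; substitute data that is realistic for you.

```python
import itertools
import timeit

import numpy


def runs_of_ones_list(bits):
    # groupby-based version from the answer above
    return [sum(g) for b, g in itertools.groupby(bits) if b]


def runs_of_ones_array(bits):
    # diff/where version from the answer above
    bounded = numpy.hstack(([0], bits, [0]))
    difs = numpy.diff(bounded)
    run_starts, = numpy.where(difs > 0)
    run_ends, = numpy.where(difs < 0)
    return run_ends - run_starts


# Hypothetical sample data: 100,000 random bits with a fixed seed.
rng = numpy.random.RandomState(0)
bits = (rng.rand(100000) < 0.7).astype(int)

# Both versions should agree before their timings are compared.
assert list(runs_of_ones_list(bits)) == runs_of_ones_array(bits).tolist()

for func in (runs_of_ones_list, runs_of_ones_array):
    # Average wall-clock time over 10 calls
    seconds = timeit.timeit(lambda: func(bits), number=10)
    print(func.__name__, seconds / 10)
```

The printed timings depend on the machine, the numpy version, and the data, so treat them as relative indications only.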

Alex Martelli 2009-07-01 01:04:34

Hmmmmm... that last one looks familiar. ;)

gnovice 2009-07-01 02:00:29

Thanks a lot! The diff/where solution is exactly what I had in mind (not to mention it is about 10 times faster than the other solutions). Call that "not too clever" if you like, but I wish I was clever enough to come up with it :-)

Gyom 2009-07-01 02:40:04

@gnovice, I don't do matlab (funny enough my daughter, now a PhD candidate in advanced radio engineering, does;-), but now looking at your answer I do see the analogies -- get the end-of-runs minus the start-of-runs, get those by locating the <0 and >0 spots in the differences, and pad the bits with zeros to make sure all runs-of-ones do end. Guess there aren't that many ways to skin this "run lengths" problem!-)

Alex Martelli 2009-07-01 02:50:30

@Gyom, you're welcome -- as @gnovice hints, the matlab solution is also similar, or so I guess it would be if one knew matlab -- so it must be that neither is very clever;-)... it's more a question of having had to do run-length coding stuff before (most of the time in my edit was about translating from Numeric, which is what I still tend instinctively to turn to, to much-better numpy -- but where I actually first learned such things was with APL, 30 years ago, when I was still a hardware designer...!-).

Alex Martelli 2009-07-01 02:54:09

Answer 3

+2 A:

Just in case anyone is curious (and since you mentioned MATLAB in passing), here's one way to solve it in MATLAB:

threshold = 7;
d = 10*rand(1,100000);  % Sample data
b = diff([false (d < threshold) false]);
durations = find(b == -1)-find(b == 1);

I'm not too familiar with Python, but maybe this could help give you some ideas. =)

gnovice 2009-07-01 01:15:16

thanks for this answer as well, this is exactly the kind of stuff I was looking for

Gyom 2009-07-01 02:41:19

diff() exists in numpy too, so this is more or less what you want, though you should replace find(foo) with where(foo)[0].
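A possible numpy translation of the MATLAB snippet along those lines is sketched below. The variable names mirror the MATLAB code; the cast to int is an addition, made on the assumption that numpy.diff is not applied directly to boolean arrays (recent numpy versions reject boolean subtraction).

```python
import numpy as np

threshold = 7
# Sample data, mirroring 10*rand(1,100000) in the MATLAB snippet
d = 10 * np.random.rand(100000)

# Pad the boolean mask with False on both sides, as the MATLAB code does,
# and cast to int so that diff yields +1 at run starts and -1 at run ends.
mask = np.concatenate(([False], d < threshold, [False])).astype(int)
b = np.diff(mask)

# where(foo)[0] plays the role of MATLAB's find(foo)
durations = np.where(b == -1)[0] - np.where(b == 1)[0]
```

Because the data is random, the contents of durations differ on every run; the run lengths always sum to the number of samples below the threshold.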

dwf 2009-07-24 15:29:26

Answer 4

A:

Here is a solution using only arrays: it takes an array containing a sequence of bools and computes run lengths from the positions of the transitions.

>>> from numpy import array, arange
>>> b = array([0,0,0,1,1,1,0,0,0,1,1,1,1,0,0], dtype=bool)
>>> sw = (b[:-1] ^ b[1:]); print(sw)
[False False  True False False  True False False  True False False False
  True False]
>>> isw = arange(len(sw))[sw]; print(isw)
[ 2  5  8 12]
>>> lens = isw[1::2] - isw[::2]; print(lens)
[3 4]

sw contains True wherever there is a switch, and isw converts those positions into indices. The items of isw are then subtracted pairwise to give lens.

Notice that if the sequence started with a 1, it would count the lengths of the runs of 0s instead: this can be fixed in the indexing used to compute lens. Also, I have not tested corner cases such as sequences of length 1.
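One way to sidestep the leading-1 caveat is to pad the array with False on both ends before taking the XOR, so that switches always come in (start, end) pairs. The sketch below is an illustration rather than part of the original answer; the function name run_lengths_of_true is hypothetical.

```python
import numpy as np


def run_lengths_of_true(b):
    """Return the lengths of the runs of True in a 1-D boolean sequence."""
    b = np.asarray(b, dtype=bool)
    # Padding guarantees every run of True has both a start and an end switch.
    padded = np.concatenate(([False], b, [False]))
    sw = padded[:-1] ^ padded[1:]
    isw = np.flatnonzero(sw)
    # Even-indexed switches are run starts, odd-indexed ones are run ends.
    return isw[1::2] - isw[::2]


print(run_lengths_of_true([0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0]))
print(run_lengths_of_true([1, 1, 0, 1]))
```

With the padding in place, a sequence that starts or ends with a 1, a single-element sequence, and an empty sequence all go through the same indexing.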

piro 2009-07-01 10:32:35
