python - Numpy sum running length of non-zero values -


I'm looking for a fast vectorized function that returns the running count of consecutive non-zero values. The count should restart at 0 whenever a zero is encountered. The result should have the same shape as the input array.

Given an array like this:

x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1]) 

The function should return this:

array([1, 2, 3, 0, 0, 1, 0, 1, 2]) 

This post lists a vectorized approach that consists of two steps:

  1. Initialize a vector of zeros of the same size as the input vector, x, and set ones at the places corresponding to the non-zeros of x.

  2. Next, in that vector, put the negated run length of each "island" of non-zeros right after the island's stop position. The intention is to apply cumsum later on, which results in sequential numbers within the islands and zeros elsewhere.
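To see why the second step works, here is a small hand-built illustration on the sample array. The name `marks` is made up for this sketch. Ones mark the non-zeros, and the negated run lengths 3 and 1 sit right after the first two islands. The last island runs to the end of the array, so it needs no marker.

```python
import numpy as np

# Hand-built marker vector for x = [2.3, 1.2, 4.1, 0, 0, 5.3, 0, 1.2, 3.1]
marks = np.array([1, 1, 1, -3, 0, 1, -1, 1, 1])

# The cumulative sum counts up inside each island and is pulled back
# to zero by the negated run length that follows the island.
running = marks.cumsum()
print(running)  # [1 2 3 0 0 1 0 1 2]
```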

Here's the implementation -

import numpy as np

# Append zeros at the start and end of the input array, x
xa = np.hstack([[0], x, [0]])

# Array of ones and zeros, with ones for nonzeros of x and zeros elsewhere
xa1 = (xa != 0) + 0

# Find consecutive differences on xa1
xadf = np.diff(xa1)

# Find start and stop+1 indices and lengths of "islands" of non-zeros
starts = np.where(xadf == 1)[0]
stops_p1 = np.where(xadf == -1)[0]
lens = stops_p1 - starts

# Mark the indices where the "minus lens" are to be put before applying cumsum
put_m1 = stops_p1[stops_p1 < x.size]

# Set up a vector with ones for nonzero x's, "minus lens" at stops+1 and zeros elsewhere
vec = xa1[1:-1]  # Note: this changes xa1, but that's okay as it is not needed anymore
vec[put_m1] = -lens[0:put_m1.size]

# Perform cumsum to get the desired output
out = vec.cumsum()
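The runtime tests further down call a function named `sumrunlen_vectorized`, whose definition is not shown. A minimal sketch of how the steps above might be wrapped into such a function, assuming a 1-D input. The body is an adaptation for illustration, not necessarily the exact code that was timed.

```python
import numpy as np

def sumrunlen_vectorized(x):
    """Running count of consecutive non-zero values in a 1-D array (sketch)."""
    x = np.asarray(x)
    # Pad with zeros so islands touching either end get a start and a stop.
    padded = np.concatenate(([0], x != 0, [0])).astype(np.int64)
    diffs = np.diff(padded)
    starts = np.flatnonzero(diffs == 1)
    stops_p1 = np.flatnonzero(diffs == -1)
    lens = stops_p1 - starts
    # Keep only the stop positions that fall inside the array.
    keep = stops_p1 < x.size
    vec = padded[1:-1].copy()
    vec[stops_p1[keep]] = -lens[keep]
    return vec.cumsum()

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(sumrunlen_vectorized(x))  # [1 2 3 0 0 1 0 1 2]
```

Here the marker vector is copied first, so none of the intermediate arrays is modified in place.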

Sample run -

In [116]: x
Out[116]: array([ 0. ,  2.3,  1.2,  4.1,  0. ,  0. ,  5.3,  0. ,  1.2,  3.1,  0. ])

In [117]: out
Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)

Runtime tests -

Here are runtime tests comparing the proposed approach against the other, itertools.groupby based approach -

In [21]: n = 1000000
    ...: x = np.random.rand(1,n)
    ...: x[x>0.5] = 0.0
    ...: x = x.ravel()
    ...:

In [19]: %timeit sumrunlen_vectorized(x)
10 loops, best of 3: 19.9 ms per loop

In [20]: %timeit sumrunlen_loopy(x)
1 loops, best of 3: 2.86 s per loop
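The groupby based function that was timed is not shown in this post. For reference, a loop-based version along those lines might look like the sketch below. Only the name `sumrunlen_loopy` comes from the timing session; the body is an assumption.

```python
from itertools import groupby

import numpy as np

def sumrunlen_loopy(x):
    """Running count of consecutive non-zero values, via itertools.groupby (sketch)."""
    out = []
    # groupby splits x into maximal runs of zeros and runs of non-zeros.
    for is_nonzero, group in groupby(x, key=lambda v: v != 0):
        n = sum(1 for _ in group)
        # Non-zero runs count 1..n, zero runs contribute n zeros.
        out.extend(range(1, n + 1) if is_nonzero else [0] * n)
    return np.array(out, dtype=np.int64)

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(sumrunlen_loopy(x))  # [1 2 3 0 0 1 0 1 2]
```

It iterates over the array element by element in Python, which is consistent with the loopy version being much slower than the vectorized one in the timings above.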
