python - NumPy sum running length of non-zero values
I'm looking for a fast vectorized function that returns the rolling number of consecutive non-zero values. The count should start over at 0 whenever a zero is encountered. The result should have the same shape as the input array.
Given an array like this:
x = np.array([2.3, 1.2, 4.1 , 0.0, 0.0, 5.3, 0, 1.2, 3.1])
The function should return this:
array([1, 2, 3, 0, 0, 1, 0, 1, 2])
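To pin down the expected behaviour before any vectorizing, here is a plain-Python reference loop. It is only a sketch: the name running_nonzero_count is made up for illustration, and a 1-D input is assumed.

```python
import numpy as np

def running_nonzero_count(x):
    """Count consecutive non-zeros, resetting to 0 at every zero (1-D input assumed)."""
    out = np.zeros(len(x), dtype=int)
    count = 0
    for i, value in enumerate(x):
        # Extend the current run on a non-zero, reset it on a zero.
        count = count + 1 if value != 0 else 0
        out[i] = count
    return out

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
result = running_nonzero_count(x)
print(result)
```

A loop like this is slow on large arrays, but it is handy for checking a faster version against.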
This post lists a vectorized approach that consists of two steps:
1. Initialize a zeros vector of the same size as the input vector, x, and set ones at the places corresponding to the non-zeros of x.
2. Next, in that vector, put the negative of each island's run length right after the ending/stop position of that "island". The intention is to use cumsum later on, which results in sequential numbers for the "islands" and zeros elsewhere.
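The two steps can be traced by hand on the sample array from the question. In this sketch the island positions and lengths are filled in manually, purely to show why the cumsum resets. It is an illustration, not the general implementation.

```python
import numpy as np

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])

# Step 1: ones where x is non-zero, zeros elsewhere.
vec = (x != 0).astype(int)      # [1, 1, 1, 0, 0, 1, 0, 1, 1]

# Step 2: right after each island, write minus its run length.
# The islands have lengths 3, 1 and 2. The last one runs to the end
# of the array, so there is no slot after it to fill.
vec[3] = -3                     # island x[0:3] ends, cancel its 3 ones
vec[6] = -1                     # island x[5:6] ends, cancel its 1 one

# The cumsum counts up inside each island and drops back to 0 after it.
out = vec.cumsum()
print(out)
```

Each negative entry cancels exactly the ones accumulated by the island before it, which is why the running sum returns to zero.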
Here's the implementation -
import numpy as np

# Append zeros at the start and end of the input array, x
xa = np.hstack([[0], x, [0]])

# Array of ones and zeros, with ones for the nonzeros of x and zeros elsewhere
xa1 = (xa != 0) + 0

# Find consecutive differences on xa1
xadf = np.diff(xa1)

# Find the start and stop+1 indices, and from them the lengths of the "islands" of non-zeros
starts = np.where(xadf == 1)[0]
stops_p1 = np.where(xadf == -1)[0]
lens = stops_p1 - starts

# Mark the indices where the "minus lens" values are to be put before applying cumsum
put_m1 = stops_p1[stops_p1 < x.size]

# Set up a vector with ones for nonzero x's, "minus lens" at stops+1 and zeros elsewhere
vec = xa1[1:-1]  # Note: this changes xa1, but that's okay as it's not needed anymore
vec[put_m1] = -lens[0:put_m1.size]

# Perform cumsum to get the desired output
out = vec.cumsum()
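For reuse, the steps can be wrapped in a function and checked on a few edge cases. The name sumrunlen_vectorized is borrowed from the timing calls further down; that this body matches what was actually timed is an assumption, and np.flatnonzero and a boolean mask are used here as a stylistic choice.

```python
import numpy as np

def sumrunlen_vectorized(x):
    """Running count of consecutive non-zeros in a 1-D array (sketch)."""
    x = np.asarray(x)
    # Pad the 0/1 mask with a zero on each side so every island
    # has both a start and a stop.
    mask = np.concatenate(([0], (x != 0).astype(int), [0]))
    steps = np.diff(mask)
    starts = np.flatnonzero(steps == 1)
    stops_p1 = np.flatnonzero(steps == -1)
    lens = stops_p1 - starts
    # An island that reaches the end of x has no slot after it.
    keep = stops_p1 < x.size
    vec = mask[1:-1].copy()
    vec[stops_p1[keep]] = -lens[keep]
    return vec.cumsum()

print(sumrunlen_vectorized([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1]))
print(sumrunlen_vectorized([0, 0, 0]))   # no islands at all
print(sumrunlen_vectorized([1, 2, 3]))   # one island spanning the whole array
```

Copying the slice keeps the padded mask untouched, at the cost of one extra allocation.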
Sample run -
In [116]: x
Out[116]: array([ 0. ,  2.3,  1.2,  4.1,  0. ,  0. ,  5.3,  0. ,  1.2,  3.1,  0. ])

In [117]: out
Out[117]: array([0, 1, 2, 3, 0, 0, 1, 0, 1, 2, 0], dtype=int32)
Runtime tests -
Here are some runtime tests comparing the proposed approach against the other itertools.groupby based approach -
In [21]: n = 1000000
    ...: x = np.random.rand(1,n)
    ...: x[x>0.5] = 0.0
    ...: x = x.ravel()
    ...:

In [19]: %timeit sumrunlen_vectorized(x)
10 loops, best of 3: 19.9 ms per loop

In [20]: %timeit sumrunlen_loopy(x)
1 loops, best of 3: 2.86 s per loop
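The itertools.groupby based function that was timed is not reproduced in this post. For readers who want something comparable to benchmark against, here is a hypothetical sketch of what such an approach might look like. The name sumrunlen_groupby and its body are assumptions, not the code behind the timings above.

```python
from itertools import groupby

import numpy as np

def sumrunlen_groupby(x):
    """Running count of consecutive non-zeros using itertools.groupby (sketch)."""
    out = []
    # groupby splits x into maximal runs of "non-zero" and "zero" elements.
    for is_nonzero, group in groupby(x, key=lambda v: v != 0):
        n = sum(1 for _ in group)
        if is_nonzero:
            out.extend(range(1, n + 1))   # count up inside an island
        else:
            out.extend([0] * n)           # zeros stay zero
    return np.array(out, dtype=int)

x = np.array([2.3, 1.2, 4.1, 0.0, 0.0, 5.3, 0, 1.2, 3.1])
print(sumrunlen_groupby(x))
```

Because it visits every element in Python, a function like this would be expected to run far slower than the cumsum version on large inputs, in line with the gap reported above.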