python - NumPy multiplication of vectors of different size avoiding for loops
I have a matrix, say, p of size (x, y). I also have 2 matrices, say, kx and ky, both of size (m, n), a matrix pk of size (m, n), and 2 vectors u and v of size x and y respectively. For example, they can be defined as follows:
import numpy as np
p = np.zeros((x, y))
pk = np.random.rand(m, n)
kx = np.random.rand(m, n)
ky = np.random.rand(m, n)
u = np.random.rand(x)
v = np.random.rand(y)

In the actual code they are not random, of course, but that shall not matter for this example. The question is whether there exists a pure NumPy equivalent of the following:
M, N = kx.shape  # named M, N so the loop indices m, n don't shadow the sizes
for m in range(0, M):
    for n in range(0, N):
        for i in range(0, x):
            for j in range(0, y):
                arg = kx[m,n]*u[i] + ky[m,n]*v[j]
                p[i,j] += pk[m,n]*np.cos(arg)

Here m, n, x, y are all different, although x and y can be the same if a solution does not exist otherwise.
A common strategy for eliminating for-loops in NumPy calculations is to work with higher-dimensional arrays.
Consider, for example, the line

arg = kx[m,n]*u[i] + ky[m,n]*v[j]

This line depends on the indices m, n, i and j, so arg depends on m, n, i and j. That means arg can be thought of as a 4-dimensional array indexed by m, n, i and j. We can therefore eliminate the 4 for-loops -- as far as arg is concerned -- by computing
kxu = kx[:,:,np.newaxis]*u
kyv = ky[:,:,np.newaxis]*v
arg = kxu[:,:,:,np.newaxis] + kyv[:,:,np.newaxis,:]

kx[:,:,np.newaxis] has shape (m, n, 1), and u has shape (x,). Multiplying them uses NumPy broadcasting to create an array of shape (m, n, x). Thus, above, the new axes are used as placeholders, and arg ends up with 4 axes indexed by m, n, i, j in that order.
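As an aside (not part of the original answer), the broadcasting step can be checked in isolation with small throwaway shapes; the sizes here are arbitrary:

```python
import numpy as np

m, n, x = 2, 3, 4
kx = np.random.rand(m, n)
u = np.random.rand(x)

# (m, n, 1) broadcast against (x,) yields (m, n, x)
kxu = kx[:, :, np.newaxis] * u
print(kxu.shape)  # (2, 3, 4)

# Each entry is just kx[a, b] * u[c]
print(np.allclose(kxu[1, 2, 3], kx[1, 2] * u[3]))  # True
```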
Similarly, p can be defined as

p = (pk[:,:,np.newaxis,np.newaxis]*np.cos(arg)).sum(axis=0).sum(axis=0)

The sum(axis=0) (called twice) sums along the m and n axes, so p ends up being a 2-dimensional array indexed by i and j only.
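As an aside (not part of the original answer), the two chained sum(axis=0) calls can also be written as a single reduction over both leading axes, or as an np.einsum contraction; a minimal sketch with arbitrary small sizes:

```python
import numpy as np

m, n, x, y = 4, 5, 3, 6
kx = np.random.rand(m, n)
ky = np.random.rand(m, n)
u = np.random.rand(x)
v = np.random.rand(y)
pk = np.random.rand(m, n)

# Build the 4-D arg array as in the answer
arg = (kx[:, :, np.newaxis] * u)[:, :, :, np.newaxis] \
    + (ky[:, :, np.newaxis] * v)[:, :, np.newaxis, :]

# Single reduction over both leading axes (equivalent to .sum(axis=0).sum(axis=0))
p1 = (pk[:, :, np.newaxis, np.newaxis] * np.cos(arg)).sum(axis=(0, 1))

# Same contraction via einsum: sum over m and n, keep i and j
p2 = np.einsum('mn,mnij->ij', pk, np.cos(arg))

print(np.allclose(p1, p2))  # True
```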
By working with these 4-dimensional arrays, we get to apply NumPy operations to whole arrays at once. In contrast, when using the 4 for-loops, we had to do the computations value-by-value on scalars. Consider, for example, what np.cos(arg) is doing when arg is a 4-dimensional array: it off-loads the computation of all the cosines to one NumPy function call, with the underlying loop done in compiled C code. This is much faster than calling np.cos once for each scalar, and it is the reason why working with higher-dimensional arrays ends up being faster than for-loop-based code.
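To see this vectorization concretely, here is a small sketch (not from the original answer) comparing a single np.cos call on an array with an explicit element-by-element loop using math.cos:

```python
import math
import numpy as np

arg = np.random.rand(3, 4)

# One vectorized call: the loop over elements happens in compiled code
fast = np.cos(arg)

# Equivalent scalar-by-scalar computation in pure Python
slow = np.array([[math.cos(val) for val in row] for row in arg])

print(np.allclose(fast, slow))  # True
```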
import numpy as np

def orig(kx, ky, u, v, pk):
    M, N = kx.shape
    x = u.size
    y = v.size
    # np.zeros, not np.empty: p is accumulated with +=, so it must start at 0
    p = np.zeros((x, y), dtype=pk.dtype)
    for m in range(M):
        for n in range(N):
            for i in range(x):
                for j in range(y):
                    arg = kx[m,n]*u[i] + ky[m,n]*v[j]
                    p[i,j] += pk[m,n]*np.cos(arg)
    return p

def alt(kx, ky, u, v, pk):
    kxu = kx[:,:,np.newaxis]*u
    kyv = ky[:,:,np.newaxis]*v
    arg = kxu[:,:,:,np.newaxis] + kyv[:,:,np.newaxis,:]
    p = (pk[:,:,np.newaxis,np.newaxis]*np.cos(arg)).sum(axis=0).sum(axis=0)
    return p

M, N = 10, 20
x, y = 5, 15
kx = np.random.random((M, N))
ky = np.random.random((M, N))
u = np.random.random(x)
v = np.random.random(y)
pk = np.random.random((M, N))

As a sanity check, showing that alt and orig return the same result:
In [57]: p2 = alt(kx, ky, u, v, pk)

In [58]: p1 = orig(kx, ky, u, v, pk)

In [59]: np.allclose(p1, p2)
Out[59]: True

And a benchmark, showing alt is faster than orig:
In [60]: %timeit orig(kx, ky, u, v, pk)
10 loops, best of 3: 33.6 ms per loop

In [61]: %timeit alt(kx, ky, u, v, pk)
1000 loops, best of 3: 349 µs per loop