python - Numpy multiplication of vectors of different size avoiding for loops
I have a matrix, say, p, of size (x, y). Also, I have 2 matrices, say, kx and ky, both of size (M, N), a matrix pk of size (M, N), and 2 vectors u and v of length x and y respectively. For example, they can be defined as follows:
import numpy as np
p = np.zeros((x, y))
pk = np.random.rand(M, N)
kx = np.random.rand(M, N)
ky = np.random.rand(M, N)
u = np.random.rand(x)
v = np.random.rand(y)
In the actual code they are not random, of course, but that shall not matter for this example. The question is whether there exists a pure numpy equivalent of the following:
for m in range(0, M):
    for n in range(0, N):
        for i in range(0, x):
            for j in range(0, y):
                arg = kx[m, n]*u[i] + ky[m, n]*v[j]
                p[i, j] += pk[m, n]*np.cos(arg)
All of M, N, x, y are different, but x and y can be the same if a solution does not exist otherwise.
A common strategy for eliminating for-loops in numpy calculations is to work with higher-dimensional arrays.
Consider, for example, the line

arg = kx[m,n]*u[i] + ky[m,n]*v[j]

This line depends on the indices m, n, i and j, so arg depends on m, n, i and j. That means arg can be thought of as a 4-dimensional array indexed by m, n, i and j. So we can eliminate the 4 for-loops -- as far as arg is concerned -- by computing
kxu = kx[:, :, np.newaxis]*u
kyv = ky[:, :, np.newaxis]*v
arg = kxu[:, :, :, np.newaxis] + kyv[:, :, np.newaxis, :]
kx[:,:,np.newaxis] has shape (M, N, 1), and u has shape (x,). Multiplying them uses numpy broadcasting to create an array of shape (M, N, x). Thus, above, the new axes are used as placeholders, so that arg ends up with 4 axes indexed by m, n, i, j in that order.
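To make the broadcasting step concrete, here is a small standalone sketch (the sizes M=2, N=3, x=4 are made up for illustration) that checks the intermediate shape and one element by hand:

```python
import numpy as np

# Small made-up sizes, just for illustration.
M, N, x = 2, 3, 4
kx = np.random.rand(M, N)
u = np.random.rand(x)

# (M, N, 1) broadcast against (x,) gives (M, N, x):
# the trailing axis of length 1 is stretched to match u.
kxu = kx[:, :, np.newaxis] * u
print(kxu.shape)  # (2, 3, 4)

# Each element is just kx[m, n] * u[i].
print(np.isclose(kxu[1, 2, 3], kx[1, 2] * u[3]))  # True
```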
Similarly, p can be defined as
p = (pk[:,:,np.newaxis,np.newaxis]*np.cos(arg)).sum(axis=0).sum(axis=0)
The sum(axis=0) (called twice) sums along the m and n axes, so that p ends up as a 2-dimensional array indexed by i and j only.
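An equivalent way to express the same double sum, not used in the answer above but sometimes easier to read, is np.einsum, which names the summed axes explicitly. A small sketch with made-up sizes:

```python
import numpy as np

M, N, x, y = 3, 4, 5, 6
pk = np.random.rand(M, N)
kx = np.random.rand(M, N)
ky = np.random.rand(M, N)
u = np.random.rand(x)
v = np.random.rand(y)

# Build the 4-D arg exactly as in the answer.
kxu = kx[:, :, np.newaxis] * u
kyv = ky[:, :, np.newaxis] * v
arg = kxu[:, :, :, np.newaxis] + kyv[:, :, np.newaxis, :]

# Broadcast-multiply then sum over the m and n axes, keeping i and j...
p_sum = (pk[:, :, np.newaxis, np.newaxis] * np.cos(arg)).sum(axis=0).sum(axis=0)
# ...which einsum expresses in one call: contract m and n, keep i and j.
p_einsum = np.einsum('mn,mnij->ij', pk, np.cos(arg))

print(np.allclose(p_sum, p_einsum))  # True
```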
By working with these 4-dimensional arrays, we get to apply numpy operations to whole arrays at once. In contrast, with the 4 for-loops, the computations were done value-by-value on scalars. Consider, for example, what np.cos(arg) does when arg is a 4-dimensional array: it off-loads the computation of all the cosines to one numpy function call, whose underlying loop runs in compiled C code. That is much faster than calling np.cos once for each scalar, and it is the reason why working with higher-dimensional arrays ends up being so much faster than for-loop-based code.
import numpy as np

def orig(kx, ky, u, v, pk):
    M, N = kx.shape
    x = u.size
    y = v.size
    p = np.zeros((x, y), dtype=pk.dtype)  # zeros, since we accumulate with +=
    for m in range(0, M):
        for n in range(0, N):
            for i in range(0, x):
                for j in range(0, y):
                    arg = kx[m, n]*u[i] + ky[m, n]*v[j]
                    p[i, j] += pk[m, n]*np.cos(arg)
    return p

def alt(kx, ky, u, v, pk):
    kxu = kx[:, :, np.newaxis]*u
    kyv = ky[:, :, np.newaxis]*v
    arg = kxu[:, :, :, np.newaxis] + kyv[:, :, np.newaxis, :]
    p = (pk[:, :, np.newaxis, np.newaxis]*np.cos(arg)).sum(axis=0).sum(axis=0)
    return p

M, N = 10, 20
x, y = 5, 15
kx = np.random.random((M, N))
ky = np.random.random((M, N))
u = np.random.random(x)
v = np.random.random(y)
pk = np.random.random((M, N))
As a sanity check (showing that alt and orig return the same result):
In [57]: p2 = alt(kx, ky, u, v, pk)

In [58]: p1 = orig(kx, ky, u, v, pk)

In [59]: np.allclose(p1, p2)
Out[59]: True
And a benchmark, showing that alt is much faster than orig:
In [60]: %timeit orig(kx, ky, u, v, pk)
10 loops, best of 3: 33.6 ms per loop

In [61]: %timeit alt(kx, ky, u, v, pk)
1000 loops, best of 3: 349 µs per loop
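One caveat with alt is that arg has shape (M, N, x, y), which can get large. If memory becomes a concern, the identity cos(a + b) = cos(a)cos(b) - sin(a)sin(b) lets the cosine factor into (M, N, x)- and (M, N, y)-sized pieces. The alt_lowmem function below is my own sketch of that idea, not part of the original answer:

```python
import numpy as np

def alt_lowmem(kx, ky, u, v, pk):
    # cos(a + b) = cos(a)*cos(b) - sin(a)*sin(b), so the cosine of the
    # 4-D arg factors into (M, N, x)- and (M, N, y)-sized pieces.
    cu = np.cos(kx[:, :, np.newaxis] * u)   # shape (M, N, x)
    su = np.sin(kx[:, :, np.newaxis] * u)
    cv = np.cos(ky[:, :, np.newaxis] * v)   # shape (M, N, y)
    sv = np.sin(ky[:, :, np.newaxis] * v)
    # Sum pk[m,n]*cu[m,n,i]*cv[m,n,j] (and the sin term) over m and n.
    return (np.einsum('mn,mni,mnj->ij', pk, cu, cv)
            - np.einsum('mn,mni,mnj->ij', pk, su, sv))

# Quick check against the 4-D broadcasting version from the answer.
M, N, x, y = 3, 4, 5, 6
kx, ky, pk = (np.random.rand(M, N) for _ in range(3))
u, v = np.random.rand(x), np.random.rand(y)

kxu = kx[:, :, np.newaxis] * u
kyv = ky[:, :, np.newaxis] * v
arg = kxu[:, :, :, np.newaxis] + kyv[:, :, np.newaxis, :]
p4d = (pk[:, :, np.newaxis, np.newaxis] * np.cos(arg)).sum(axis=0).sum(axis=0)

print(np.allclose(alt_lowmem(kx, ky, u, v, pk), p4d))  # True
```

Whether this wins in practice depends on the sizes: it avoids the (M, N, x, y) temporary, but the einsum contraction still does O(M*N*x*y) work.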