numpy - Python multiprocessing (joblib): best way to pass arguments


I've noticed a huge delay when using multiprocessing (with joblib). Here is a simplified version of my code:

    import numpy as np
    from joblib import Parallel, delayed


    class Matcher(object):
        def match_all(self, arr1, arr2):
            # Every (elem1, elem2) pair becomes its own task: 250 * 250 = 62,500 tasks.
            args = ((elem1, elem2) for elem1 in arr1 for elem2 in arr2)

            results = Parallel(n_jobs=-1)(
                delayed(_parallel_match)(self, e1, e2) for e1, e2 in args)
            # ...

        def match(self, i1, i2):
            return i1 == i2


    def _parallel_match(m, i1, i2):
        # Module-level wrapper so the call can be pickled and sent to a worker.
        return m.match(i1, i2)


    matcher = Matcher()
    matcher.match_all(np.ones(250), np.ones(250))

So if I run it as shown above, it takes 30 secs to complete and uses 200 MB. If I change the parameter n_jobs in Parallel and set it to 1, it takes 1.80 secs and barely uses 50 MB...

I suppose it has to be something related to the way I pass the arguments, but I haven't found a better way to do it...
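One possible alternative way of passing the arguments, offered only as a sketch: group the pairs by row, so that each task compares one element of the first array against the whole second array. That cuts 62,500 tiny tasks down to 250 larger ones. The helpers `match_row` and `row_tasks` are hypothetical names that do not appear in the original code, plain lists stand in for the NumPy arrays, and the tasks are run serially here just to show their shape.

```python
def match(i1, i2):
    # Same comparison as Matcher.match in the question.
    return i1 == i2


def match_row(e1, row):
    # Hypothetical helper: compare one element against a whole row in a single task.
    return [match(e1, e2) for e2 in row]


def row_tasks(arr1, arr2):
    # Hypothetical helper: build one (element, row) task per element of arr1.
    row = list(arr2)
    return [(e1, row) for e1 in arr1]


arr1 = [1.0] * 250  # plain-list stand-in for np.ones(250)
arr2 = [1.0] * 250

tasks = row_tasks(arr1, arr2)

# Run the tasks serially; a parallel runner would receive the same task list.
results = [match_row(e1, row) for e1, row in tasks]

n_comparisons = sum(len(r) for r in results)
print("tasks:", len(tasks))
print("comparisons:", n_comparisons)
```

With joblib, the same task list could presumably be submitted as `Parallel(n_jobs=-1)(delayed(match_row)(e1, row) for e1, row in tasks)`. The trade-off is that each task now carries a whole row, so the payload per task grows while the number of tasks shrinks. Whether this removes the delay in this particular setup has not been tested here.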

I'm using Python 2.7.9.

I have rewritten the code without using the joblib library, and it works as it is supposed to, although the code is not so "beautiful":

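    import itertools
    import multiprocessing
    import numpy as np


    class Matcher(object):
        def match_all(self, a1, a2):
            args = ((elem1, elem2) for elem1 in a1 for elem2 in a2)
            # Pair every (elem1, elem2) tuple with the matcher instance.
            args = zip(itertools.repeat(self), args)

            pool = multiprocessing.Pool()
            # np.fromiter requires an explicit dtype; match() returns booleans.
            results = np.fromiter(pool.map(_parallel_match, args), dtype=bool)
            # ...

        def match(self, i1, i2):
            return i1 == i2


    def _parallel_match(*args):
        # pool.map passes one (matcher, (i1, i2)) tuple, so args == ((matcher, (i1, i2)),).
        return args[0][0].match(*args[0][1:][0])


    matcher = Matcher()
    matcher.match_all(np.ones(250), np.ones(250))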

This version works like a charm, and it takes 0.58 secs to complete...

So, why isn't it working with joblib? I can't understand it, but I guess that joblib is making copies of the whole array for every single process...
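A rough way to probe the copying guess without touching joblib is to compare how many bytes the arguments of a single per-pair task take when pickled against how many bytes the two whole arrays would take. The sketch below uses only the standard `pickle` module, plain floats and lists as stand-ins for the NumPy scalars and arrays, and leaves out the Matcher instance. All of these are simplifying assumptions, so the numbers are only indicative and do not measure joblib's actual behaviour.

```python
import pickle

n = 250
arr1 = [1.0] * n  # plain-list stand-in for np.ones(250)
arr2 = [1.0] * n

# Arguments of one per-pair task: just two scalars.
per_task_bytes = len(pickle.dumps((arr1[0], arr2[0]), protocol=2))

# What would be serialized if both whole arrays travelled with every task.
whole_arrays_bytes = len(pickle.dumps((arr1, arr2), protocol=2))

n_tasks = n * n
print("tasks:", n_tasks)
print("bytes per task (two scalars):", per_task_bytes)
print("bytes for both whole arrays:", whole_arrays_bytes)
```

If the per-task payload turns out to be small, the sheer number of tasks (one per pair) may be a more likely source of overhead than copies of the arrays, though that is a hypothesis rather than a confirmed diagnosis.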

