multithreading - Python - multiprocessing for matplotlib griddata


Following my former question [1], I would like to apply multiprocessing to matplotlib's griddata function. Is it possible to split the griddata call into, say, 4 parts, one for each of my 4 cores? I need this to improve performance.

For example, try the code below, experimenting with different values for size:

    import numpy as np
    import matplotlib.mlab as mlab
    import time

    size = 500

    y = np.arange(size)
    x = np.arange(size)
    x, y = np.meshgrid(x, y)
    u = x * np.sin(5) + y * np.cos(5)
    v = x * np.cos(5) + y * np.sin(5)
    test = x + y

    tic = time.clock()

    test_d = mlab.griddata(
        x.flatten(), y.flatten(), test.flatten(), x+u, y+v, interp='linear')

    toc = time.clock()

    print 'time=', toc-tic

I ran the example code below in Python 3.4.2, with numpy version 1.9.1 and matplotlib version 1.4.2, on a MacBook Pro with 4 physical CPUs (i.e., as opposed to "virtual" CPUs, which the Mac hardware architecture also makes available for some use cases):

    import numpy as np
    import matplotlib.mlab as mlab
    import time
    import multiprocessing

    # This value should be set much larger than nprocs, defined later below
    size = 500

    y = np.arange(size)
    x = np.arange(size)
    x, y = np.meshgrid(x, y)
    u = x * np.sin(5) + y * np.cos(5)
    v = x * np.cos(5) + y * np.sin(5)
    test = x + y

    tic = time.clock()

    test_d = mlab.griddata(
        x.flatten(), y.flatten(), test.flatten(), x+u, y+v, interp='linear')

    toc = time.clock()

    print('single processor time={0}'.format(toc-tic))

    # Put the interpolation points into a single array so that we can slice it
    xi = x + u
    yi = y + v
    # My example test machine has 4 physical CPUs
    nprocs = 4
    jump = int(size/nprocs)

    # Enclose the griddata function in a wrapper which will communicate its
    # output result back to the calling process via a Queue
    def wrapper(x, y, z, xi, yi, q):
        test_w = mlab.griddata(x, y, z, xi, yi, interp='linear')
        q.put(test_w)

    # Measure the elapsed time for multiprocessing separately
    ticm = time.clock()

    queue, process = [], []
    for n in range(nprocs):
        queue.append(multiprocessing.Queue())
        # Handle the possibility that size is not evenly divisible by nprocs
        if n == (nprocs-1):
            finalidx = size
        else:
            finalidx = (n + 1) * jump
        # Define the arguments, dividing the interpolation variables into
        # nprocs roughly evenly sized slices
        argtuple = (x.flatten(), y.flatten(), test.flatten(),
                    xi[:,(n*jump):finalidx], yi[:,(n*jump):finalidx], queue[-1])
        # Create the processes, and launch them
        process.append(multiprocessing.Process(target=wrapper, args=argtuple))
        process[-1].start()

    # Initialize an array to hold the return value, and make sure that it is
    # null-valued but of the appropriate size
    test_m = np.asarray([[] for s in range(size)])
    # Read the individual results back from the queues and concatenate them
    # into the return array
    for q, p in zip(queue, process):
        test_m = np.concatenate((test_m, q.get()), axis=1)
        p.join()

    tocm = time.clock()

    print('multiprocessing time={0}'.format(tocm-ticm))

    # Check that the result of both methods is actually the same; should raise
    # an AssertionError exception if the assertion is not true
    assert np.all(test_d == test_m)

And I got the following result:

    /Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/matplotlib/tri/triangulation.py:110: FutureWarning: comparison to `None` will result in an elementwise object comparison in the future.
      self._neighbors)
    single processor time=8.495998
    multiprocessing time=2.249938

I'm not sure what is causing the "future warning" from triangulation.py (evidently my version of matplotlib did not like something about the input values provided in the question), but regardless, the multiprocessing does appear to achieve the desired speedup of 8.50/2.25 = 3.8 (edit: see comments), which is in the neighborhood of the 4x that we would expect for a machine with 4 CPUs. The assertion statement at the end also executes successfully, showing that the two methods produce the same answer, so in spite of the slightly weird warning message, I believe that the code above is a valid solution.
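
The slicing step in the code above can be isolated and checked on its own. The following is a minimal sketch, not part of the original answer: the helper names column_bounds, split_columns and join_columns are hypothetical, and plain nested lists stand in for the NumPy arrays so that the example has no dependencies. It shows that letting the last slice absorb the remainder covers every column exactly once, even when size is not divisible by nprocs.

```python
def column_bounds(size, nprocs):
    """Return (start, stop) column index pairs that cover range(size)."""
    jump = size // nprocs
    bounds = []
    for n in range(nprocs):
        start = n * jump
        # The last slice absorbs the remainder when size % nprocs != 0
        stop = size if n == nprocs - 1 else (n + 1) * jump
        bounds.append((start, stop))
    return bounds


def split_columns(grid, bounds):
    """Cut a list-of-rows grid into one column block per (start, stop) pair."""
    return [[row[start:stop] for row in grid] for start, stop in bounds]


def join_columns(pieces):
    """Glue column blocks back together, left to right, row by row."""
    return [sum(rows, []) for rows in zip(*pieces)]


size, nprocs = 10, 4
grid = [[r * size + c for c in range(size)] for r in range(size)]

bounds = column_bounds(size, nprocs)
pieces = split_columns(grid, bounds)
rebuilt = join_columns(pieces)

print(bounds)           # [(0, 2), (2, 4), (4, 6), (6, 10)]
print(rebuilt == grid)  # True
```

With real arrays, np.array_split(xi, nprocs, axis=1) is an alternative that produces a similar partition, although it spreads the remainder over the first slices rather than putting it all in the last one.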


EDIT: A commenter has pointed out that both my solution and the code snippet posted by the original author were likely using the wrong method, time.clock(), for measuring execution time, and suggests using time.time() instead. I think I'm coming around to that point of view. (Digging into the Python documentation a bit further, I'm still not convinced that even this solution is 100% correct, as newer versions of Python appear to have deprecated time.clock() in favor of time.perf_counter() and time.process_time(). Regardless, I agree that whether or not time.time() is absolutely the most correct way of taking this measurement, it's still probably more correct than what I had been using before, time.clock().)
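
To make the distinction between the clocks concrete, here is a small sketch (the helper name measure is hypothetical). time.perf_counter() reports elapsed wall-clock time, while time.process_time() reports only CPU time consumed by the current process, so it does not advance while the process sleeps or waits on work being done in child processes. On Unix, time.clock() behaved like the latter, which is one plausible reason why the multiprocessing run above looked faster than it really was.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) spent while running func()."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


# Sleeping takes wall-clock time but almost no CPU time in this process,
# much like a parent process that waits for its workers to finish.
wall, cpu = measure(lambda: time.sleep(0.2))
print('wall-clock={0:.3f}s cpu={1:.3f}s'.format(wall, cpu))
```

For the benchmark in this answer, replacing each time.clock() call with time.perf_counter() (or time.time()) would measure what the user actually waits for. Note that time.clock() was removed entirely in Python 3.8.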

Assuming the commenter's point is correct, it means that the approximately 4x speedup that I thought I had measured is in fact wrong.

However, that does not mean that the underlying code itself wasn't correctly parallelized; rather, it just means that parallelization didn't actually help in this case: splitting up the data and running on multiple processors didn't improve anything. Why would this be? Other users have pointed out that, at least in numpy/scipy, some functions run on multiple cores and some do not, and it can be a seriously challenging research project for an end user to figure out which ones are which.

Based on the results of this experiment, if my solution correctly achieves parallelization within Python but no further speedup is observed, then I would suggest that the simplest likely explanation is that matplotlib is probably also parallelizing some of its functions "under the hood", so to speak, in compiled C++ libraries, just like numpy/scipy already do. Assuming that's the case, the correct answer to this question would be that nothing further can be done: further parallelizing in Python will do no good if the underlying C++ libraries are already silently running on multiple cores to begin with.
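
One rough way to test the "already parallel under the hood" hypothesis, rather than assume it, is to compare CPU time against wall-clock time for a single call. This is only a sketch under stated assumptions: the helper name cpu_to_wall_ratio is hypothetical, and the busy loop is a stand-in for the real griddata call. Because time.process_time() is process-wide and sums CPU time over all threads of the process, a ratio well above 1 would suggest that the call used several cores, while a ratio near 1 (or below) suggests single-threaded execution.

```python
import time


def cpu_to_wall_ratio(func):
    """Run func() once and return CPU seconds divided by wall-clock seconds."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    func()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return cpu / wall


def busy_loop():
    # Stand-in workload: pure Python, so it runs on a single core
    return sum(i * i for i in range(2000000))


ratio = cpu_to_wall_ratio(busy_loop)
print('cpu/wall ratio={0:.2f}'.format(ratio))
```

Passing a lambda that wraps the single-process interpolation call in place of busy_loop would give a quick indication of whether that call is already spreading work across cores. The check is not conclusive, since it only sees threads inside the current process and is sensitive to other load on the machine.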

