Python: Seeding random number generators in parallel programs


I am studying Python's multiprocessing module. I have two cases:

Example 1:

```python
import random
from multiprocessing import Pool

def foo(nbr_iter):
    for step in range(int(nbr_iter)):
        print(random.uniform(0, 1))
    # ...

if __name__ == "__main__":
    # ...
    pool = Pool(processes=nmr_parallel_block)
    pool.map(foo, nbr_trial_per_process)
```

Example 2 (using NumPy):

```python
import numpy as np

def foo_np(nbr_iter):
    np.random.seed()
    print(np.random.uniform(0, 1, nbr_iter))
```

In both cases, the random number generators are seeded in the forked processes.

Why do I have to seed explicitly in the NumPy example but not in the Python example?

If no seed is provided explicitly, `numpy.random` seeds itself from an OS-dependent source of randomness. It usually uses `/dev/urandom` on Unix-based systems (or the Windows equivalent), and it falls back to the wall clock if that source is unavailable. The catch is that this self-seeding happens once, when `numpy.random` is first imported in the parent process. Every subprocess forked after that point inherits an identical copy of the generator's state, so different subprocesses can produce identical random variates.
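The inheritance effect can be reproduced without NumPy. The sketch below is POSIX-only because it calls `os.fork()` directly. A `random.Random` instance, seeded once in the parent, stands in for NumPy's global generator, and each forked child reports its first draw back through a pipe. `draws_after_fork` is a hypothetical helper. An explicit instance is used on purpose: since Python 3.7 the module-level `random` functions are reseeded in the child after a fork, but other `Random` instances are left alone.

```python
import os
import random


def draws_after_fork(rng, n_children=3):
    """Fork n_children processes; each reports rng.random() via a pipe.

    POSIX only. The parent never draws from rng, so every child starts
    from an identical copy of the parent's generator state.
    """
    results = []
    for _ in range(n_children):
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:
            # Child: draw from the inherited copy of the state, then exit
            # without running any of the parent's cleanup code.
            try:
                os.close(read_fd)
                os.write(write_fd, repr(rng.random()).encode())
            finally:
                os._exit(0)
        # Parent: read the child's value, then reap the child.
        os.close(write_fd)
        with os.fdopen(read_fd, "rb") as pipe:
            results.append(float(pipe.read().decode()))
        os.waitpid(pid, 0)
    return results


if __name__ == "__main__":
    shared = random.Random(12345)  # seeded once, before any fork
    print(draws_after_fork(shared))  # the same value, n_children times
```

Every child reports the same number because each one starts from an unmodified copy of the parent's state. Calling `rng.seed()` inside the child before drawing (the analogue of `np.random.seed()` in the question) would make the values differ.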

How often this happens tends to correlate with the number of worker processes running concurrently. For example:

```python
import numpy as np
import random
from multiprocessing import Pool

def foo_np(seed=None):
    # np.random.seed(seed)
    return np.random.uniform(0, 1, 5)

if __name__ == "__main__":
    pool = Pool(processes=8)
    print(np.array(pool.map(foo_np, range(20))))

# [[ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.28917586  0.40997875  0.06308188  0.71512199  0.47386047]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.64672339  0.99851749  0.8873984   0.42734339  0.67158796]
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]
#  [ 0.14463001  0.80273208  0.5559258   0.55629762  0.78814652] <-
#  [ 0.11283279  0.28180632  0.28365286  0.51190168  0.62864241]]
```

You can see that the 8 worker processes started from the same generator state, so they produced identical random sequences in groups of up to 8 rows (I've marked the first group with arrows).

Calling `np.random.seed()` within a subprocess forces that process's copy of the global RNG to seed itself again from `/dev/urandom` or the wall clock. This will (probably) prevent you from seeing identical output from multiple subprocesses. Best practice is to explicitly pass a different seed (or a `numpy.random.RandomState` instance) to each subprocess, e.g.:

```python
def foo_np(seed=None):
    local_state = np.random.RandomState(seed)
    print(local_state.uniform(0, 1, 5))

pool.map(foo_np, range(20))
```
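On NumPy 1.17 or newer, the same idea can be expressed with `SeedSequence`, which is designed to derive many independent seeds from a single root seed. The sketch below assumes NumPy >= 1.17 (for `SeedSequence` and `default_rng`). `ROOT_SEED` is an arbitrary example value. Rebuilding each child from its `spawn_key` inside the task, rather than sending spawned objects to the workers, is one design choice among several.

```python
import numpy as np
from multiprocessing import Pool

ROOT_SEED = 12345  # arbitrary example value; record it to reproduce a run


def foo_np(task_id):
    """Draw 5 uniforms from a generator private to this task.

    SeedSequence(ROOT_SEED, spawn_key=(task_id,)) is the same child that
    SeedSequence(ROOT_SEED).spawn(n)[task_id] would return, so each task
    gets its own stream regardless of which worker process runs it.
    """
    seed_seq = np.random.SeedSequence(ROOT_SEED, spawn_key=(task_id,))
    rng = np.random.default_rng(seed_seq)
    return rng.uniform(0, 1, 5)


if __name__ == "__main__":
    with Pool(processes=8) as pool:
        rows = pool.map(foo_np, range(20))
    print(np.array(rows))
```

Recording `ROOT_SEED` is enough to reproduce a whole run. Because each row depends only on `task_id`, the output also does not change with the number of worker processes.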

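I'm not entirely sure what underlies the difference between `random` and `numpy.random` in this respect. One likely factor is that Python reseeds its own global generator in forked children: Python 2.7's `multiprocessing` calls `random.seed()` right after forking a worker, and since Python 3.7 the `random` module registers an `os.register_at_fork()` hook that does the same. NumPy's global generator gets no such treatment. I would still recommend explicitly passing a seed or a `random.Random` instance to each subprocess to be on the safe side. In Python 2 you could also use the `.jumpahead()` method of `random.Random`, which was designed for shuffling the states of `Random` instances in multithreaded programs. It was removed in Python 3.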
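With only the standard library, that recommendation might look like the following sketch. Each task receives its own integer seed and builds a private `random.Random` from it. The helper name `draw` and the seed scheme (task index used directly as the seed) are illustrative assumptions. The `with Pool(...)` form requires Python 3.

```python
import random
from multiprocessing import Pool


def draw(task_seed, n=3):
    """Return n uniforms from a private random.Random seeded with task_seed.

    The generator is created inside the task from an explicit seed, so the
    result does not depend on which worker runs the task or on any state
    inherited from the parent process.
    """
    rng = random.Random(task_seed)
    return [rng.uniform(0, 1) for _ in range(n)]


if __name__ == "__main__":
    seeds = range(20)  # illustrative scheme: one distinct seed per task
    with Pool(processes=8) as pool:
        rows = pool.map(draw, seeds)
    for row in rows:
        print(row)
```

Rerunning with the same seeds reproduces the same rows. Any collision between tasks would come from reusing a seed, not from how the pool forks its workers.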

