Feb-27-2018, 08:03 AM
I have written a simple code to understand how lack of communication between the child processes leads to a random result when using multiprocessing.Pool. I input a nested dictionary, as a DictProxy object made by multiprocessing.Manager (see also the main code below):

manager = Manager()
my_dict = manager.dict()
my_dict['nested'] = nested

into a pool embedding 16 open processes. The nested dictionary is defined below in the main code. The function my_function simply raises each number stored in the elements of the nested dictionary to the second power. As expected, because of the shared memory in multithreading, I get the correct result when I use multiprocessing.dummy:

Output:
{0: 1, 1: 4, 2: 9, 3: 16}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 9, 1: 16, 2: 25, 3: 36}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
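For clarity, here is a simplified sketch of the threaded case (a reconstruction, not my full code): multiprocessing.dummy.Pool runs the workers as threads inside one process, so they all mutate the very same nested dict and no update is lost.

```python
# Simplified sketch of the threaded case (not the full code below): the
# workers in multiprocessing.dummy.Pool are threads, so they share the
# parent's memory and their in-place updates to the nested dict all persist.
from multiprocessing.dummy import Pool  # thread pool with the Pool API

# same structure as the nested dictionary in the main code below
nested = {'elements': [{'data': {i: i + n for i in range(4)}}
                       for n in range(1, 6)]}

def square_element(idx):
    data = nested['elements'][idx]['data']  # shared object, no copy is made
    for key in data:
        data[key] = data[key] ** 2

with Pool(4) as pool:
    pool.map(square_element, range(len(nested['elements'])))

print(nested['elements'][0]['data'])  # {0: 1, 1: 4, 2: 9, 3: 16}
```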
but when I use multiprocessing, the result is incorrect and completely random in each run. One example of the incorrect result is:

Output:
{0: 1, 1: 2, 2: 3, 3: 4}
{0: 4, 1: 9, 2: 16, 3: 25}
{0: 3, 1: 4, 2: 5, 3: 6}
{0: 16, 1: 25, 2: 36, 3: 49}
{0: 25, 1: 36, 2: 49, 3: 64}
In this particular run, the 'data' in elements 1 and 3 was not updated. I understand that this happens due to the lack of communication between the child processes, which prevents the "updated" nested dictionary in each child process from being properly sent to the others. However, is it possible to use Manager.Queue to organize this inter-child communication and get the correct results, possibly with minimal run-time?

from multiprocessing import Pool, Manager
import numpy as np

def my_function(A):
    arg1 = A[0]
    my_dict = A[1]
    temporary_dict = my_dict['nested']
    for arg2 in np.arange(len(my_dict['nested']['elements'][arg1]['data'])):
        temporary_dict['elements'][arg1]['data'][arg2] = temporary_dict['elements'][arg1]['data'][arg2] ** 2
    my_dict['nested'] = temporary_dict

if __name__ == '__main__':
    # nested dictionary definition
    strs1 = {}
    strs2 = {}
    strs3 = {}
    strs4 = {}
    strs5 = {}
    strs1['data'] = {}
    strs2['data'] = {}
    strs3['data'] = {}
    strs4['data'] = {}
    strs5['data'] = {}
    for i in [0,1,2,3]:
        strs1['data'][i] = i + 1
        strs2['data'][i] = i + 2
        strs3['data'][i] = i + 3
        strs4['data'][i] = i + 4
        strs5['data'][i] = i + 5
    nested = {}
    nested['elements'] = [strs1, strs2, strs3, strs4, strs5]
    nested['names'] = ['series1', 'series2', 'series3', 'series4', 'series5']

    # parallel processing
    pool = Pool(processes = 16)
    manager = Manager()
    my_dict = manager.dict()
    my_dict['nested'] = nested
    sequence = np.arange(len(my_dict['nested']['elements']))
    pool.map(my_function, ([seq, my_dict] for seq in sequence))
    pool.close()
    pool.join()

    # printing the data in all elements of the nested dictionary
    print(my_dict['nested']['elements'][0]['data'])
    print(my_dict['nested']['elements'][1]['data'])
    print(my_dict['nested']['elements'][2]['data'])
    print(my_dict['nested']['elements'][3]['data'])
    print(my_dict['nested']['elements'][4]['data'])