Python Forum
How to use pool.map effectively?
#1
Python 3.7.3

import multiprocessing

if self.multiprocessing:
    quantity_process = multiprocessing.cpu_count()
else:
    quantity_process = 1

# pass the initializer callable itself, not the result of calling it
pool = multiprocessing.Pool(processes=quantity_process,
                            initializer=initializer,
                            maxtasksperchild=1)
output_data = pool.map(count, self.input_data)
pool.close()
pool.join()
On 4 cores I launched about 150 tasks with pool.map, but over time I could see in the system monitor that the Python processes were gradually exiting, while the remaining processes kept receiving new tasks (as the debugging output shows). In other words, the tasks are not handed out on the fly but are divided up when pool.map is called, so some processes get the easy tasks and finish quickly, leaving all the hard work to their less "lucky" siblings.

How can I avoid the situation where some of the pool's processes only work part of the time?
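I suspect the fix is to hand the tasks out one at a time instead of splitting them into chunks up front; a minimal sketch of what I mean, reusing count and self.input_data from above (I am not sure this is the right approach):

import multiprocessing

with multiprocessing.Pool(processes=quantity_process, maxtasksperchild=1) as pool:
    # chunksize=1 hands out a single item per task instead of splitting the
    # whole iterable into fixed chunks up front; imap_unordered yields results
    # as they finish (use pool.imap if the output order has to match the input)
    output_data = list(pool.imap_unordered(count, self.input_data, chunksize=1))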
#2
you may want to take a look at: https://pymotw.com/3/multiprocessing/

specifically: https://pymotw.com/3/multiprocessing/com...cess-pools
#3
(Oct-06-2019, 12:00 PM)Larz60+ Wrote: you may want to take a look at: https://pymotw.com/3/multiprocessing/

specifically: https://pymotw.com/3/multiprocessing/com...cess-pools

Quote:By default, Pool creates a fixed number of worker processes and passes jobs to them until there are no more jobs.

This is not true! Unfinished tasks remain, but the pool stops handing them to some of the processes, and those processes exit. At first 4 processes were working, then only 3, then 2, and in the end all of the work was done by a single process. All the while I kept getting messages about new tasks being started.
#4
Quote:By default, Pool creates a fixed number of worker processes and passes jobs to them until there are no more jobs.
That is a quote from Doug Hellmann; if it is incorrect, it should be taken up with him at https://doughellmann.com, not with me.
#5
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    # map takes the iterable of arguments positionally, not as a keyword
    results = executor.map(my_func, args_list)
multiprocessing.cpu_count() gives the number of CPU cores. However, if you use all of them it can affect the whole system.
By default, the executor's map method will use all available cores.
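For example, a small sketch of leaving one core free for the rest of the system (my_func and args_list are just placeholder names):

import os
from concurrent.futures import ProcessPoolExecutor

workers = max(1, (os.cpu_count() or 1) - 1)   # keep one core for the OS / desktop
with ProcessPoolExecutor(max_workers=workers) as executor:
    results = list(executor.map(my_func, args_list))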

I have somewhere a script which downloads more than 200 web pages in parallel, and I tested it on a 4-core system with different numbers of processes. The best time I got was with around 30 processes. If I find it, I can run it again to see what is going on and share it here. This system has 6 cores, though.
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#6
(Oct-06-2019, 06:48 PM)wavic Wrote: I have somewhere a script which downloads more than 200 web pages in parallel, and I tested it on a 4-core system with different numbers of processes. The best time I got was with around 30 processes. If I find it, I can run it again to see what is going on and share it here. This system has 6 cores, though.

I have no I/O limit, because the speed of the algorithm does not depend on whether I use an HDD or an SSD (in the C program I did get a 4x speed-up on the SSD). Therefore, starting more processes than the number of cores makes no sense.
#7
If you use multiprocessing, you should know that all data passed from one process to another has to be pickled by Python. This costs time. So if you have a small amount of data that takes a long time to compute, multiprocessing works well.

If the task involves too much data, you lose speed to the communication overhead.
Are you using numpy or pandas? If so, take a look at https://dask.org/
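A rough sketch of the dask idea (the file pattern and column names are only made up for illustration):

import dask.dataframe as dd

df = dd.read_csv("data-*.csv")                          # lazily partitioned dataframe
result = df.groupby("key")["value"].mean().compute()    # computed in parallel
print(result)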

BTW: for a parallel download task (pure I/O) a single process is enough. You should use asyncio for that.
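A minimal sketch of the asyncio version (aiohttp is a third-party package and the URLs are made up):

import asyncio
import aiohttp

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main(urls):
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*(fetch(session, url) for url in urls))

urls = ["https://example.com/page{}".format(n) for n in range(200)]
pages = asyncio.run(main(urls))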
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!
#8
(Oct-07-2019, 08:02 AM)AlekseyPython Wrote: Therefore, starting more processes than the number of cores makes no sense.

I thought the same. But when I set it to 4 or 8 processes it ran slower. I don't know why and can't explain it, but I saw it with my own eyes. Huh
"As they say in Mexico 'dosvidaniya'. That makes two vidaniyas."
https://freedns.afraid.org
#9
(Oct-07-2019, 08:20 AM)DeaD_EyE Wrote: BTW: for a parallel download task (pure I/O) a single process is enough. You should use asyncio for that.

In my case the bottleneck is not I/O, because increasing the disk speed by a factor of 4 (HDD -> SSD) didn't speed anything up. Everything depends on the CPU speed for converting the data read from the CSV file. Therefore asyncio isn't suitable (it would reduce performance by a factor of 4, since I have 4 cores).
#10
OK, converting data is CPU-intensive.
Almost dead, but too lazy to die: https://sourceserver.info
All humans together. We don't need politicians!

