Python Forum
Parallel computation
#1
Hi there,

I'm relatively new to Python and I'm trying to perform a 'trivially' parallel computation.
I have a big array of data and I need to perform many computations
using a large set of different parameters. In principle, it seems trivial since each computation is
independent of the outcome of the others... but I do not get the
speed-up I was expecting. Indeed, it seems that I'm not getting any speed-up at all.

Any suggestions about what I should use or test?
Thank you.
#2
How are you doing the parallel computation? Can you post some example code? You don't need to post the real computations, just the code that starts the parallel computation.
#3
Hi thanks,

This is how I'm doing it:
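from multiprocessing import Process
..........
.........

def SomeFunction():
    ...


proc_list = []

for qproc in range(...):
    # Pass the function object and its arguments separately;
    # target=SomeFunction(...) would call it in the parent process.
    proc = Process(target=SomeFunction, args=(...))
    proc.start()
    proc_list.append(proc)

for proc in proc_list:
    proc.join()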
#4
That looks correct. How much time do you expect each process to run? How much time is it taking for multiple processes to run? How many processes are you starting?

It takes a significant amount of time to start a process, especially on Windows, where each process has to be spawned (start a new process, load Python, call your target function). This program starts five processes, each running a task that takes 5 seconds to complete:
import multiprocessing as mp
import time


def SomeFunction(delay):
    time.sleep(delay)


def process_jobs():
    start = time.time()
    processes = []
    for i in range(5):
        proc = mp.Process(target=SomeFunction, args=(5,))
        processes.append(proc)
        proc.start()

    for proc in processes:
        proc.join()
    print("Total time", time.time() - start)


if __name__ == "__main__":
    process_jobs()
Output:
Total time 6.390354871749878
So starting the 5 processes costs about 1.4 seconds on top of the 5-second task. If I import a few more modules, it takes longer:
import multiprocessing as mp
import time
import numpy as np
import pandas as pd


def SomeFunction(delay):
    time.sleep(delay)
...
Output:
Total time 7.589796304702759
On Linux, multiprocessing has traditionally forked child processes (macOS has defaulted to spawn since Python 3.8). A forked child starts out knowing everything the parent process knows, so it doesn't have to start Python and load a bunch of modules just to provide a context to execute the target function. There is still a performance hit for starting processes, but it is smaller than with spawn. If the processing time for the task is fairly short, it might look like you are getting no benefit at all from multiprocessing.
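
To see which start method your own platform uses, a small check like this may help. It is only a sketch: `get_start_method()` and `get_all_start_methods()` are real `multiprocessing` functions, while the helper name `describe_start_methods` is made up for illustration.

```python
import multiprocessing as mp


def describe_start_methods():
    """Return (default start method, all supported methods) for this platform."""
    return mp.get_start_method(), mp.get_all_start_methods()


if __name__ == "__main__":
    default, available = describe_start_methods()
    print("default start method:", default)
    print("available start methods:", available)
```

If the default is "spawn" or "forkserver", expect a higher per-process startup cost than with "fork".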

Another thing to remember is that you are limited in how many processes can actually run in parallel. Running more CPU-bound processes than you have processor cores doesn't provide any extra performance benefit, because the extra processes just take turns on the same core. You are essentially multi-threading instead of multi-processing. How many processor cores do you have available for multi-processing?
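
One way to check the core count from Python is sketched below. The helper name `usable_cores` is hypothetical. It assumes that `os.sched_getaffinity` (available on Linux but not on every platform) reflects the cores the process is actually allowed to use, and falls back to `os.cpu_count()` elsewhere.

```python
import os


def usable_cores():
    """Best-effort count of CPU cores this process may run on."""
    if hasattr(os, "sched_getaffinity"):
        # Respects CPU affinity limits (e.g. set by a job scheduler).
        return len(os.sched_getaffinity(0))
    # cpu_count() can return None if the count cannot be determined.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print("Cores available to this process:", usable_cores())
```

Starting more worker processes than this number is unlikely to help for CPU-bound work.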
#5
Thank you for your detailed response.

Indeed, each process takes almost zero time, but I run many of them. I'm running them on Linux.

I may have access to considerably more cores than I'm using now. I wanted to get an estimate of the speed-up first, before moving to many cores, but I was surprised that instead of getting faster, the computation got a little slower. That probably means the time needed to start a process is longer than the time the process spends computing.

I'll try something else: I'll group several of the computations that now run as independent processes into a single process, and I'll increase the number of cores. At some point I hope to see the benefit.
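
The batching idea could look roughly like the sketch below. Everything here is an assumption made for illustration: `compute` stands in for the real (tiny) computation, and `split_into_batches` and `run_batch` are hypothetical helper names. The point is only that each process handles a whole batch of parameters, so the startup cost is paid once per batch instead of once per computation.

```python
from multiprocessing import Process, Queue


def compute(param):
    """Hypothetical stand-in for one tiny, independent computation."""
    return param * param


def split_into_batches(items, n_batches):
    """Split items into n_batches nearly equal contiguous lists."""
    size, extra = divmod(len(items), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        # The first `extra` batches get one additional item each.
        end = start + size + (1 if i < extra else 0)
        batches.append(items[start:end])
        start = end
    return batches


def run_batch(batch, results):
    """Run a whole batch inside one process and send back its results."""
    results.put([compute(p) for p in batch])


if __name__ == "__main__":
    params = list(range(1000))
    results = Queue()
    procs = [Process(target=run_batch, args=(batch, results))
             for batch in split_into_batches(params, 4)]
    for proc in procs:
        proc.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    outputs = [results.get() for _ in procs]
    for proc in procs:
        proc.join()
    print(sum(len(out) for out in outputs), "results from", len(procs), "processes")
```

Note that the batches come back in completion order, not submission order, so real code would need to tag each batch if the order matters.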

Thank you again.
#6
Read about multiprocessing pools and chunksize.

https://docs.python.org/3/library/multip...ssing.pool
https://dnmtechs.com/understanding-the-c...-python-3/
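
As a rough illustration of what those links describe, here is a minimal sketch using `multiprocessing.Pool.map` with its `chunksize` argument. The function `compute`, the helper `run_all`, and the numbers (4 workers, chunks of 250) are placeholders chosen for the example, not recommendations.

```python
import multiprocessing as mp


def compute(param):
    """Hypothetical stand-in for one tiny, independent computation."""
    return param * param


def run_all(params, processes=4, chunksize=250):
    """Map compute over params, handing tasks to workers in chunks."""
    with mp.Pool(processes=processes) as pool:
        # The pool's workers are started once and reused; chunksize
        # controls how many tasks are sent to a worker per message.
        return pool.map(compute, params, chunksize=chunksize)


if __name__ == "__main__":
    results = run_all(range(1000))
    print(len(results), "results, first five:", results[:5])
```

With many very short tasks, a larger chunksize usually reduces the communication overhead per task; the best value depends on the workload, so it is worth timing a few settings.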
#7
Thank you again. The links were really useful.