Python Forum
Multiprocessing advice
#1
Hi all

For my project, my current code does the job as expected so far. It relies heavily on NumPy and vectorization wherever possible, and uses about two dozen functions, mainly to process data and perform I/O; the structure is purely sequential.

In some cases, and not necessarily for pure calculations, loops are used (typically for I/O).

Now I want to improve performance. I know about Numba, but even though I have already seen huge gains with it on my computer (10x faster), the pure calculations are not the longest steps.

Multiprocessing is probably one of the solutions I will have to dig into (a new field for me). Even though the loops just call functions, I guess I will probably have to adapt the structure of my code for multiprocessing.

Of course I had a look on the internet, but there are a lot of docs and methods (Process, map, starmap, sync or async, etc.), and many tutorials and snippets that do not work (some good, some bad).

So can somebody point me to a good doc (ideally written for newbies with very little experience in parallelization) that explains the pros and cons of each method and how to implement it (at the loop level)?

Thanks

Paul
(Providing code does not make sense at this stage, since this question is purely for information; code will come at the next stage for sure.)
#2
The python.org documentation has examples: https://docs.python.org/3/library/multiprocessing.html
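As a starting point for the "loop level" question, here is a minimal sketch of turning a sequential loop into a pool of worker processes. The functions square and scaled_sum are toy stand-ins invented for this example (they are not from your project), and the pool size of 2 is an arbitrary choice.

```python
import multiprocessing


def square(x):
    """Toy stand-in for the body of one loop iteration."""
    return x * x


def scaled_sum(a, b, factor):
    """Toy function with several arguments, to illustrate starmap."""
    return (a + b) * factor


def run_examples(processes=2):
    """Run the toy functions through a pool with map and starmap."""
    with multiprocessing.Pool(processes=processes) as pool:
        # map: one argument per call; results come back in input order
        squares = pool.map(square, range(5))
        # starmap: each tuple is unpacked into the function's arguments
        sums = pool.starmap(scaled_sum, [(1, 2, 10), (3, 4, 10)])
    return squares, sums


if __name__ == '__main__':
    squares, sums = run_examples()
    print(squares)  # [0, 1, 4, 9, 16]
    print(sums)     # [30, 70]
```

Both map and starmap block until every result is ready. The worker functions are defined at module level so that child processes can find them by name.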
#3
Ok thanks for the link

There are many concepts I'm not familiar with, so it will be a hard topic for me.

I also notice that a lot of docs and tutorials on the web do not mention the use of "if __name__ == '__main__':", even though it is so important.

So now: paper, pencil, coffee... and aspirin for the incoming migraines.
#4
if __name__ == '__main__':
This has nothing to do with multiprocessing.
What this allows is for a Python module to be imported from another module without being run.
At the same time, if the module is run directly by the interpreter, __name__ is set to '__main__',
so the body of the if __name__ == '__main__': statement is executed.
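A small sketch of that behaviour, using the standard json module only because it is always available; the main function is a placeholder made up for this example.

```python
import json

# An imported module sees its own name in __name__ ...
print(json.__name__)  # json


def main():
    print('running as a script')


# ... while the file handed to the interpreter sees '__main__',
# so main() runs only when this file is executed directly.
if __name__ == '__main__':
    main()
```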
#5
@Larz60+

Just an observation: this is what I get without using "if __name__ == '__main__':":

import multiprocessing

def worker(num):
    print(f'Worker: {num}')
    return

# if __name__ == '__main__':
#     jobs = []
#     for i in range(1):
#         p = multiprocessing.Process(target=worker, args=(i,))
#         jobs.append(p)
#         p.start()

jobs = []
for i in range(5):
    p = multiprocessing.Process(target=worker, args=(i,))
    jobs.append(p)
    p.start()
Error:
RuntimeError:
    An attempt has been made to start a new process before the
    current process has finished its bootstrapping phase.

    This probably means that you are not using fork to start your
    child processes and you have forgotten to use the proper idiom
    in the main module:

        if __name__ == '__main__':
            freeze_support()
            ...

    The "freeze_support()" line can be omitted if the program
    is not going to be frozen to produce an executable.
An extract from the docs:
Quote: For an explanation of why the if __name__ == '__main__' part is necessary, see Programming guidelines
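For comparison, a guarded version of the snippet above might look like the sketch below. The start_workers helper and the join() calls are additions for this example; worker is unchanged apart from dropping the bare return.

```python
import multiprocessing


def worker(num):
    print(f'Worker: {num}')


def start_workers(count):
    """Start `count` processes, wait for them, and return their exit codes."""
    jobs = []
    for i in range(count):
        p = multiprocessing.Process(target=worker, args=(i,))
        jobs.append(p)
        p.start()
    # Wait for every child to finish before the parent moves on
    for p in jobs:
        p.join()
    return [p.exitcode for p in jobs]


# Only the process that runs this file directly starts children.
# A child that re-imports the file skips this block.
if __name__ == '__main__':
    start_workers(5)
```

The order of the "Worker: N" lines is not guaranteed, since the processes run independently.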
#6
Larz is a Linux guy, and it sounds like you might be using Windows. On Linux, Python has traditionally used fork by default when creating processes, while on Windows it uses spawn. With fork, Python doesn't have to do anything to make the child process aware of everything the parent process knows: the child starts as a copy of the parent, so this happens automatically. With spawn, the new process starts from a fresh interpreter that knows nothing, so Python has to import the current module to define things like functions and classes (other context information is pickled). That import re-runs any unguarded module-level code, including the code that starts processes. So in Larz land you don't have to do anything to prevent the main module's code from running again, but on Windows you do.
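If you want to see the difference on your own machine, the sketch below starts one child per start method that your platform supports. The names show_name and run_child are invented for this example; get_context, get_start_method and get_all_start_methods are part of the standard multiprocessing module.

```python
import multiprocessing


def show_name():
    # Under fork the child is a copy of the parent, so __name__ stays '__main__'.
    # Under spawn the child re-imports the main file under a different name
    # ('__mp_main__' in CPython), so the guarded block below is skipped there.
    print(f'child sees __name__ = {__name__!r}')


def run_child(method):
    """Start one child with an explicit start method; return its exit code."""
    ctx = multiprocessing.get_context(method)
    p = ctx.Process(target=show_name)
    p.start()
    p.join()
    return p.exitcode


if __name__ == '__main__':
    print('default:', multiprocessing.get_start_method())
    # Windows offers only 'spawn'; Linux typically offers several methods
    for method in multiprocessing.get_all_start_methods():
        print(method, '-> exit code', run_child(method))
```

Run it as a script file rather than from an interactive prompt, since spawned children need to be able to import the main module.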