Hi,
how can I get this multiprocessing example working in the easiest way?
I've got 4 processes, and each process lasts 2 seconds.
How can I get the four processes to finish in less than 8 seconds?
(They should run in parallel.)
Thanks a lot for your help...
from multiprocessing import Process
import time

def cpu_extensive():
    time.sleep(2)
    print('Done')

def main():
    # define processes
    p1 = Process(target=cpu_extensive())
    p1.start()
    p2 = Process(target=cpu_extensive())
    p2.start()
    p3 = Process(target=cpu_extensive())
    p3.start()
    p4 = Process(target=cpu_extensive())
    p4.start()
    p1.join()
    p2.join()
    p3.join()
    p4.join()

if __name__ == '__main__':
    start_measuring = time.time()
    main()
    end_measuring = time.time()
    t = end_measuring - start_measuring
    print(t)
Process(target=cpu_extensive()) calls the function right away: cpu_extensive() runs in the parent process, returns None, and the Process is then created with target=None. That is why your four processes run one after another. Pass the function object itself instead:

p1 = Process(target=cpu_extensive)
For something like this you might want to look at Pool.
from multiprocessing import Pool
import time

def cpu_extensive(i):
    time.sleep(2)
    print(i, 'Done')

if __name__ == "__main__":
    starttime = time.time()
    pool = Pool()
    pool.map(cpu_extensive, range(4))
    pool.close()
    endtime = time.time()
    print(f"Time taken {endtime - starttime} seconds")
Hi deanhystad,
thanks a lot for your great answer!!
Example 1:
from multiprocessing import Process
import time

def cpu_extensive():
    time.sleep(2)
    print('Done')

def main():
    # keep a reference to every process so each one can be joined
    processes = []
    for i in range(5):
        p = Process(target=cpu_extensive)
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

if __name__ == '__main__':
    start_measuring = time.time()
    main()
    end_measuring = time.time()
    t = end_measuring - start_measuring
    print(t)
Example 2:
from multiprocessing import Pool
import time

def cpu_extensive(i):
    time.sleep(2)
    print(i, 'Done')

if __name__ == "__main__":
    starttime = time.time()
    pool = Pool()
    pool.map(cpu_extensive, range(4))
    pool.close()
    endtime = time.time()
    print(f"Time taken {endtime - starttime} seconds")
I simulate a load that takes 2 seconds, run 4 times, and it is processed in parallel.
Examples 1 and 2 both take approximately 2.18 seconds.
I'm planning to process 40,000 pictures with pHash and want to use multiprocessing to speed up the processing.
Perhaps you have an idea whether Example 1 or 2 is more suitable for this load...
Thank you very much for your help!!
flash77
At most, multiprocessing is only going to increase your speed by about 4x, probably less. You might also want to look at threads or async tasks, depending on whether the work is I/O-bound or processor-bound.
The slowest part is I/O. To get a benefit from multiprocessing, the I/O must be fast enough.
I don't see a significant difference (15 s vs. 12 s) between 1 process and 4 processes.
He still needs to read the data from the SSD.
I'm not confident that mmap is the fastest possible way to hash a file.
from hashlib import md5
from pathlib import Path
from multiprocessing import Pool
from mmap import ACCESS_READ, mmap

EMPTY = md5().hexdigest()

def hasher(file: Path) -> str:
    # mmap cannot map a zero-length file, so handle that case separately
    if file.stat().st_size == 0:
        return EMPTY
    with file.open("rb") as fd:
        with mmap(fd.fileno(), 0, access=ACCESS_READ) as mm:
            digest = md5(mm).hexdigest()
            print(digest, file, sep=" ")
            return digest

def main(glob):
    files = [element for element in Path().rglob(glob) if element.is_file()]
    with Pool(4) as pool:
        pool.map(hasher, files)

if __name__ == "__main__":
    main("*.pdf")
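If the end goal is spotting duplicate pictures among the 40,000, the per-file digests can be grouped afterwards. A sketch of that idea, with the assumption that md5 over the raw bytes stands in for a perceptual hash such as pHash (e.g. from the third-party imagehash package):

```python
from collections import defaultdict
from hashlib import md5
from multiprocessing import Pool
from pathlib import Path

def file_digest(path):
    # md5 over the raw bytes; a perceptual hash of the image would go here instead
    return str(path), md5(path.read_bytes()).hexdigest()

def find_duplicates(paths):
    # group file names by digest; any group larger than one is a duplicate set
    groups = defaultdict(list)
    with Pool() as pool:
        for name, digest in pool.imap_unordered(file_digest, paths):
            groups[digest].append(name)
    return {d: names for d, names in groups.items() if len(names) > 1}
```

imap_unordered yields results as workers finish, which keeps the parent busy instead of waiting for the whole batch.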