Python Forum
multiprocess hang when certain number is used in the program - Printable Version




multiprocess hang when certain number is used in the program - esphi - Oct-27-2020

Hi, I am a Python beginner and I am starting to learn multiprocessing. I wrote this program to calculate the squares of a large list of numbers.

The program takes a start_number and an end_number, splits the range into groups, then uses multiprocessing to calculate the results.

Each process stores its results in a list, wraps the list in a dictionary keyed by the process's sequence number, and puts that dictionary into a Queue.

At the end, the program combines the results into a single list and prints it.



The program works fine with start_number = 1 and any end_number from 1 to 12_998.

However, the program hangs when end_number = 12_999. Above 12_999 it works for some end_number values and not for others: for example, it works with 13_000 through 13_005 but not with 13_006 or 13_007, and it works with 13_008 through 13_012 but not with 13_013, and so on.

I have tried running the same program on different computers with different CPU counts, on both Windows and Linux, and the results are the same. I am using Python 3.6.9.

I have since learned that multiprocessing.Pool is easier to use in this scenario, but I would like to know what is wrong with my multiprocessing.Process program, if anyone is kind enough to take a look.
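For comparison, here is a minimal sketch of the same calculation with multiprocessing.Pool. The names square_group and squares_with_pool are hypothetical, and the chunking is an assumption: contiguous chunks of roughly equal size, not the exact groups produced by dividegroup. Pool.map returns the per-chunk results in input order, so the sequence-number dictionary is not needed.

```python
from multiprocessing import Pool


def square_group(numbers):
    """Square every number in one chunk (runs in a worker process)."""
    return [x * x for x in numbers]


def squares_with_pool(start_number, end_number, number_of_groups=8):
    """Square start_number..end_number (inclusive) with a process pool.

    Hypothetical helper: the range is cut into contiguous chunks, and
    Pool.map returns the chunk results in the same order as the input.
    """
    numbers = range(start_number, end_number + 1)
    chunk = max(1, -(-len(numbers) // number_of_groups))  # ceiling division
    groups = [numbers[i:i + chunk] for i in range(0, len(numbers), chunk)]
    with Pool(processes=number_of_groups) as pool:
        per_group = pool.map(square_group, groups)
    # Flatten the per-chunk lists back into one list.
    return [value for group in per_group for value in group]


if __name__ == "__main__":
    result = squares_with_pool(1, 12_999)
    print(len(result), result[:5], result[-1])
```

Using the pool as a context manager terminates the workers once map() has returned, so there are no manual start() or join() calls.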

Thank you.

-----------------
My code:
-----------------
# This program hangs when end_number = 12_999, 13_006, 13_007, 13_013, etc.
from multiprocessing import Process, Queue
from multiprocessing import log_to_stderr, get_logger


# Divide the range of numbers into groups so that each group can be processed by a different Process.
def dividegroup(minno, maxno, nof_groups):
    
    total_range = maxno - minno + 1
    remain = total_range % nof_groups
    group_range = total_range // nof_groups

    print(total_range, group_range, remain)

    lof_groups = []
    
    for i in range(nof_groups):
        lof_groups.append( range( minno + (i * group_range), minno + ((i + 1) * group_range )))

    if remain != 0 : lof_groups.append(range(minno + ((i + 1) * group_range), maxno + 1))

    return lof_groups


# Square the numbers and put the result in the Queue.
def square(seq, numbers, q):

    answers = [x * x for x in numbers]    
    results = {seq: answers} 
    q.put(results, block = False)

# main program
def main():

    log_to_stderr()
    logger = get_logger()
    logger.setLevel(20)

    print('\033c')
    start_number = 1
    end_number = 12_999
    number_of_groups = 8

    list_of_groups = dividegroup(start_number, end_number, number_of_groups)
    print(list(list_of_groups))

    q = Queue(maxsize=0)
    process_seq = 0
    processes = []
 
    for i in list_of_groups:
        process_seq += 1
        process = Process(target = square, args = (process_seq, i, q))
        processes.append(process)

    for process_s in processes:
        process_s.start()

    for process_j in processes:
        process_j.join()

    result_dic = {}
    while not q.empty():
        result_dic.update(q.get())       

    result_list = []
    keylist = list(result_dic.keys())
    keylist.sort()
    for i in keylist:
        result_list += result_dic.get(i)
    print(result_list)


if __name__ == '__main__':
    main()
----------------------------------------------------
The Result when end_number = 12_999 is used.
----------------------------------------------------
Output:
12999 1624 7
[range(1, 1625), range(1625, 3249), range(3249, 4873), range(4873, 6497), range(6497, 8121), range(8121, 9745), range(9745, 11369), range(11369, 12993), range(12993, 13000)]
[INFO/Process-3] child process calling self.run()
[INFO/Process-1] child process calling self.run()
[INFO/Process-3] process shutting down
[INFO/Process-2] child process calling self.run()
[INFO/Process-4] child process calling self.run()
[INFO/Process-3] process exiting with exitcode 0
[INFO/Process-7] child process calling self.run()
[INFO/Process-1] process shutting down
[INFO/Process-1] process exiting with exitcode 0
[INFO/Process-8] child process calling self.run()
[INFO/Process-9] child process calling self.run()
[INFO/Process-6] child process calling self.run()
[INFO/Process-4] process shutting down
[INFO/Process-8] process shutting down
[INFO/Process-4] process exiting with exitcode 0
[INFO/Process-5] child process calling self.run()
[INFO/Process-9] process shutting down
[INFO/Process-7] process shutting down
[INFO/Process-2] process shutting down
[INFO/Process-5] process shutting down
[INFO/Process-6] process shutting down
[INFO/Process-9] process exiting with exitcode 0
[INFO/Process-2] process exiting with exitcode 0
[INFO/Process-8] process exiting with exitcode 0
[INFO/Process-7] process exiting with exitcode 0
[INFO/Process-6] process exiting with exitcode 0



RE: multiprocess hang when certain number is used in the program - esphi - Oct-27-2020

I am new to the forum.


RE: multiprocess hang when certain number is used in the program - deanhystad - Oct-27-2020

I find it hangs for most ranges and seldom works.


RE: multiprocess hang when certain number is used in the program - esphi - Oct-27-2020

Hi deanhystad,

1) I have an 8-core CPU. I was trying to see how the number of processes, relative to the number of CPUs, influences the speed.
2) For other numbers, it prints a list of results. For example, with end_number = 20 the output is as follows.
With end_number = 12_999 the program hangs: it never prints the result, and "[MainProcess] process shutting down" is never logged.
With end_number = 13_000 the program works, but the output is very long, so I have not included it here.

-------------------------------------
Example when end_number = 20
-------------------------------------
Output:
20 2 4
[range(1, 3), range(3, 5), range(5, 7), range(7, 9), range(9, 11), range(11, 13), range(13, 15), range(15, 17), range(17, 21)]
[INFO/Process-1] child process calling self.run()
[INFO/Process-1] process shutting down
[INFO/Process-2] child process calling self.run()
[INFO/Process-1] process exiting with exitcode 0
[INFO/Process-3] child process calling self.run()
[INFO/Process-3] process shutting down
[INFO/Process-3] process exiting with exitcode 0
[INFO/Process-2] process shutting down
[INFO/Process-2] process exiting with exitcode 0
[INFO/Process-6] child process calling self.run()
[INFO/Process-8] child process calling self.run()
[INFO/Process-8] process shutting down
[INFO/Process-5] child process calling self.run()
[INFO/Process-7] child process calling self.run()
[INFO/Process-8] process exiting with exitcode 0
[INFO/Process-5] process shutting down
[INFO/Process-5] process exiting with exitcode 0
[INFO/Process-4] child process calling self.run()
[INFO/Process-7] process shutting down
[INFO/Process-4] process shutting down
[INFO/Process-6] process shutting down
[INFO/Process-7] process exiting with exitcode 0
[INFO/Process-6] process exiting with exitcode 0
[INFO/Process-4] process exiting with exitcode 0
[INFO/Process-9] child process calling self.run()
[INFO/Process-9] process shutting down
[INFO/Process-9] process exiting with exitcode 0
[1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 196, 225, 256, 289, 324, 361, 400]
[INFO/MainProcess] process shutting down



RE: multiprocess hang when certain number is used in the program - esphi - Oct-27-2020

(Oct-27-2020, 10:05 AM)deanhystad Wrote: I find it hangs for most ranges and seldom works.

It works for end_number from 1 to 12_998 and starts giving problems at 12_999.


RE: multiprocess hang when certain number is used in the program - deanhystad - Oct-27-2020

The problem appears to be with putting large results in the queue. I tried running with 1 process and it works until the answers list in square() becomes large. I can do any number of square calculations as long as I only add a few numbers to the queue. This is probably why it seldom works for me: one of the first things I tried was to reduce the number of processes, which increased the size of each individual result.
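The size dependence described above can be reproduced in isolation. The sketch below starts one child that puts a single bytes object on a multiprocessing.Queue and returns, then checks whether the child has exited before the parent reads anything. The helper names put_payload and child_exits_before_read are hypothetical. The multiprocessing documentation notes that a process which has put items on a queue waits before terminating until its feeder thread has written them to the underlying pipe, so a payload too large for the pipe's buffer should keep the child alive until the parent calls get().

```python
from multiprocessing import Process, Queue


def put_payload(q, size):
    """Child: put one bytes object of `size` bytes on the queue, then return."""
    q.put(bytes(size))


def child_exits_before_read(size, timeout=5.0):
    """Return True if the child exits while nothing has been read from the
    queue, False if it is still alive after `timeout` seconds.

    The queue is drained and the child joined before returning, so this
    helper does not hang itself.
    """
    q = Queue()
    p = Process(target=put_payload, args=(q, size))
    p.start()
    p.join(timeout)              # bounded wait instead of a bare join()
    exited = not p.is_alive()
    q.get()                      # read the payload so the child can finish
    p.join()                     # returns once the feeder thread has flushed
    return exited


if __name__ == "__main__":
    print("100 bytes:", child_exits_before_read(100))
    print("10 MB:    ", child_exits_before_read(10_000_000, timeout=1.0))
```

On typical systems the 100-byte case should report True and the 10 MB case False; the exact size at which the behavior flips depends on the operating system's pipe buffer.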


RE: multiprocess hang when certain number is used in the program - esphi - Oct-27-2020

(Oct-27-2020, 10:54 AM) deanhystad wrote: The problem appears to be with putting large results in the queue. [...]

1) I found that varying the number of groups (processes) does change the end_number that results in a hang.
When I reduce number_of_groups to 2 (line 42 of the program, number_of_groups = 8 in main()), the program hangs with end_number = 11675.

2) I found that the same problem does not occur if I change the operation from x * x to x + x (line 28 of the program, in square()).

That does indicate that the problem is not due to the number of processes.
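One possible reason x + x behaves differently is that the queue carries pickled data, and the pickled size of an int grows with its magnitude: the squares of 1..12_999 go up to 168_974_001, while the doubles stay below 26_000. The sketch below compares the total payload for both operations using the original grouping. It assumes the payload is roughly the pickled {seq: answers} dictionary built in square(), measured with the standard pickle module rather than the pickler multiprocessing uses internally; total_payload_bytes is a hypothetical helper.

```python
import pickle


def dividegroup(minno, maxno, nof_groups):
    """Same grouping as the original program, without the debug print."""
    total_range = maxno - minno + 1
    remain = total_range % nof_groups
    group_range = total_range // nof_groups
    groups = [range(minno + i * group_range, minno + (i + 1) * group_range)
              for i in range(nof_groups)]
    if remain != 0:
        # Leftover numbers form one extra group, as in the original.
        groups.append(range(minno + nof_groups * group_range, maxno + 1))
    return groups


def total_payload_bytes(end_number, op, number_of_groups=8):
    """Approximate bytes sent through the queue: the sum of the pickled
    {seq: [op(x) for x in group]} dictionaries, one per group."""
    total = 0
    groups = dividegroup(1, end_number, number_of_groups)
    for seq, numbers in enumerate(groups, start=1):
        total += len(pickle.dumps({seq: [op(x) for x in numbers]}))
    return total


if __name__ == "__main__":
    for name, op in (("x * x", lambda x: x * x), ("x + x", lambda x: x + x)):
        print(name, total_payload_bytes(12_999, op))
```

If the deadlock depends on how much data is waiting in the pipe, this may also help explain why the failing end_number moves when number_of_groups or the operation changes.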

BTW: I have altered the program so that it loops end_number from 1 up to a specific number, i.e. I turned main() into a function and wrote a new main that calls the old one in a loop, so that I can run through numbers until it fails. That is how I found 11675.

If it is helpful, I can post the new program here.


RE: multiprocess hang when certain number is used in the program - esphi - Nov-06-2020

Thanks, deanhystad.
Does anyone else have any idea? I still have no clue what is happening after scratching my head for a few days.
Thanks in advance.
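A likely explanation, consistent with deanhystad's finding that the hang depends on how much data is put on the queue, is the pitfall described under "Joining processes that use queues" in the multiprocessing programming guidelines. A process that has put items on a Queue waits before terminating until its feeder thread has written all of them to the underlying pipe, and that pipe holds only a limited amount of unread data. main() joins every process before it calls q.get(), so once the results no longer fit, a child waits for the parent to read while the parent waits in join() for the child to exit. The 12_999 log fits this picture: Process-5 logs "process shutting down" but never "process exiting with exitcode 0". The sketch below keeps the original structure but reads one result per process before joining; run() is a hypothetical wrapper around the original main(), with the printing and logging dropped.

```python
from multiprocessing import Process, Queue


def dividegroup(minno, maxno, nof_groups):
    """Split minno..maxno (inclusive) into groups, as in the original program."""
    total_range = maxno - minno + 1
    group_range = total_range // nof_groups
    groups = [range(minno + i * group_range, minno + (i + 1) * group_range)
              for i in range(nof_groups)]
    if total_range % nof_groups != 0:
        # Leftover numbers go into one extra group, as in the original.
        groups.append(range(minno + nof_groups * group_range, maxno + 1))
    return groups


def square(seq, numbers, q):
    """Worker: square one group and put {seq: answers} on the queue."""
    q.put({seq: [x * x for x in numbers]})


def run(start_number, end_number, number_of_groups=8):
    """Hypothetical wrapper around the original main(), with get() before join()."""
    q = Queue()
    groups = dividegroup(start_number, end_number, number_of_groups)
    processes = [Process(target=square, args=(seq, numbers, q))
                 for seq, numbers in enumerate(groups, start=1)]
    for p in processes:
        p.start()

    # Drain the queue first: one blocking get() per process. A child may not
    # be able to exit until its result has been read out of the pipe.
    result_dic = {}
    for _ in processes:
        result_dic.update(q.get())

    # Every result has been consumed, so join() can now return.
    for p in processes:
        p.join()

    result_list = []
    for seq in sorted(result_dic):
        result_list += result_dic[seq]
    return result_list


if __name__ == "__main__":
    result = run(1, 12_999)
    print(len(result), result[-1])
```

Reading exactly len(processes) items with the blocking q.get() also removes the reliance on q.empty(), which the documentation describes as not reliable.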