Using multiprocessing to produce objects for i in range

lucasrohr · (This post was last modified: Feb-02-2022, 11:15 AM by lucasrohr.)

So I wrote this massive code filling a hotel with fictional objects (i.e. people) with a bunch of attributes with random values, then making a statistical analysis of those attributes. Now I'm trying to make it faster.
I've tried it with multithreading, but when I produced 10 000 000 guests, it only made my code 3 seconds faster (45 instead of 48 sec).

So now I'm wondering how I would make this more efficient using multiprocessing. Unfortunately, nothing I've tried works: the processes don't actually do anything, and I always get 0 guests.

This is the code with multithreading. I know it's convoluted and there are much more efficient ways to do this, but the point is doing it with objects.
Of course, any other suggestions to make this faster while still using objects would be appreciated as well.

Thanks in advance!

import random
import threading

guestsUnderage = 0
guestsUnderageMale = 0
guestsUnderageFemale = 0
guestsAdults = 0
guestsTotal = 0

guestsEurope = 0
guestsAsia = 0
guestsAmericas = 0
guestsAfrica = 0
guestsAustralia = 0
numberPredominantOrigins = 0

number_of_guests = 0
guestsPerThread = 0
guestsFraction = 0

hotel1= []
hotel2 = []
hotel = []

origins = ['Europe', 'Africa', 'The Americas', 'Asia', 'Australia'] #List of possible origins. I have not yet used this attribute for anything in the statistics.
ages = range(10,70) #List of possible ages. This is a list of integers!!! Hence, it might need to be changed into a string depending on the situation it is to be used in. (see line 25)
genders = ['male', 'female']


class Tenant: #Defines what attributes the tenant (i.e. object) will have
    roomNumberCount = 1

    __slots__ = ['roomNumber', 'origin', 'age', 'gender']
    def __init__(self, roomNumber, origin, age, gender):
        self.roomNumber = roomNumber
        self.origin = origin #String with origin name
        self.age = age #This will be a string! Hence it might need to be changed into an integer depending on the situation. (see line 31)
        self.gender = gender
        Tenant.roomNumberCount += 1
 
    def __str__(self): #Structure of the output when calling upon a tenant (i.e. object)
        return f'\nRoom Number: {str(self.roomNumber)} \nOrigin: {self.origin} \nAge: {str(self.age)} \nGender: {self.gender}'
 

def produceTenants1():
    global hotel1
    for i in range(guestsPerThread): #Each thread produces a quarter of the guests
        tenanti = Tenant(str(Tenant.roomNumberCount),origins[random.randint(0, len(origins) - 1)], str(ages[random.randint(0, len(ages) - 1)]), genders[random.randint(0, len(genders)-1)]) #Tells Python the object class (i.e. Tenant) and where to get the attribute values from. Each attribute's value is chosen at random from the lists above. List indices start at 0 and randint includes both endpoints, so we subtract 1 from each list's length; otherwise it might choose an index outside the range, which would raise an error.
        hotel1.append(tenanti) #Puts the tenants into the hotel

def produceTenants2():
    global hotel2
    for i in range(guestsFraction): #The remainder left over after dividing the guests by 4
        tenanti = Tenant(str(Tenant.roomNumberCount),origins[random.randint(0, len(origins) - 1)], str(ages[random.randint(0, len(ages) - 1)]), genders[random.randint(0, len(genders)-1)]) #Tells Python the object class (i.e. Tenant) and where to get the attribute values from. Each attribute's value is chosen at random from the lists above. List indices start at 0 and randint includes both endpoints, so we subtract 1 from each list's length; otherwise it might choose an index outside the range, which would raise an error.
        hotel2.append(tenanti) #Puts the tenants into the hotel

def main():
    global number_of_guests, guestsPerThread, guestsFraction

    try: # Number of guests determined by user input
        number_of_guests = int(input("How many people have checked in?"))
    except Exception as exception:  # If number is not integer 
        number_of_guests = 100
        if (type(exception).__name__) == 'ValueError':
            print('Entered value has to be integer! Setting number to default value (>>100<< guests).')
        else:
            print(f'{type(exception).__name__}! Setting number to default value (>>100<< guests).')
    
    guestsPerThread = number_of_guests // 4
    guestsFraction = number_of_guests % 4

    t1 = threading.Thread(target = produceTenants1)
    t2 = threading.Thread(target = produceTenants1)
    t3 = threading.Thread(target = produceTenants1)
    t4 = threading.Thread(target = produceTenants1)

    t1.start()
    t2.start()
    t3.start()
    t4.start()

    t1.join()
    t2.join()
    t3.join()
    t4.join()

    produceTenants2() #Producing the remaining fraction

    hotel = hotel1 + hotel2

    global guestsTotal, guestsUnderage, guestsUnderageFemale, guestsUnderageMale, guestsAdults, guestsEurope, guestsAfrica, guestsAsia, guestsAmericas, guestsAustralia, numberPredominantOrigins
    
    for guest in hotel: #guest is used to differentiate from Tenant and tenanti. Could have been used interchangeably, though.
        guestsTotal += 1
        origin = guest.origin
        if int(guest.age) < 18:
            guestsUnderage += 1
    
            if guest.gender == 'male':
                guestsUnderageMale += 1
            else:
                guestsUnderageFemale += 1
        else:
            guestsAdults += 1

        match origin:
            case 'Europe':
                guestsEurope += 1
            case 'Africa':
                guestsAfrica += 1
            case 'Asia':
                guestsAsia += 1
            case 'The Americas':
                guestsAmericas += 1
            case 'Australia':
                guestsAustralia += 1


    for guest in hotel:
        if int(guest.roomNumber) <= 5: # This is just to verify that the room numbers are successive
            print(guest)
        else:
            break

    originDictionary = {'Europe': guestsEurope, 'Africa': guestsAfrica, 'Asia': guestsAsia, 'The Americas': guestsAmericas, 'Australia': guestsAustralia}


    guestsFromPredominantOrigin = originDictionary.get(max(originDictionary, key=originDictionary.get)) # Largest count in the dictionary. Ties between origins are collected in the loop below.
    predominantOrigin = []
    for k, v in originDictionary.items():
        if v == guestsFromPredominantOrigin:
            predominantOrigin.append(k)
            numberPredominantOrigins += 1 # I will change the list below so I need this separate counter at the end

    delimiter = ' or '
    if len(predominantOrigin) == 1:
        predominantOrigin = delimiter.join(predominantOrigin)
    else:
        predominantOrigin = delimiter.join([", ".join(predominantOrigin[:-1]),predominantOrigin[-1]])


    print(f'{guestsTotal} guests have checked into the hotel. {guestsUnderage} were underage and among those, {guestsUnderageFemale} were female and {guestsUnderageMale} were male. Most people came from {predominantOrigin}, numbering {guestsFromPredominantOrigin}' + ('.' if numberPredominantOrigins == 1 else ' each.'))

if __name__ == "__main__":

    main()

**deanhystad** · (This post was last modified: Feb-02-2022, 01:36 PM by deanhystad.)

This should work on Linux, but it will not work on Windows. In Linux a process is forked, starting out with the same process image as the parent. In Windows a process is spawned, starting out as a completely new process.

To have a variable that is the same for all processes, you need to pass it as an argument to the process.

From the docs: https://docs.python.org/3/library/multiprocessing.html

Data can be stored in a shared memory map using Value or Array. For example, the following code:

from multiprocessing import Process, Value, Array

def f(n, a):
    n.value = 3.1415927
    for i in range(len(a)):
        a[i] = -a[i]

if __name__ == '__main__':
    num = Value('d', 0.0)
    arr = Array('i', range(10))

    p = Process(target=f, args=(num, arr))
    p.start()
    p.join()

    print(num.value)
    print(arr[:])

Or you can use a process manager. From the same document.

from multiprocessing import Process, Manager

def f(d, l):
    d[1] = '1'
    d['2'] = 2
    d[0.25] = None
    l.reverse()

if __name__ == '__main__':
    with Manager() as manager:
        d = manager.dict()
        l = manager.list(range(10))

        p = Process(target=f, args=(d, l))
        p.start()
        p.join()

        print(d)
        print(l)

lucasrohr · (This post was last modified: Feb-02-2022, 01:13 PM by lucasrohr.)

(Feb-02-2022, 01:10 PM)deanhystad Wrote: This should work on Linux, but it will not work on Windows. In Linux a process is forked, starting out with the same process image as the parent. In Windows a process is spawned, starting out as a completely new process.

To have global variables in Windows (and you should do this for Linux too) you need to use the multiprocessing manager to create mutable objects that are shared among processes. When you spawn a process you pass the shared object as an argument.
import multiprocessing
...
manager = multiprocessing.Manager()
hotel = manager.list()
...
process = multiprocessing.Process(target=whatever, args=[hotel])
...

Thank you!

...wait, there are differences between Linux and Windows?

Well, I'll try it on my Linux laptop. Thanks!

Would there be any way to do this on Windows?

lucasrohr · (This post was last modified: Feb-02-2022, 01:25 PM by lucasrohr.)

EDIT: sorry, didn't see your full reply.

**deanhystad** · (This post was last modified: Feb-02-2022, 01:41 PM by deanhystad.)

The difference between Windows and Unix is that Windows spawns a fresh process with all new global variables. This means each process has its own global hotel[], and nobody adds items to the hotel[] in the parent process. To get around this you do not use global variables, but instead use local variables which are passed as arguments to the child processes. You can still use global variables in multiprocessing as long as they are not used to pass information back from the child processes. In your code origins[], ages[] and genders[] can be global variables. Every process will have its own copies, but they will all be the same.

In Unix a process is forked. It starts out as an exact copy of the parent. This means the child processes start with the same global variables as the parent process, although changes a child makes afterwards still do not show up in the parent.
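
A minimal sketch of the point above. The names fill_global and fill_queue are made up for illustration and are not from the original code. The first child appends to a module-level list, and the parent never sees those items, because each process has its own copy. The second child gets a Queue as an argument and sends its list back through it.

```python
import multiprocessing as mp

hotel = []  # module-level list; every process works on its own copy


def fill_global(count):
    """Append to the module-level list. In a child, only the child's copy grows."""
    for room in range(count):
        hotel.append(room)


def fill_queue(queue, count):
    """Build a local list and hand it back through the queue argument."""
    queue.put(list(range(count)))


if __name__ == "__main__":
    child = mp.Process(target=fill_global, args=(5,))
    child.start()
    child.join()
    print(len(hotel))  # the parent's list is untouched, whether forked or spawned

    queue = mp.Queue()
    child = mp.Process(target=fill_queue, args=(queue, 5))
    child.start()
    result = queue.get()  # read the result before joining the child
    child.join()
    print(result)
```

Read-only globals such as origins, ages and genders are fine in both cases; only data that has to travel back to the parent needs a Queue, a Manager object, or something similar.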

lucasrohr · (This post was last modified: Feb-02-2022, 02:39 PM by lucasrohr.)

(Feb-02-2022, 01:41 PM)deanhystad Wrote: The difference between Windows and Unix is that Windows spawns a fresh process with all new global variables. This means each process has its own global hotel[], and nobody adds items to the hotel[] in the parent process. To get around this you do not use global variables, but instead use local variables which are passed as arguments to the child processes. You can still use global variables in multiprocessing as long as they are not used to pass information back from the child processes. In your code origins[], ages[] and genders[] can be global variables. Every process will have its own copies, but they will all be the same.

In Unix a process is forked. It starts out as an exact copy of the parent. This means the child processes have the same global variables as the parent processes.

Ok got it, more or less. I'm getting there. Excruciatingly slowly, but I'm getting there.^^

So in the context of this code: can you edit the original so I can reverse engineer it, or is that too much trouble?

Also, does the manager make the code slower? And if it does, is there any point to multiprocessing in this case?

Or is there any other way to make a process return the list for the main process to pick up in some way? (In which case I would have to make separate functions for each process, I presume)

lucasrohr · (This post was last modified: Feb-02-2022, 03:53 PM by lucasrohr.)

(Feb-02-2022, 01:35 PM)deanhystad Wrote: The difference between Windows and Unix is that Windows spawns a fresh process with all new global variables. This means each process has its own global hotel[], and nobody adds items to the hotel[] in the parent process. To get around this you do not use global variables, but instead use local variables which are passed as arguments to the child processes.

In Unix a process is forked. It starts out as an exact copy of the parent. This means the child processes have the same global variables as the parent processes.

Ok, so I have to be honest. I don't understand Python well enough to work with the manager yet, so I tried to use Queue, which actually worked!
Thing is, it only works as long as the number of objects doesn't exceed 584.
Any idea what the reason for this could be? It just stops doing anything when I enter something higher than 584, which is an oddly specific number.

EDIT: Actually, I can reach even higher numbers, but it doesn't always succeed. The runs seem to be more likely to succeed when they are a multiple of 4.

import random
import multiprocessing as mp

guestsUnderage = 0
guestsUnderageMale = 0
guestsUnderageFemale = 0
guestsAdults = 0
guestsTotal = 0

guestsEurope = 0
guestsAsia = 0
guestsAmericas = 0
guestsAfrica = 0
guestsAustralia = 0
numberPredominantOrigins = 0

number_of_guests = 0
guestsPerThread = 0
guestsFraction = 0

hotel = []
hotel1 = []

origins = ['Europe', 'Africa', 'The Americas', 'Asia', 'Australia'] #List of possible origins. I have not yet used this attribute for anything in the statistics.
ages = range(10,70) #List of possible ages. This is a list of integers!!! Hence, it might need to be changed into a string depending on the situation it is to be used in. (see line 25)
genders = ['male', 'female']


class Tenant: #Defines what attributes the tenant (i.e. object) will have
    roomNumberCount = 1

    __slots__ = ['roomNumber', 'origin', 'age', 'gender']
    def __init__(self, roomNumber, origin, age, gender):
        self.roomNumber = roomNumber
        self.origin = origin #String with origin name
        self.age = age #This will be a string! Hence it might need to be changed into an integer depending on the situation. (see line 31)
        self.gender = gender
        Tenant.roomNumberCount += 1
 
    def __str__(self): #Structure of the output when calling upon a tenant (i.e. object)
        return f'\nRoom Number: {str(self.roomNumber)} \nOrigin: {self.origin} \nAge: {str(self.age)} \nGender: {self.gender}'
 

def produceTenants1(q,x):
    hotel = []
    for i in range(x): #x is the number of guests this process produces
        tenanti = Tenant(str(Tenant.roomNumberCount),origins[random.randint(0, len(origins) - 1)], str(ages[random.randint(0, len(ages) - 1)]), genders[random.randint(0, len(genders)-1)]) #Tells Python the object class (i.e. Tenant) and where to get the attribute values from. Each attribute's value is chosen at random from the lists above. List indices start at 0 and randint includes both endpoints, so we subtract 1 from each list's length; otherwise it might choose an index outside the range, which would raise an error.
        hotel.append(tenanti) #Puts the tenants into the hotel
    q.put(hotel)

        

def produceTenants2():
    for i in range(guestsFraction): #The remainder left over after dividing the guests by 4
        tenanti = Tenant(str(Tenant.roomNumberCount),origins[random.randint(0, len(origins) - 1)], str(ages[random.randint(0, len(ages) - 1)]), genders[random.randint(0, len(genders)-1)]) #Tells Python the object class (i.e. Tenant) and where to get the attribute values from. Each attribute's value is chosen at random from the lists above. List indices start at 0 and randint includes both endpoints, so we subtract 1 from each list's length; otherwise it might choose an index outside the range, which would raise an error.
        hotel1.append(tenanti) #Puts the tenants into the hotel

def main():
    global number_of_guests, guestsPerThread, guestsFraction

    try: # Number of guests determined by user input
        number_of_guests = int(input("How many people have checked in?"))
    except Exception as exception:  # If number is not integer 
        number_of_guests = 100
        if (type(exception).__name__) == 'ValueError':
            print('Entered value has to be integer! Setting number to default value (>>100<< guests).')
        else:
            print(f'{type(exception).__name__}! Setting number to default value (>>100<< guests).')
    
    guestsPerThread = number_of_guests // 4
    guestsFraction = number_of_guests % 4

    q = mp.Queue()
    p1 = mp.Process(target = produceTenants1, args = (q,guestsPerThread))
    p2 = mp.Process(target = produceTenants1, args = (q,guestsPerThread))
    p3 = mp.Process(target = produceTenants1, args = (q,guestsPerThread))
    p4 = mp.Process(target = produceTenants1, args = (q,guestsPerThread))
    
    p1.start()
    p2.start()
    p3.start()
    p4.start()

    p1.join()
    hotel = q.get()
    p2.join()
    hotel = hotel + q.get()
    p3.join()
    hotel = hotel + q.get()
    p4.join()
    hotel = hotel + q.get()



    produceTenants2()

    hotel = hotel + hotel1
    global guestsTotal, guestsUnderage, guestsUnderageFemale, guestsUnderageMale, guestsAdults, guestsEurope, guestsAfrica, guestsAsia, guestsAmericas, guestsAustralia, numberPredominantOrigins
    
    for guest in hotel: #guest is used to differentiate from Tenant and tenanti. Could have been used interchangeably, though.
        guestsTotal += 1
        origin = guest.origin
        if int(guest.age) < 18:
            guestsUnderage += 1
    
            if guest.gender == 'male':
                guestsUnderageMale += 1
            else:
                guestsUnderageFemale += 1
        else:
            guestsAdults += 1

        match origin:
            case 'Europe':
                guestsEurope += 1
            case 'Africa':
                guestsAfrica += 1
            case 'Asia':
                guestsAsia += 1
            case 'The Americas':
                guestsAmericas += 1
            case 'Australia':
                guestsAustralia += 1


    for guest in hotel:
        if int(guest.roomNumber) <= 5: # This is just to verify that the room numbers are successive
            print(guest)
        else:
            break

    originDictionary = {'Europe': guestsEurope, 'Africa': guestsAfrica, 'Asia': guestsAsia, 'The Americas': guestsAmericas, 'Australia': guestsAustralia}


    guestsFromPredominantOrigin = originDictionary.get(max(originDictionary, key=originDictionary.get)) # Largest count in the dictionary. Ties between origins are collected in the loop below.
    predominantOrigin = []
    for k, v in originDictionary.items():
        if v == guestsFromPredominantOrigin:
            predominantOrigin.append(k)
            numberPredominantOrigins += 1 # I will change the list below so I need this separate counter at the end

    delimiter = ' or '
    if len(predominantOrigin) == 1:
        predominantOrigin = delimiter.join(predominantOrigin)
    else:
        predominantOrigin = delimiter.join([", ".join(predominantOrigin[:-1]),predominantOrigin[-1]])


    print(f'{guestsTotal} guests have checked into the hotel. {guestsUnderage} were underage and among those, {guestsUnderageFemale} were female and {guestsUnderageMale} were male. Most people came from {predominantOrigin}, numbering {guestsFromPredominantOrigin}' + ('.' if numberPredominantOrigins == 1 else ' each.'))

if __name__ == "__main__":

    main()
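
A likely cause of the hang, offered as a sketch rather than a confirmed diagnosis: the code above calls join() on each process before q.get(). The multiprocessing docs ("Joining processes that use queues") warn that a process which has put items on a queue does not terminate until the buffered data has been flushed to the underlying pipe. Once the pickled list is larger than the pipe's buffer, the child waits for the parent to read while the parent waits in join(). That would explain a size threshold, though the exact number 584 probably depends on the platform. Draining the queue first and joining afterwards avoids this.

The version below is an assumption-laden rewrite, not the original author's code: make_tenants, worker, split and build_hotel are illustrative names, ages are kept as integers, and each process is handed a starting room number, because the class counter Tenant.roomNumberCount would start at 1 in every process and produce duplicate room numbers.

```python
import multiprocessing as mp
import random

ORIGINS = ['Europe', 'Africa', 'The Americas', 'Asia', 'Australia']
AGES = range(10, 70)
GENDERS = ['male', 'female']


class Tenant:
    __slots__ = ['roomNumber', 'origin', 'age', 'gender']

    def __init__(self, roomNumber, origin, age, gender):
        self.roomNumber = roomNumber
        self.origin = origin
        self.age = age
        self.gender = gender


def make_tenants(first_room, count):
    """Create `count` tenants with consecutive room numbers."""
    return [
        Tenant(first_room + i, random.choice(ORIGINS),
               random.choice(AGES), random.choice(GENDERS))
        for i in range(count)
    ]


def worker(queue, first_room, count):
    """Runs in a child process: one put() with the whole list."""
    queue.put(make_tenants(first_room, count))


def split(total, parts):
    """Divide `total` into `parts` shares that differ by at most one."""
    share, extra = divmod(total, parts)
    return [share + (1 if n < extra else 0) for n in range(parts)]


def build_hotel(number_of_guests, workers=4):
    queue = mp.Queue()
    processes = []
    first_room = 1
    for count in split(number_of_guests, workers):
        processes.append(mp.Process(target=worker, args=(queue, first_room, count)))
        first_room += count
    for p in processes:
        p.start()
    hotel = []
    for _ in processes:
        hotel.extend(queue.get())  # drain the queue first ...
    for p in processes:
        p.join()                   # ... and only then join
    hotel.sort(key=lambda t: t.roomNumber)  # lists arrive in any order
    return hotel


if __name__ == "__main__":
    hotel = build_hotel(10_000)
    print(len(hotel))
```

Because split() spreads the remainder over the workers, the separate produceTenants2() step is not needed here. Every Tenant still has to be pickled and sent back to the parent, so for objects this cheap to create the transfer may cost more than the parallel work saves; timing both versions is the only way to know.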
