Feb-24-2019, 04:59 PM
Some more research turned up a few interesting tidbits. First, the a logic test is required on Windows machines when using multiprocessing. The documentation uses the test repeatedly but does not state that it's required.
Second, multiprocessing.cpu_count() returns the total number of CPUs which may be higher than the number of usable CPUs. According to the documentation, len(os.sched_getaffintity(0)) will return the number of usable CPUs. I researched this because the traceback highlights only one line of code in your script: mp.Pool(mp.cpu_count()).
The new function isn't quite what I had in mind; perhaps I didn't explain clearly. It's refactored to what I had in mind. I sectioned out your data using a list comprehension which *should* work properly, though it may need a little tweaking with the ranges.
In any case, give this script a try and cross your fingers:
Second, multiprocessing.cpu_count() returns the total number of CPUs which may be higher than the number of usable CPUs. According to the documentation, len(os.sched_getaffintity(0)) will return the number of usable CPUs. I researched this because the traceback highlights only one line of code in your script: mp.Pool(mp.cpu_count()).
The new function isn't quite what I had in mind; perhaps I didn't explain clearly. It's refactored to what I had in mind. I sectioned out your data using a list comprehension which *should* work properly, though it may need a little tweaking with the ranges.
In any case, give this script a try and cross your fingers:
import os from time import time import multiprocessing as mp import numpy as np # Prepare data np.random.RandomState(100) arr = np.random.randint(0, 10, size=[200000, 5])#print(arr) data = arr.tolist() data[:5] sectioned = [data[x:y] for x, y in zip(range(0, 3001, 1000),range(1000, 4001, 1000))] # Define and function creation def howmany_within_range(row, minimum, maximum): """Returns how many numbers lie within `maximum` and `minimum` in a given `row`""" count = 0 for n in row: if minimum <= n <= maximum: count = count + 1 return count # Define funcThread def funcThread(data_section, minimum, maximum): for row in data_section: howmany_within_range(row, minimum, maximum) if __name__ == "__main__": # Required logic expression # Block: Paralleization # Step 1: Init multiprocessing.Pool() n = len(os.sched_getaffinity(0)) pool = mp.Pool(n) # Step 2: `pool.apply` the `howmany_within_range()` results = [pool.apply(funcThread, args=(sec, 4, 8)) for sec in sectioned] # Step 3: Don't forget to close pool.close() pool.join() # Step 4: Result print(results[:10])