Python Forum

Hi,

I am fairly new to Python and to programming in general. I am running partitional clustering on time series data with 178 columns and 50000 rows. The code below works fine when I test it on a few rows, but when I ran it on all rows it still had not finished after 2 hours, so I wanted to try parallel processing.

This is my normal code:

from sktime.clustering.k_means import TimeSeriesKMeans

k_means = TimeSeriesKMeans(
    n_clusters=8,  # Number of desired centers
    init_algorithm="forgy",  # Center initialisation technique
    max_iter=10,  # Maximum number of iterations for refinement on training set
    metric="dtw",  # Distance metric to use
    averaging_method="mean",  # Averaging technique to use
    random_state=1
)

###k_means.fit(X_df_melt3)

identified_clusters = k_means.fit_predict(X_df_melt3)

############################################################################
Below is something I tried, based on posts in different forums:

from multiprocessing import Process
import os
import time
 

def sktimekmeans():
    kmeanss = TimeSeriesKMeans(
        n_clusters=8,  # Number of desired centers
        init_algorithm="forgy",  # Center initialisation technique
        max_iter=10,  # Maximum number of iterations for refinement on training set
        metric="dtw",  # Distance metric to use
        averaging_method="mean",  # Averaging technique to use
        random_state=1,
    ).fit(X_df_melt3)
    return kmeanss

if __name__ == '__main__':
    start_time = time.perf_counter()
 
    # Creates two processes
    p1 = multiprocessing.Process(target=sktimekmeans)
    # Starts both processes
    p1.start()
    print (p1)

First of all, I am not sure the code is correct. When I run it, it throws this error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\sound\anaconda3\lib\multiprocessing\spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\sound\anaconda3\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'sktimekmeans' on <module '__main__' (built-in)>

I tried to understand the error from posts in different forums, but couldn't because of my limited knowledge of Python. I'd appreciate your help with this issue.
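The AttributeError can be reproduced without starting any process. Pickle stores a function only by reference (its module name and function name), and on Windows multiprocessing uses the spawn start method: the child is a fresh interpreter that unpickles the target and looks the name up in its own __main__. When the parent is an interactive session, for example a Jupyter notebook or an IPython console, there is no script file for the child to re-import, so sktimekmeans does not exist there. The minimal sketch below simulates that child by deleting the name before unpickling; the stand-in function body is hypothetical.

```python
import pickle
import sys


def sktimekmeans():
    """Hypothetical stand-in for the real worker; its body does not matter for pickling."""
    return "fitted"


# Pickle stores a function by reference (module name + function name), not its code.
payload = pickle.dumps(sktimekmeans)

# Simulate the spawned child: its __main__ has no such name, because an
# interactive parent session leaves no script file for the child to re-import.
main_module = sys.modules[__name__]
saved = main_module.sktimekmeans
del main_module.sktimekmeans

error_message = None
try:
    pickle.loads(payload)  # looks up 'sktimekmeans' in the module and fails
except AttributeError as exc:
    error_message = str(exc)
finally:
    main_module.sktimekmeans = saved  # restore the name for the rest of the script

print(error_message)
```

The usual fix is to save the code as a .py file, keep the worker function at module top level, and run it with python from a terminal, or to move the function into a separate module that the notebook imports. Either way, pass X_df_melt3 to the worker as an argument: under spawn, variables created in the interactive session do not exist in the child.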

from multiprocessing import Process

p1 = multiprocessing.Process(target=sktimekmeans)

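Not sure what this does, but you import Process and then try to start multiprocessing.Process. from multiprocessing import Process only binds the name Process, so multiprocessing.Process raises a NameError unless the module itself is also imported. Also, the comment says two processes are created, but you only start one.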
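A minimal sketch of the corrected start-up, assuming a hypothetical sktimekmeans(label) stand-in in place of the real fit: import multiprocessing so that multiprocessing.Process resolves, start both processes, then join them so the parent waits for them before reporting the elapsed time. Keep in mind that running one fit inside one Process does not make that fit faster; it is the same serial work in another process. Processes only help when there are several independent jobs, such as separate fits.

```python
import multiprocessing
import time


def sktimekmeans(label):
    """Hypothetical stand-in for the real fit; replace the body with the model code."""
    time.sleep(0.1)
    print(f"worker {label} finished")


def run_two_processes():
    # `import multiprocessing` binds the module name, so multiprocessing.Process works.
    processes = [
        multiprocessing.Process(target=sktimekmeans, args=(i,)) for i in range(2)
    ]
    for p in processes:
        p.start()  # launch both children
    for p in processes:
        p.join()  # wait until both have finished
    return [p.exitcode for p in processes]  # 0 means the child exited normally


if __name__ == "__main__":
    start_time = time.perf_counter()
    exitcodes = run_two_processes()
    print("exit codes:", exitcodes)
    print(f"elapsed: {time.perf_counter() - start_time:.2f} s")
```

Save this as a .py file and run it with python from a terminal. Under spawn, each child re-imports the script, and the if __name__ == "__main__": guard stops the children from starting processes of their own.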

    return kmeanss

You don't catch the return value, and a Process target's return value is discarded anyway. You may want a Manager object, a Queue, or logging, depending on what type kmeanss is and what you do with the return value.
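One way to follow the Queue suggestion, sketched with a hypothetical fit_worker that sums a list instead of fitting a model: the child puts its result on a multiprocessing.Queue, and the parent reads it with get() before calling join().

```python
import multiprocessing


def fit_worker(data, result_queue):
    """Hypothetical worker: computes a result and sends it back through the queue.

    A Process target's return value is discarded, so the result has to be sent
    explicitly. Whatever is put on the queue is pickled.
    """
    result = sum(data)  # stand-in for the real model fit
    result_queue.put(result)


def run_with_queue(data):
    result_queue = multiprocessing.Queue()
    p = multiprocessing.Process(target=fit_worker, args=(data, result_queue))
    p.start()
    result = result_queue.get()  # blocks until the child has put its result
    p.join()  # join only after get(): joining first can deadlock on large results
    return result


if __name__ == "__main__":
    print(run_with_queue([1, 2, 3]))  # prints 6
```

Because everything put on the queue is pickled, the real result (for example a fitted model) must be picklable. Calling get() before join() follows the multiprocessing programming guidelines: a child that still has queued data waiting to be flushed may not exit, so joining it first can hang.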

Mohana1983

woooee