Python Forum
How to run existing python script parallel using multiprocessing - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Data Science (https://python-forum.io/forum-44.html)
+--- Thread: How to run existing python script parallel using multiprocessing (/thread-10486.html)



How to run existing python script parallel using multiprocessing - lravikumarvsp - May-23-2018

Hi

I have a python script which searches for directories containing a specific name and then searches for the error files in those directories.
I want to run this script as a process, so that I can run 10 processes in parallel at a time.
How can I achieve this? With multiprocessing or multithreading?
Please suggest the best way, and how to call this python script.

import connect_to_hbase
import glob
import os
import numpy as np

connection = connect_to_hbase.conn
table = connection.table(connect_to_hbase.table_name_Source)

# Collect every APP_ID value from the source table
res = []
for row_key, data in table.scan(columns=['DETAILS:APP_ID']):
    res.extend(data.values())

# De-duplicate the IDs; iterate over the array itself, not str(z),
# which would loop over the single characters of its repr
patterns = np.unique(np.array(res))
print(patterns)

table = connect_to_hbase.conn.table(connect_to_hbase.table_name_Target)
base_path = '/ai2/data/dev/admin/inf/*{}*'
for pattern in patterns:
    search_path = base_path.format(pattern)
    for f in glob.glob(search_path):
        print("-----------------------")
        print("The directory path is:")
        print(f)
        print("List of files in the directory are:")
        # os.walk visits f and its subdirectories; no os.chdir needed
        for subdir, dirs, files in os.walk(f, topdown=True):
            for name in files:
                if name.endswith('.err'):
                    print(name)
Output:
Connected to Hbase
['ACM' 'ACX' 'AW' 'BC' 'BLS' 'CA' 'CLP' 'CMU' 'CR' 'CSE' 'CTD' 'DHD' 'DMS' 'DRM' 'GSK' 'IPT' 'XU0']
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_IPT_pvt
List of files in the directory are:
run_ingest_IPT_daily_1246.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_CLP_pvt
List of files in the directory are:
run_ingest_CLP_daily_1240.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_IPT_pvt
List of files in the directory are:
run_ingest_IPT_daily_1246.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_CTD_pvt
List of files in the directory are:
run_ingest_CTD_daily_1250.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_IPT_pvt
List of files in the directory are:
run_ingest_IPT_daily_1246.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_XU0_pvt
List of files in the directory are:
t_itm.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_ACX_pvt
List of files in the directory are:
accountshighfocus.err
acm_access_log.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_XU0_pvt
List of files in the directory are:
t_itm.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_CMU_pvt
List of files in the directory are:
run_ingest_CMU_daily_1247.err
-----------------------
Thanks


RE: How to run existing python script parallel using multiprocessing - ThiefOfTime - May-23-2018

Threads in Python should only be used for input/output-bound tasks. They are not like threads in other programming languages: because of the Global Interpreter Lock, only one thread executes Python code at a time, so when you start several threads they all wait until the currently running thread pauses. Since you are not using shared variables and the only shared resource (the connection) is read-only, I would recommend multiprocessing. Processes are dangerous if you write to shared state simultaneously, but here you get separate programs that genuinely run in parallel, and your computation time goes down.
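For the posted script, a minimal multiprocessing sketch could look like the following. It reuses the directory layout from the original code; the `find_err_files` helper and the hard-coded pattern list are placeholders for the values the real script reads from HBase:

```python
import glob
import os
from multiprocessing import Pool

BASE_PATH = '/ai2/data/dev/admin/inf/*{}*'  # path taken from the posted script

def find_err_files(pattern):
    """Search every directory matching `pattern` for *.err files."""
    hits = []
    for d in glob.glob(BASE_PATH.format(pattern)):
        for subdir, dirs, files in os.walk(d):
            hits.extend(os.path.join(subdir, f)
                        for f in files if f.endswith('.err'))
    return pattern, hits

if __name__ == '__main__':
    patterns = ['ACM', 'ACX', 'AW', 'BC']  # placeholder; really from HBase
    with Pool(processes=10) as pool:       # at most 10 workers at a time
        for pattern, hits in pool.imap_unordered(find_err_files, patterns):
            print(pattern, hits)
```

`Pool(processes=10)` caps the number of concurrent worker processes at 10, which matches your requirement; `imap_unordered` yields each result as soon as its worker finishes.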


Threading in Python for each item in list - lravikumarvsp - May-24-2018

Hi,

I have a python script that searches for an item in a list.
Now my requirement is to start a separate thread for each item searched in the for loop.
How can I achieve this?
Please help

import glob
import os

patterns = ['ACM', 'ACX', 'AW', 'BC']
base_path = '/ai2/data/dev/admin/inf/*{}*'
for pattern in patterns:
    search_path = base_path.format(pattern)
    for f in glob.glob(search_path):
        print("-----------------------")
        print("The directory path is:")
        print(f)
        print("List of files in the directory are:")
        # os.walk visits f and its subdirectories; no os.chdir needed
        for subdir, dirs, files in os.walk(f, topdown=True):
            for name in files:
                if name.endswith('.err'):
                    print(name)
In the above script, I need to start a separate thread for every "pattern" value.

Output:
[root@edgenod]# python p123.py
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_ACM_pvt
List of files in the directory are:
dsplit.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_ACX_pvt
List of files in the directory are:
accountshighfocus.err
acm_access_log.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_AW_pvt
List of files in the directory are:
aware.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_BC_pvt
List of files in the directory are:
run_ingest_BC_daily_1249.err
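A thread per pattern is defensible here because the work is I/O-bound (disk access), so the GIL is released while each thread waits on the filesystem. A minimal sketch, assuming the same paths as the posted script (note that output from concurrent threads may interleave):

```python
import glob
import os
import threading

BASE_PATH = '/ai2/data/dev/admin/inf/*{}*'  # path taken from the posted script

def search(pattern):
    """Worker run by one thread per pattern."""
    for d in glob.glob(BASE_PATH.format(pattern)):
        print("-----------------------")
        print("The directory path is:", d)
        print("List of files in the directory are:")
        for subdir, dirs, files in os.walk(d):
            for name in files:
                if name.endswith('.err'):
                    print(name)

patterns = ['ACM', 'ACX', 'AW', 'BC']
threads = [threading.Thread(target=search, args=(p,)) for p in patterns]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait for all searches to finish
```

If the search ever becomes CPU-bound, swap `threading.Thread` for `multiprocessing.Process` (same constructor signature) to get true parallelism, as discussed above.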


