
How to run an existing Python script in parallel using multiprocessing
#1
Hi

I have a Python script that searches for directories whose names contain a specific pattern and then searches those directories for error (*.err) files.
I want to run this script as separate processes, so that I can run 10 processes in parallel at a time.
How can I achieve this: with multiprocessing or multithreading?
Please suggest the best way, and how to call this Python script.

import connect_to_hbase
import glob
import os
import csv
import numpy as np

connection = connect_to_hbase.conn
table = connection.table(connect_to_hbase.table_name_Source)

# Collect every APP_ID value from the source table
res = []
for row_key, data in table.scan(columns=['DETAILS:APP_ID']):
    res.extend(data.values())

# Unique app IDs, e.g. ['ACM' 'ACX' 'AW' ...]
z = np.unique(res)
print(z)
# Iterate over the IDs themselves; str(z) would be iterated character by character
patterns = list(z)

target_table = connect_to_hbase.conn.table(connect_to_hbase.table_name_Target)
base_path = '/ai2/data/dev/admin/inf/*{}*'
for pattern in patterns:
    search_path = base_path.format(pattern)
    for f in glob.glob(search_path):
        print("-----------------------")
        print("The directory path is:")
        print(f)
        print("List of files in the directory are:")
        # Glob inside f directly instead of os.chdir() + os.walk(), which
        # changed the process-wide working directory and could list the
        # same files once per subdirectory
        for err_file in glob.glob(os.path.join(f, '*.err')):
            print(os.path.basename(err_file))

Output:
Connected to Hbase
['ACM' 'ACX' 'AW' 'BC' 'BLS' 'CA' 'CLP' 'CMU' 'CR' 'CSE' 'CTD' 'DHD' 'DMS' 'DRM' 'GSK' 'IPT' 'XU0']
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_IPT_pvt
List of files in the directory are:
run_ingest_IPT_daily_1246.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_CLP_pvt
List of files in the directory are:
run_ingest_CLP_daily_1240.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_IPT_pvt
List of files in the directory are:
run_ingest_IPT_daily_1246.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_CTD_pvt
List of files in the directory are:
run_ingest_CTD_daily_1250.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_IPT_pvt
List of files in the directory are:
run_ingest_IPT_daily_1246.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_XU0_pvt
List of files in the directory are:
t_itm.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_ACX_pvt
List of files in the directory are:
accountshighfocus.err
acm_access_log.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_XU0_pvt
List of files in the directory are:
t_itm.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_CMU_pvt
List of files in the directory are:
run_ingest_CMU_daily_1247.err
-----------------------
Thanks
#2
Threads in Python are only worth using for input/output (I/O) tasks. They are not like threads in other programming languages: in CPython, the Global Interpreter Lock (GIL) lets only one thread execute Python code at a time, so when several threads are started they all wait until the currently running thread pauses. Since you are not using shared variables, and the only shared thing (the connection) is only read from, I would recommend multiprocessing. Processes are dangerous if they work with shared variables and write to them simultaneously, but here you will get separate programs that run in parallel, and your computation time goes down.
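A minimal sketch of this advice, assuming Python 3 and that the per-pattern search is moved into a function (the names `find_err_files`, `search_in_parallel` and `BASE_DIR` are hypothetical). The HBase scan would stay in the parent process; only the pattern strings are sent to a `multiprocessing.Pool` capped at 10 workers, so the connection is never shared between processes.

```python
import glob
import os
from multiprocessing import Pool

# Hypothetical default, taken from the paths in the question
BASE_DIR = '/ai2/data/dev/admin/inf'


def find_err_files(pattern, base_dir=BASE_DIR):
    """Return a list of (directory, [*.err file names]) for one pattern.

    Pool pickles the function by name, so it must live at module level.
    """
    results = []
    for directory in sorted(glob.glob(os.path.join(base_dir, '*{}*'.format(pattern)))):
        if not os.path.isdir(directory):
            continue  # skip plain files whose names happen to match
        err_files = sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(directory, '*.err'))
        )
        results.append((directory, err_files))
    return results


def search_in_parallel(patterns, base_dir=BASE_DIR, workers=10):
    """Search every pattern, running at most `workers` processes at once."""
    with Pool(processes=workers) as pool:
        # starmap returns results in the same order as `patterns`
        per_pattern = pool.starmap(find_err_files, [(p, base_dir) for p in patterns])
    return dict(zip(patterns, per_pattern))


if __name__ == '__main__':
    # In the real script this list would come from the HBase scan
    patterns = ['ACM', 'ACX', 'AW', 'BC']
    for pattern, matches in search_in_parallel(patterns).items():
        for directory, err_files in matches:
            print("-----------------------")
            print("The directory path is:")
            print(directory)
            print("List of files in the directory are:")
            for name in err_files:
                print(name)
```

The workers only return data and the parent does all the printing, so output from different processes does not interleave. The `if __name__ == '__main__':` guard is required on platforms that start workers by re-importing the script.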
#3
Hi,

I have a Python script that searches for each item in a list.
Now my requirement is to start a separate thread for each item's search in the for loop.
How can I achieve this?
Please help.

import glob
import os
import csv

patterns = ['ACM', 'ACX', 'AW', 'BC']
base_path = '/ai2/data/dev/admin/inf/*{}*'
for pattern in patterns:
    search_path = base_path.format(pattern)
    for f in glob.glob(search_path):
        print("-----------------------")
        print("The directory path is:")
        print(f)
        print("List of files in the directory are:")
        # Look for *.err directly inside f rather than calling os.chdir(),
        # which changes the working directory for the whole process
        for err_file in glob.glob(os.path.join(f, '*.err')):
            print(os.path.basename(err_file))
In the script above, I need to start a separate thread for every "pattern" value.

Output:
[root@edgenod]# python p123.py
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_ACM_pvt
List of files in the directory are:
dsplit.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_ACX_pvt
List of files in the directory are:
accountshighfocus.err
acm_access_log.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_AW_pvt
List of files in the directory are:
aware.err
-----------------------
The directory path is:
/ai2/data/dev/admin/inf/inf_BC_pvt
List of files in the directory are:
run_ingest_BC_daily_1249.err
buran wrote May-24-2018, 03:57 AM:
Please don't start a new thread. Keep the discussion in the original thread.
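A minimal sketch of the one-thread-per-pattern version asked for in #3, assuming Python 3 (`search_pattern`, `search_with_threads` and `BASE_DIR` are hypothetical names). Because this search mostly waits on the filesystem, threads can overlap much of that waiting despite the GIL, in line with #2's point that threads suit I/O tasks. The sketch builds absolute paths instead of calling `os.chdir()`, since the working directory belongs to the whole process and one thread's `chdir` would affect all the others.

```python
import glob
import os
import threading

# Hypothetical default, taken from the paths in the question
BASE_DIR = '/ai2/data/dev/admin/inf'


def search_pattern(pattern, base_dir, results, lock):
    """Find *.err files for one pattern; store them in results[pattern]."""
    matches = []
    for directory in sorted(glob.glob(os.path.join(base_dir, '*{}*'.format(pattern)))):
        if not os.path.isdir(directory):
            continue  # skip plain files whose names happen to match
        # Absolute paths instead of os.chdir(): the working directory is
        # shared by all threads in the process
        err_files = sorted(
            os.path.basename(path)
            for path in glob.glob(os.path.join(directory, '*.err'))
        )
        matches.append((directory, err_files))
    with lock:  # guard the dict shared by all threads
        results[pattern] = matches


def search_with_threads(patterns, base_dir=BASE_DIR):
    """Start one thread per pattern and wait for all of them to finish."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(target=search_pattern, args=(p, base_dir, results, lock))
        for p in patterns
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == '__main__':
    patterns = ['ACM', 'ACX', 'AW', 'BC']
    results = search_with_threads(patterns)
    # Print in pattern order after all threads finish, so output doesn't interleave
    for pattern in patterns:
        for directory, err_files in results[pattern]:
            print("-----------------------")
            print("The directory path is:")
            print(directory)
            print("List of files in the directory are:")
            for name in err_files:
                print(name)
```

With a long pattern list (such as the 17 IDs from the HBase scan in #1), `concurrent.futures.ThreadPoolExecutor(max_workers=10)` is a common alternative that caps how many threads run at once instead of starting one per pattern.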
