I have a PySpark script that is working fine. The script fetches data from MySQL and creates Hive tables in HDFS.
The pyspark script is below.
#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Require the exact number of arguments on the spark-submit command line
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py Arguments"
    exit()

table = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

df = sqlContext.read.format("jdbc") \
    .option("url", "{}:{}/{}".format(domain, port, mysqldb)) \
    .option("driver", "com.mysql.jdbc.Driver") \
    .option("dbtable", "{}".format(table)) \
    .option("user", "{}".format(username)) \
    .option("password", "{}".format(password)) \
    .load()

# Register the dataframe as a temp table
df.registerTempTable("mytempTable")

# Create the Hive table from the temp table
sqlContext.sql("create table {}.{} as select * from mytempTable".format(hivedb, table))

sc.stop()

This pyspark script is invoked by a shell script, to which I pass the table names as arguments from a file.
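For reference, the args_file is a plain text file with one MySQL table name per line, for example (hypothetical table names):

customers
orders
payments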
The shell script is below.
#!/bin/bash
source /home/$USER/spark/source.sh
[ $# -ne 1 ] && { echo "Usage : $0 table ";exit 1; }
args_file=$1
TIMESTAMP=$(date "+%Y-%m-%d")
touch /home/$USER/logs/${TIMESTAMP}.success_log
touch /home/$USER/logs/${TIMESTAMP}.fail_log
success_logs=/home/$USER/logs/${TIMESTAMP}.success_log
failed_logs=/home/$USER/logs/${TIMESTAMP}.fail_log
# Function to log the success or failure status of each job
function log_status
{
status=$1
message=$2
if [ "$status" -ne 0 ]; then
echo "date +\"%Y-%m-%d %H:%M:%S\" [ERROR] $message [Status] $status : failed" | tee -a "${failed_logs}"
#echo "Please find the attached log file for more details"
exit 1
else
echo "date +\"%Y-%m-%d %H:%M:%S\" [INFO] $message [Status] $status : success" | tee -a "${success_logs}"
fi
}
while read -r table ;do
spark-submit --name "${table}" --master "yarn-client" --num-executors 2 --executor-memory 6g --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" /home/$USER/spark/sql_spark.py ${table} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} > /tmp/logging/${table}.log 2>&1
g_STATUS=$?
log_status $g_STATUS "Spark job ${table} Execution"
done < "${args_file}"
echo "************************************************************************************************************************************************************************"
Using the above shell script, I am able to collect a separate log for each individual table in the args_file.
Now I have more than 200 tables in MySQL, so I have modified the pyspark script as below: I created a function that iterates over the args_file and executes the code for each table.
New spark script:
#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext

conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)

# Require the exact number of arguments on the spark-submit command line
if len(sys.argv) != 8:
    print "Invalid number of args......"
    print "Usage: spark-submit import.py Arguments"
    exit()

args_file = sys.argv[1]
hivedb = sys.argv[2]
domain = sys.argv[3]
port = sys.argv[4]
mysqldb = sys.argv[5]
username = sys.argv[6]
password = sys.argv[7]

def testing(table, hivedb, domain, port, mysqldb, username, password):
    print "********************************************************* table = {} *********************************************************".format(table)
    df = sqlContext.read.format("jdbc") \
        .option("url", "{}:{}/{}".format(domain, port, mysqldb)) \
        .option("driver", "com.mysql.jdbc.Driver") \
        .option("dbtable", "{}".format(table)) \
        .option("user", "{}".format(username)) \
        .option("password", "{}".format(password)) \
        .load()
    # Register the dataframe as a temp table
    df.registerTempTable("mytempTable")
    # Create the Hive table from the temp table
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable".format(hivedb, table))

input = sc.textFile('/user/XXXXXXX/spark_args/%s' % args_file).collect()

for table in input:
    testing(table, hivedb, domain, port, mysqldb, username, password)

sc.stop()

Now I want to collect the logs for each individual table in the args_file, but I am getting only one log file that contains the log for all the tables.
How can I achieve my requirement? Or is the approach I am taking completely wrong?
New shell script:
spark-submit --name "${args_file}" --master "yarn-client" --num-executors 2 --executor-memory 6g --executor-cores 1 --conf "spark.yarn.executor.memoryOverhead=609" /home/$USER/spark/sql_spark.py ${args_file} ${hivedb} ${domain} ${port} ${mysqldb} ${username} ${password} > /tmp/logging/${args_file}.log 2>&1
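One direction I am considering (a minimal sketch, not a confirmed approach): inside the driver loop, attach a fresh logging FileHandler for each table so that each table's driver-side messages land in their own file under /tmp/logging. The helper name run_with_table_log is hypothetical; testing() is the function from the new spark script above.

#!/usr/bin/env python
# Sketch only: give each table its own driver-side log file by swapping
# a logging FileHandler per iteration. The /tmp/logging path and the
# testing() function come from the scripts above; run_with_table_log
# is a hypothetical helper name.
import logging

logger = logging.getLogger("table_import")
logger.setLevel(logging.INFO)

def run_with_table_log(table, work_fn):
    handler = logging.FileHandler("/tmp/logging/{}.log".format(table))
    handler.setFormatter(logging.Formatter("%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    try:
        logger.info("starting import for table %s", table)
        work_fn(table)
        logger.info("finished import for table %s", table)
    except Exception:
        logger.exception("import failed for table %s", table)
    finally:
        # Detach and close so the next table writes to a fresh file
        logger.removeHandler(handler)
        handler.close()

# Usage inside the existing loop:
# for table in input:
#     run_with_table_log(table, lambda t: testing(t, hivedb, domain, port,
#                                                 mysqldb, username, password))

Note that this would only capture what the driver itself logs; anything printed by the executors still goes to the YARN container logs, which can be pulled per application with yarn logs -applicationId <application id>.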