Hi All,
I started learning Python few days ago. With google help i started coding to automate a process as described below:
I have a file with some data (n rows and 79 columns), do some processing like filtering, merging, etc. Once it is processed, i have to created a file with 3 excel sheets (2 data and 1 summary sheet) based on the value of another dataframe column (column contains unique email id values). The file name should be the value of the column, i.e., email id.
could someone help to proceed further.
below is the code for your reference.
Regards,
Pavan
I started learning Python few days ago. With google help i started coding to automate a process as described below:
I have a file with some data (n rows and 79 columns), do some processing like filtering, merging, etc. Once it is processed, i have to created a file with 3 excel sheets (2 data and 1 summary sheet) based on the value of another dataframe column (column contains unique email id values). The file name should be the value of the column, i.e., email id.
could someone help to proceed further.
below is the code for your reference.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 |
import pandas as pd import numpy as np import datetime as dt from dateutil.relativedelta import relativedelta from datetime import date #read data from excel sheet df = pd.read_excel(r 'C:\Users\ppadyala\Desktop\User Wise Testing\Data.xlsx' , sheet_name = 'Raw Data' ) # to assign list of values a variable array = [ "Query" , "Approval" ] # to filter based on multiple values in a column df = df.loc[df[ 'Current Major Status' ].isin(array)] df[ 'Submitted for Approval' ] = df[ 'Submitted for Approval' ]. str .replace( ' CET' , '') #convert from object to datetime64[ns] df[ 'Submitted for Approval' ] = pd.to_datetime(df[ 'Submitted for Approval' ]) # calculate receipt aging df[ 'Receipt Aging' ] = (df[ 'Today' ] - df[ 'Created Date.1' ]).dt.days # replace null with submitted for approval df[ 'Last Query Raised On.1' ] = df[ 'Last Query Raised On.1' ].fillna(df[ 'Submitted for Approval' ]) # replace null with created date df[ 'Last Query Raised On.1' ] = df[ 'Last Query Raised On.1' ].fillna(df[ 'Created Date.1' ]) # calculate exception aging df[ 'Exception Aging' ] = (df[ 'Today' ] - df[ 'Last Query Raised On.1' ]).dt.days #create a list of conditions conditions = [ (df[ 'Exception Aging' ] < 6 ), (df[ 'Exception Aging' ] < 11 ), (df[ 'Exception Aging' ] < 21 ), (df[ 'Exception Aging' ] < 31 ), (df[ 'Exception Aging' ] > = 31 ), ] # create a list of the values we want to assign for each condition values = [ '0-5 Days' , '6-10 Days' , '11-20 Days' , '21-30 Days' , '30+ Days' ] # create a new column and use np.select to assign values to it using our lists as arguments df[ 'Exception Aging Bucket' ] = np.select(conditions, values) # read client email id's list clientlist = pd.read_excel(r 'C:\Users\ppadyala\Desktop\User Wise Testing\User List.xlsx' ) # do vlookup with client list df_new = pd.merge(df, clientlist, on = 'Task Owner Full Name' , how = 'left' ) # replace null with email id df_new[ 'Latest External Query Resolver' ] = df_new[ 'Latest External Query Resolver' ].fillna(df_new[ 'mail' ]) # find the missing latest external query resolver missingLEQR = pd.isnull(df_new[ 'Latest External Query Resolver' ]) # Assign the missing latest external query resolver to a variable blankLEQR = df_new[missingLEQR] # write the missing ones to excel blankLEQR.to_excel( 'blankLEQR.xlsx' , index = False ) #remove mail column df_new = df_new.drop([ 'mail' ], axis = 1 ) #remove blank rows from specific column df_new = df_new.dropna(subset = [ 'Latest External Query Resolver' ]) #sort the table based on leqr df_new.sort_values(by = [ 'Latest External Query Resolver' ]) # copy leqr to new df leqr = df_new[ 'Latest External Query Resolver' ] # remove duplicates from leqr leqr = leqr.drop_duplicates() |
Pavan