Python Forum
Filter data based on a value from another dataframe column and create a file using lo
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Filter data based on a value from another dataframe column and create a file using lo
#1
Hi All,

I started learning Python few days ago. With google help i started coding to automate a process as described below:

I have a file with some data (n rows and 79 columns), do some processing like filtering, merging, etc. Once it is processed, i have to created a file with 3 excel sheets (2 data and 1 summary sheet) based on the value of another dataframe column (column contains unique email id values). The file name should be the value of the column, i.e., email id.

could someone help to proceed further.

below is the code for your reference.

import pandas as pd
import numpy as np
import datetime as dt
from dateutil.relativedelta import relativedelta
from datetime import date

#read data from excel sheet
df = pd.read_excel(r'C:\Users\ppadyala\Desktop\User Wise Testing\Data.xlsx', sheet_name='Raw Data') 

# to assign list of values a variable
array = ["Query", "Approval"]

# to filter based on multiple values in a column
df = df.loc[df['Current Major Status'].isin(array)]
df['Submitted for Approval'] = df['Submitted for Approval'].str.replace(' CET', '')

#convert from object to datetime64[ns]
df['Submitted for Approval'] = pd.to_datetime(df['Submitted for Approval']) 

# calculate receipt aging
df['Receipt Aging'] = (df['Today'] - df['Created Date.1']).dt.days 

# replace null with submitted for approval
df['Last Query Raised On.1'] = df['Last Query Raised On.1'].fillna(df['Submitted for Approval']) 

# replace null with created date
df['Last Query Raised On.1'] = df['Last Query Raised On.1'].fillna(df['Created Date.1'])

# calculate exception aging
df['Exception Aging'] = (df['Today'] - df['Last Query Raised On.1']).dt.days

#create a list of conditions
conditions = [
    (df['Exception Aging'] < 6),
    (df['Exception Aging'] < 11),
    (df['Exception Aging'] < 21),
    (df['Exception Aging'] < 31),
    (df['Exception Aging'] >= 31),
]

# create a list of the values we want to assign for each condition
values = ['0-5 Days', '6-10 Days', '11-20 Days', '21-30 Days', '30+ Days']

# create a new column and use np.select to assign values to it using our lists as arguments
df['Exception Aging Bucket'] = np.select(conditions, values)

# read client email id's list
clientlist = pd.read_excel(r'C:\Users\ppadyala\Desktop\User Wise Testing\User List.xlsx')

# do vlookup with client list
df_new = pd.merge(df, clientlist, on = 'Task Owner Full Name', how='left')

# replace null with email id
df_new['Latest External Query Resolver'] = df_new['Latest External Query Resolver'].fillna(df_new['mail'])

# find the missing latest external query resolver
missingLEQR = pd.isnull(df_new['Latest External Query Resolver'])

# Assign the missing latest external query resolver to a variable
blankLEQR = df_new[missingLEQR]

# write the missing ones to excel
blankLEQR.to_excel('blankLEQR.xlsx', index = False)

#remove mail column
df_new = df_new.drop(['mail'], axis = 1)

#remove blank rows from specific column
df_new = df_new.dropna(subset=['Latest External Query Resolver'])

#sort the table based on leqr
df_new.sort_values(by=['Latest External Query Resolver'])

# copy leqr to new df
leqr = df_new['Latest External Query Resolver']

# remove duplicates from leqr
leqr = leqr.drop_duplicates()
Regards,
Pavan
Reply
#2
Hi All,

Just wondering if anyone has got any luck on the above requirement.

Regards,
Pavan
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  concat 3 columns of dataframe to one column flash77 2 778 Oct-03-2023, 09:29 PM
Last Post: flash77
  HTML Decoder pandas dataframe column mbrown009 3 962 Sep-29-2023, 05:56 PM
Last Post: deanhystad
  Supervised learning, tree based model - problems splitting data Pixel 0 644 May-16-2023, 05:25 PM
Last Post: Pixel
  attempt to split values from within a dataframe column mbrown009 8 2,224 Apr-10-2023, 02:06 AM
Last Post: mbrown009
  Grouping Data based on 30% bracket purnima1 0 940 Feb-16-2023, 07:14 PM
Last Post: purnima1
  Finding the median of a column in a huge CSV file markagregory 5 1,733 Jan-24-2023, 04:22 PM
Last Post: DeaD_EyE
  Make unique id in vectorized way based on text data column with similarity scoring ill8 0 862 Dec-12-2022, 03:22 AM
Last Post: ill8
  Create a Table to a PNG file amanajosh 0 1,070 Aug-11-2022, 02:54 AM
Last Post: amanajosh
  How to insert data in a dataframe? man0s 1 1,313 Apr-26-2022, 11:36 PM
Last Post: jefsummers
  export dataframe to file.txt dramauh 5 1,885 Apr-21-2022, 01:23 AM
Last Post: sarahroxon7

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020