Python Forum - Filtering Excel Document Data Based On Numerical Values

Pages: 1 2 3 4

Hi there,

After some research on the Internet, finding some useful Python Code for Dates, I now have the following end bit of the Python Code :-

#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
#print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
#print("=======")
 
#print("\n\n*** FINAL RESULTSET ***\n\n")
 
df['Date']= pd.to_datetime(df['Date'],format='%Y-%m-%d')
#df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')
##added two lines above to convert date format    

df['Date'] = df['Date'].mask(df['Date'].dt.year == 2008, 
                             df['Date'] + pd.offsets.DateOffset(year=2009)) 

df=df.loc[df.Date.dt.strftime('%m%d').astype(int).argsort()]    

df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')
    
pd.DataFrame(df)

So that now I have the Dates, in the Format I want, saying the correct Year 2009, instead of 2008.

And sorted by Date descending, Earliest Date to Latest, in the Output I get, when I run the Code.

Now what will I need to add, to the Code, so that the Rows not showing, due to the non matching values in the 'Durn' Column, do show and have NO showing in the 'Dakota And Fighters In The Same Display Slot' Column, for those Rows ? For the matching values, in comparelist=[5,10,15,20,] if they match they occupy both rows, the same Value, so 5+5 = 10, 10+10 = 20, 15+15 = 30, 20+20 = 40.

Whereas for those that don't match, i.e. 5 in one row and 10 in the other, 5+10 = 15. How would I write Code, to achieve what I am getting at ? If you look at the .xls File document, you will see what I mean, the Link to that .xls File, is in my number 19 Post. Any help would be appreciated by you Sandeep, or someone else.

Best Regards

Eddie Winch

Can anyone help me ? Is it fairly straightforward, what I am trying to achieve here ?

Regards

Eddie

Hi there,

Could a Moderator move, my following Thread, to the Jobs Forum Section, from the Data Science Forum Section ?

I would like help, to finish My Python Code, but nobody has answered, my recent questions.

So I would like to pay someone, to help finish my Code off, if that is okay ? Unless someone, is able and willing to help me ?

Here is the Thread Link :-

https://python-forum.io/Thread-Filtering...cal-Values

Best Regards

Eddie Winch

I moved the thread but I think you should clearly say in the thread that you are ready to pay someone to finish the program.

Hi there,

If someone could sort out, what I want to achieve in my Code, by reading all 3 pages, of this Thread ? Could this be sorted out, fairly cheaply, for minimal cost ? I am willing to pay someone, to help finish of my Code.

Here is the full Python Code :-

import pandas as pd
       
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')   
data = pd.read_excel(xls, sheet_name="Sheet1")      
pd.options.display.max_rows = 1000   
pd.options.display.max_columns = 1000
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
#df["Date"].fillna("No Date", inplace = True)   
df["Date"].fillna(method='ffill', inplace = True)
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
  
  
#####################
#####################
#####################
#####################
  
  
df=pd.DataFrame(aircraft)
#print(df) #original df
    
   
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
  
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
    
 
   
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
 
 
 
#column names before alignment
#print("column names: before change  ", list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")  
  
 
#to fix column name alignment 
df.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
df = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Durn','Aircraft-combined'], header=0 )
 
 
#column names before alignment
#print("column names:  after change ",list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")
 
#convert df['Durn'] column to datetime foramt
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
#print("changing datatype of column Durn----->",df['Durn'].dtype)
#print("=======")
#compare minutes with extracting minutes from datetime64 column 
comparelist=[5,10,15,20,]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})
#print(df)
#print("applied... the condition to compare minutes for Durn column")
#print("=======")
 
 
#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
#print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
#print("=======")
 
#print("\n\n*** FINAL RESULTSET ***\n\n")
 
df['Date']= pd.to_datetime(df['Date'],format='%Y-%m-%d')
#df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')
##added two lines above to convert date format    

df['Date'] = df['Date'].mask(df['Date'].dt.year == 2008, 
                             df['Date'] + pd.offsets.DateOffset(year=2009)) 

df=df.loc[df.Date.dt.strftime('%m%d').astype(int).argsort()]    

df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')
    
pd.DataFrame(df)

#print(df)    #final resultset
#print("=======")

Also I would like to Backfill, NaN Values For a given Column, i.e. in this case 'Date', and for a Particular Date from that Column.

I have allready Forward Filled for missing Dates, in my DataFrame in Pandas. I am working with an Excel .xls File , in my DataFrame in Pandas, using Jupyter Notebook.

This is the Line Of Code I used :-

df["Date"].fillna(method='ffill', inplace = True)

But I would like to narrow that down to a particular Date, this time back filling.

Could someone tell me, what I should type to do that ?

Any help would be appreciated, with this Code.

Best Regards

Eddie Winch

To better explain, The Rows not showing, due to the non matching values in the 'Durn' Column, I also would like showing in the DataFrame Output, and have NO showing in the 'Dakota And Fighters In The Same Display Slot' Column for those rows, for those Rows ? For the matching values, in comparelist=[5,10,15,20,] if they match for a given 'Venue', from the Venue Column, there are two values, in the two cells, in the Excel Document, two matching values, so 5+5 = 10, 10+10 = 20, 15+15 = 30, 20+20 = 40.

Whereas for those that don't match, i.e. 5 in one row and 10 in the other, 5+10 = 15. How would I write Code, to achieve what I am getting at, i.e. to use Maths in Code, to get the necessary DataFrame Output ? Or can Code, be written, saying to also show rows, where the two values arn't matching ? If someone could show both ways, If you look at the .xls File document, you will see what I mean, the Link to that .xls File, is in my number 19 Post.

Any help would be appreciated

Best Regards

Eddie Winch

If someone could PM me, using the Messaging System, if you are willing to help me ? Or send me an Email, to [email protected] letting me know how much the Job, could be done for etc ?

Best Regards

Eddie Winch

Hi there,

I have the following SQL Lines of Code :-

SELECT * INTO #Temp4
FROM
(
SELECT
tt5.[Venue],
tt5.[BID],
MIN(tt1.[Date]) AS [Date],
tt5.[Display / Flypast],
SUM(tt1.[Duration]) AS [Duration (mins)],
(SELECT DISTINCT '; ' + t5.[Aircraft Combined]
    FROM #Temp5 t5
    WHERE tt5.[Venue] = t5.[Venue]
    AND tt5.[BID] = t5.[BID]
    AND tt5.[Display / Flypast] = t5.[Display / Flypast]
    FOR XML PATH ('')) [Aircraft Combined],
SUM(tt5.[Dakota and Fighters in Same Slot?]) AS [Dakota and Fighters in Same Slot?]

Those lines of SQL Code I posted, are significant, because it deals, with adding of two particular Numerical values together, which I wan't to do in the Python Pandas Code, I have at the moment. I go into more detail about this, in earlier Posts, in this Thread, I run the Code in Jupyter Notebook.

So if someone could tell me what the Python Code, equivalent would be, for those lines of SQL Code. I could change my current Code hopefully, to achieve the Output DataFrame I want.

I really need help with this, would someone be able to help me ?

Regards

Eddie Winch ))

Hi @eddywinch82

I interested in this thread and made the result.

What I did is: convert origin 'Durn' DateTime to milliseconds -> Sum -> convert to DateTime

Check and let me know if this result is perfect.
[Image: open?id=1lKA1Wsan1Jtzd3CFOXYXf6a4yvgUbGxK]

[Image: open?id=1lKA1Wsan1Jtzd3CFOXYXf6a4yvgUbGxK]

Hi GodMaster,

Thankyou very much, for your interest in this Thread, and taking the time to attempt, to achieve what I
want, thats very much appreciated.

I have had help by a Python FreeLancer on the Internet, and he has sorted out what I want to achieve. Here below is the Finished Code :-

import pandas as pd
        
data = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2009-Code.csv')      
pd.options.display.max_rows = 1000   
pd.options.display.max_columns = 1000

df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
#df["Date"].fillna("No Date", inplace = True)   
df["Date"].fillna(method='ffill', inplace = True)
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]

df1=df
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
   
   
#####################
#####################
#####################
#####################
   
   
df=pd.DataFrame(aircraft)
#print(df) #original df
     
    
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
   
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
     
  
    
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
  
  
  
#column names before alignment
#print("column names: before change  ", list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")  
   
  
#to fix column name alignment 
df.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
df = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Aircraft-combined'], header=0 )
  
  
#column names before alignment
#print("column names:  after change ",list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")


###My code Starts here

x,y,z=df['Venue'],df['BID'],df['Date']
x=list(x)
y=list(y)
z=list(z)
import numpy as np

h=np.array(y)
np.unique(h)

yy=[]
for jj in y:
    yy.append(str(jj))

t= df1['Venue'].isin(x)
df2=df1[t]
df2

g=df2['Date'].isin(z)
df3=df2[g]
df3=pd.DataFrame(df3)

df3

g=df3['BID'].isin(yy)
df4=df3[g]
df4=pd.DataFrame(df4)


df4

column_names = ['Venue','A/C','DISPLAY/','Date','BID','Durn','Aircraft-combined']
dff = pd.DataFrame(columns = column_names)
dff.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
dff = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','A/C','DISPLAY/','Date','BID','Durn','Aircraft-combined'], header=0 )
df4
df4.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
df4 = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','A/C','DISPLAY/','Date','BID','Durn','Aircraft-combined'], header=0 )
for x in range(0,len(df4)):
    y=x+1
    try:
        if ((((df4.iloc[x][6]))=='DSH') or (((df4.iloc[x][6]))=='SD') or (((df4.iloc[x][6]))=='DS') or (((df4.iloc[x][6]))=='HSD') or (((df4.iloc[x][6]))=='HDS') or (((df4.iloc[x][6]))=='SHSD') or (((df4.iloc[x][6]))=='DHS')):
            dff3 ={'Venue':df4.iloc[x][0],'A/C':df4.iloc[x][1],'DISPLAY/':df4.iloc[x][2],'Date':df4.iloc[x][3],'BID':df4.iloc[x][4],'Durn':df4.iloc[x][5],'Aircraft-combined':df4.iloc[x][6]}
            dff = dff.append(dff3,ignore_index=True)
            
            
        elif ((df4.iloc[x][3]==df4.iloc[y][3]) and (df4.iloc[x][4]==df4.iloc[y][4])):
            sidhu=df4.iloc[x][6]+df4.iloc[y][6]
            dff1 = {'Venue':df4.iloc[x][0],'A/C':df4.iloc[x][1],'DISPLAY/':df4.iloc[x][2],'Date':df4.iloc[x][3],'BID':df4.iloc[x][4],'Durn':df4.iloc[x][5],'Aircraft-combined':sidhu}
            dff2 = {'Venue':df4.iloc[y][0],'A/C':df4.iloc[y][1],'DISPLAY/':df4.iloc[y][2],'Date':df4.iloc[y][3],'BID':df4.iloc[y][4],'Durn':df4.iloc[y][5],'Aircraft-combined':sidhu}
            dff = dff.append(dff1,ignore_index=True)
            dff = dff.append(dff2,ignore_index=True)
            
        
            
             
        else:
            continue
    except Exception:
        break
        
dff['Aircraft-combined']=dff['Aircraft-combined'].replace(to_replace='SD', value='DS')
dff['Aircraft-combined']=dff['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
dff['Dakota And Fighters In The Same Display Slot']='Yes'


for x in range(0,len(dff)):
    y=x+1
    try:
        if ((dff.iloc[x][3]==dff.iloc[y][3]) and (dff.iloc[x][4]==dff.iloc[y][4])):
            if dff.iloc[x][5]==dff.iloc[y][5]:
                dff['Dakota And Fighters In The Same Display Slot'][x]='Yes'
                dff['Dakota And Fighters In The Same Display Slot'][y]='Yes'
                
            else:
                dff['Dakota And Fighters In The Same Display Slot'][x]='No'
                dff['Dakota And Fighters In The Same Display Slot'][y]='No'
                
                
            

            
        
            
             
        else:
            continue
    except Exception:
        break
dff=dff.drop('A/C',axis=1)
dff=dff.drop_duplicates(subset=['Venue','Date','BID','Aircraft-combined','Dakota And Fighters In The Same Display Slot'], keep='first', inplace=False)
dff=dff.drop_duplicates(subset=None, keep='first', inplace=False)

dff['Durn']= pd.to_datetime(dff['Durn'],format='%M').apply(pd.Timestamp)
dff['Durn'] = pd.to_datetime(dff['Durn'], format='%H:%M:%S').dt.time

dff.at[14,'Aircraft-combined']='DHS'
dff.at[18,'Aircraft-combined']='DHS'
dff.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2009-Code.csv')
dff

The Code seemed to run initially, but now I get the following TypeError :-

Error:TypeError                                 Traceback (most recent call last)
<ipython-input-58-43d53992dab5> in <module>
     17     'PARA': 'P'
     18 }
---> 19 df['A/C std'] = df['A/C'].replace(rename_map)
     20 print(df['A/C std'].unique().tolist())
     21 print("=======")

c:\python37\lib\site-packages\pandas\core\series.py in replace(self, to_replace, value, inplace, limit, regex, method)
   3843         return super(Series, self).replace(to_replace=to_replace, value=value,
   3844                                            inplace=inplace, limit=limit,
-> 3845                                            regex=regex, method=method)
   3846 
   3847     @Appender(generic._shared_docs['shift'] % _shared_doc_kwargs)

c:\python37\lib\site-packages\pandas\core\generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6496 
   6497             return self.replace(to_replace, value, inplace=inplace,
-> 6498                                 limit=limit, regex=regex)
   6499         else:
   6500 

c:\python37\lib\site-packages\pandas\core\series.py in replace(self, to_replace, value, inplace, limit, regex, method)
   3843         return super(Series, self).replace(to_replace=to_replace, value=value,
   3844                                            inplace=inplace, limit=limit,
-> 3845                                            regex=regex, method=method)
   3846 
   3847     @Appender(generic._shared_docs['shift'] % _shared_doc_kwargs)

c:\python37\lib\site-packages\pandas\core\generic.py in replace(self, to_replace, value, inplace, limit, regex, method)
   6545                                                        dest_list=value,
   6546                                                        inplace=inplace,
-> 6547                                                        regex=regex)
   6548 
   6549                 else:  # [NA, ''] -> 0

c:\python37\lib\site-packages\pandas\core\internals\managers.py in replace_list(self, src_list, dest_list, inplace, regex)
    557             return _compare_or_regex_search(values, s, regex)
    558 
--> 559         masks = [comp(s, regex) for i, s in enumerate(src_list)]
    560 
    561         result_blocks = []

c:\python37\lib\site-packages\pandas\core\internals\managers.py in <listcomp>(.0)
    557             return _compare_or_regex_search(values, s, regex)
    558 
--> 559         masks = [comp(s, regex) for i, s in enumerate(src_list)]
    560 
    561         result_blocks = []

c:\python37\lib\site-packages\pandas\core\internals\managers.py in comp(s, regex)
    555                 return _compare_or_regex_search(maybe_convert_objects(values),
    556                                                 getattr(s, 'asm8'), regex)
--> 557             return _compare_or_regex_search(values, s, regex)
    558 
    559         masks = [comp(s, regex) for i, s in enumerate(src_list)]

c:\python37\lib\site-packages\pandas\core\internals\managers.py in _compare_or_regex_search(a, b, regex)
   1949         raise TypeError(
   1950             "Cannot compare types {a!r} and {b!r}".format(a=type_names[0],
-> 1951                                                           b=type_names[1]))
   1952     return result
   1953 

TypeError: Cannot compare types 'ndarray(dtype=float64)' and 'str'

I can't work out, what the issue is, any ideas ?

Regards

Eddie Winch ))

Hi @eddywinch82

I'm glad to hear that you already solve an issue.
I reviewed code and I think, above solution is a hit.
I'm willing to help your new type error issue but I can't notice anything on my side, just copied full code.
Send me your import CSV file so that I can test it again on my local env.

Pages: 1 2 3 4