Python Forum
Filtering Excel Document Data Based On Numerical Values
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Filtering Excel Document Data Based On Numerical Values
#11
try add below line as needed and see if that helps,

df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove rows that match letter D in selected dataframe column
df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column

import pandas as pd
    
xls = pd.ExcelFile(r'C:\Users\testuser\Downloads\DL\book12.xls')
    
data = pd.read_excel(xls, sheet_name="Sheet1")
      
pd.options.display.max_rows = 1000
  
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID'])
 
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)
  
df['A/C'].unique().tolist()
  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))
  
def remove_duplicate_chars(s):
    return ''.join(set(s))
  
 
#added lines from here
print("****************")
 
 
df=pd.DataFrame(aircraft)
#print(df) #original df
 
print("*****************")
 
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
 
print("*****************")
 
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS'], value='DHS')
#print(df)#replaces the SD with DS and HSD/HDS with DHS
 
print("*********************")

df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
print(df) 

print("*********************")
Output:
python test1.py [nan, 'L', 'S', 'H', 'LHS', 'D', 'HS', 'P', 'LSSD', 'LSS', 'SS', 'LH', 'DH', 'DHS', 'SSSHH'] **************** ***************** ***************** ********************* Aircraft-combined Venue BID Date DISPLAY/ ANNAN 606 No Date DISPLAY DS COTTESMORE 757 No Date DISPLAY DS EAST FORTUNE 406 No Date DISPLAY DHS GUERNSEY 582 2008-09-10 00:00:00 DISPLAY DHS JERSEY 581 2008-09-10 00:00:00 DISPLAY DHS KEMBLE 316 2008-06-21 00:00:00 DISPLAY DS No Date DISPLAY DHS PORTRUSH 76 2008-09-05 00:00:00 DISPLAY DHS 2008-09-06 00:00:00 DISPLAY DHS ROMSEY 175 No Date DISPLAY DHS ROUGHAM 551 No Date DISPLAY DL SANICOLE 69 No Date DISPLAY DHS SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY DS SHAWBURY 188 2008-07-24 00:00:00 DISPLAY DS SOUTHPORT 584 No Date DISPLAY DHS SUNDERLAND 333 2008-07-25 00:00:00 DISPLAY DH No Date DISPLAY SHSD TARRANT RUSHDEN 132 No Date DISPLAY DHS WHITBY 125 2008-08-22 00:00:00 DISPLAY DHS 2008-08-23 00:00:00 DISPLAY DHS WIGTON 144 2008-07-26 00:00:00 DISPLAY DHS WINDERMERE 138 2008-07-25 00:00:00 DISPLAY DHS 2008-07-26 00:00:00 DISPLAY DHS *********************
Best Regards,
Sandeep

GANGA SANDEEP KUMAR
Reply
#12
Wow, thankyou so much Sandeep, for all your efforts, I very much appreciate your help.

How do I get that Data, shown in the Output part, of your last reply. Displayed in the Style and Font, the Data shown in the Photo is ? From the Website Link below ?

I.e. in that Font, with that same Data, rather than in the plane text ?

Is that possible ? i.e. in the proper DataFrame Font ?

Here is a link, to the picture of what I mean :-

https://www.flickr.com/photos/[email protected]/

Also what should be typed to change, all the 2008 part, of the dates that show, in the date column, to 2009 i.e. from

2008-07-24 to 2009-07-24. And all the dates, that show 1999 as the year, also changed to 2009 ? And how could they all be changed, to the order Day-Month-Year i.e. 24-07-2009 ?

I want that done, as the BBMF Schedule, is from 2009 not 2008.

And what should be typed in the Code, to get the Data, to display in the order it was in, in the original Excel File.

I say that because most if not all the Flypast appearances, don't show the date. So when the venues are shown, in alphabetical order, it makes it harder to know if the Flypast, was associated with a relevant display, i.e. when the aircraft were transiting to the display etc. If the Data is in the Original, non alphabetical order for the Venues, the relevant Flypasts should be underneath, I think, I may be wrong though.

To have data shown, for both Displays and Flypasts, I use the following Code :-

selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
Update :-

My latest Code is the following :-

import pandas as pd
     
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')
     
data = pd.read_excel(xls, sheet_name="Sheet1")
       
pd.options.display.max_rows = 1000
   
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
  
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)
   
df['A/C'].unique().tolist()
   
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
   
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))
   
def remove_duplicate_chars(s):
    return ''.join(set(s))
   
  
#added lines from here
print("****************")
  
  
df=pd.DataFrame(aircraft)
#print(df) #original df
  
print("*****************")
  
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
  
print("*****************")
  
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
  
print("*********************")
 
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
print(df) 
 
print("*********************")

#pd.DataFrame(aircraft)
And I get the following Output :-

Output:
Venue BID Date DISPLAY/ Durn ANNAN 606 No Date DISPLAY 00:05:00 DS COTTESMORE 757 No Date DISPLAY 00:05:00 DS EAST FORTUNE 406 No Date DISPLAY 00:10:00 DHS KEMBLE 316 2008-06-21 00:00:00 DISPLAY 00:15:00 DS No Date DISPLAY 00:20:00 DHS PORTRUSH 76 2008-09-05 00:00:00 DISPLAY 00:15:00 DHS 2008-09-06 00:00:00 DISPLAY 00:15:00 DHS SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY 00:05:00 DS SHAWBURY 188 2008-07-24 00:00:00 DISPLAY 00:05:00 DS SOUTHPORT 584 No Date DISPLAY 00:15:00 DHS TARRANT RUSHDEN 132 No Date DISPLAY 00:15:00 DHS WHITBY 125 2008-08-22 00:00:00 DISPLAY 00:15:00 DHS 2008-08-23 00:00:00 DISPLAY 00:15:00 DHS WIGTON 144 2008-07-26 00:00:00 DISPLAY 00:15:00 DHS
I want to now, have a new additional Column, called In The Same Display Slot. Because of the way, the data is collated, in the Source .xls File. Only the Venues, where the Aircraft are all in the same Display slot, Show here, I think because, when DAK and HS become DHS, as you will understand from the Code, and can see in the Original .XLS document when those values in the Duration Column i.e. Durn don't match i.e. 0:05 and 0:10, i.e. 5 and 10 minutes they don't show in the Output.

Only when both values, are the same do they show. I.e. when DHS are all in the same display slot, although for some reason for the Venue Woodstock, where both values are 0:15 The Row isn't showing, any ideas, why that is the case ?

The Woodstock Row, wasn't showing before either, can't work out why. All other Data Rows that are not showing now, but were before, with the others i.e :-

Output:
Venue BID Date DISPLAY/ ANNAN 606 No Date DISPLAY DS COTTESMORE 757 No Date DISPLAY DS EAST FORTUNE 406 No Date DISPLAY DHS GUERNSEY 582 2008-09-10 00:00:00 DISPLAY DHS JERSEY 581 2008-09-10 00:00:00 DISPLAY DHS KEMBLE 316 2008-06-21 00:00:00 DISPLAY DS No Date DISPLAY DHS PORTRUSH 76 2008-09-05 00:00:00 DISPLAY DHS 2008-09-06 00:00:00 DISPLAY DHS ROMSEY 175 No Date DISPLAY DHS SANICOLE 69 No Date DISPLAY DHS SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY DS SHAWBURY 188 2008-07-24 00:00:00 DISPLAY DS SOUTHPORT 584 No Date DISPLAY DHS SUNDERLAND 333 2008-07-25 00:00:00 DISPLAY DH No Date DISPLAY DHS TARRANT RUSHDEN 132 No Date DISPLAY DHS WHITBY 125 2008-08-22 00:00:00 DISPLAY DHS 2008-08-23 00:00:00 DISPLAY DHS WIGTON 144 2008-07-26 00:00:00 DISPLAY DHS WINDERMERE 138 2008-07-25 00:00:00 DISPLAY DHS 2008-07-26 00:00:00 DISPLAY DHS
Sanicole, Romsey Windermere etc, I would also like those Rows showing, with the new In The Same Display Slot Column, in addition to the ones in the first Output, which are the ones that are in the same display slot. So all those Rows Showing, but in the, "In The Same Display Slot" Column I would like all The Venue Rows, where they are in the same Display Slot saying YES and the ones that are not in the same display slot saying NO. Also I have noticed, that in the Date Column, it says No Date in the DataFrame Output. This is when the Date isn't present, in the 'Date' Column, in Rows Of Data in the .xls File. I.e. the Layout, of the Data in the .xls File being the issue.

Here is a Link to a Picture, of a sample of the .xls File Data. Is there a way, to have the Date populate, the empty Columns in the Rows with Data in, up to the right point in the Dataframe ?

Here is the link, to a Picture of what I mean :-

https://www.flickr.com/photos/[email protected]/

Kembles's DHS appearance on the 20th June, won't show a date in the DataFrame, as that Data is missing from that Row. The Date is higher up, but they have not included the Date for every Row, even though the relevant Data for the Date, is lower down. How could this be done for all Rows, giving the Correct Date ?

Any help would be appreciated

Best Wishes From

Eddie Winch
Reply
#13
I have tried to attempt, based on a Pandas Video on Youtube, I watched earlier. To achieve the Column, 'Dakota And Fighters In Same Slot' With Either YES or No, referring to part, of what I need help with in the Code, mentioned in my last reply.

Here is the Code I typed :-

df['Dakota And Fighters In The Same Display Slot'] = ((df.DISPLAY/ == 'DISPLAY') & (df.Durn == 5|10|15|20)).map({True:'YES', False:'NO'})
I get an Invalid Syntax Error, when I add that Code, to the rest of the Code, The Error Pointer Arrow, points to the '==' in the :-

((df.DISPLAY/ == 'DISPLAY')
part of the code, what is causing that issue ?

The Minutes Format is :-

00:05:00 for 5 Minutes, 00:10:00 for 10 Minutes etc, as
can be seen, in one of the Outputs, in my previous reply.

Could that be an issue too ?

Regards

Eddie
Reply
#14
I believe the syntax you given is correct only, but when we firstly had the join, group by and other processings done before assigning to dataframe initially, the column names alignment turned bad and which is why the syntax error. the dataframe thinking column name is only 'Aircraft-combined' and header0 as [Venue ,BID,Date,DISPLAY/,Durn] and remaining as data row values i guess.

Output:
#original dataframe head 3 rows print(df.head(n=3)) Aircraft-combined <<<<<<<<<<<<<<<< Venue BID Date DISPLAY/ Durn <<<<<<<<<<<<<< ANNAN 606 No Date DISPLAY 00:05:00 DS ARDINGLY(SUSSEX) 189 No Date DISPLAY 00:10:00 HS AUDLEM 706 2008-07-26 00:00:00 DISPLAY 00:05:00 S
I workarounded this way for now, export to csv and read again with column names using read_csv.
And, in regard to compare minutes of Durn column, convert the dataframe['Durn'] column to datetime64 and then compare as needed and revert datatype to object.

import pandas as pd
      
xls = pd.ExcelFile(r'C:\Users\testuser\Downloads\DL\Book12.xls')   
data = pd.read_excel(xls, sheet_name="Sheet1")      
pd.options.display.max_rows = 1000   
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)   
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
 
 
#####################
#####################
#####################
#####################
 
 
df=pd.DataFrame(aircraft)
#print(df) #original df
   
  
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
 
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
   

  
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column



#column names before alignment
print("column names: before change  ", list(df.columns.values.tolist()))
print("=======")
print(df)
print("=======")  
 

#to fix column name alignment 
df.to_csv(r'C:\Users\testuser\Downloads\DL\tempfile.csv')
df = pd.read_csv(r'C:\Users\testuser\Downloads\DL\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Durn','Aircraft-combined'], header=0 )


#column names before alignment
print("column names:  after change ",list(df.columns.values.tolist()))
print("=======")
print(df)
print("=======")

#convert df['Durn'] column to datetime foramt
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
print("changing datatype of column Durn----->",df['Durn'].dtype)
print("=======")
#compare minutes with extracting minutes from datetime64 column 
comparelist=[10,15,20]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})
#print(df)
print("applied... the condition to compare minutes for Durn column")
print("=======")


#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
print("=======")

print("\n\n*** FINAL RESULTSET ***\n\n")




print(df)    #final resultset
print("=======")
Output:
python test2.py [nan, 'L', 'S', 'H', 'LHS', 'D', 'HS', 'P', 'LSSD', 'LSS', 'SS', 'LH', 'DH', 'DHS', 'SSSHH'] ======= column names: before change ['Aircraft-combined'] ======= Aircraft-combined Venue BID Date DISPLAY/ Durn ANNAN 606 No Date DISPLAY 00:05:00 DS COTTESMORE 757 No Date DISPLAY 00:05:00 DS EAST FORTUNE 406 No Date DISPLAY 00:10:00 DHS KEMBLE 316 2008-06-21 00:00:00 DISPLAY 00:15:00 DS No Date DISPLAY 00:20:00 DHS PORTRUSH 76 2008-09-05 00:00:00 DISPLAY 00:15:00 DHS 2008-09-06 00:00:00 DISPLAY 00:15:00 DHS SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY 00:05:00 DS SHAWBURY 188 2008-07-24 00:00:00 DISPLAY 00:05:00 DS SOUTHPORT 584 No Date DISPLAY 00:15:00 DHS TARRANT RUSHDEN 132 No Date DISPLAY 00:15:00 DHS WHITBY 125 2008-08-22 00:00:00 DISPLAY 00:15:00 DHS 2008-08-23 00:00:00 DISPLAY 00:15:00 DHS WIGTON 144 2008-07-26 00:00:00 DISPLAY 00:15:00 DHS ======= column names: after change ['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn', 'Aircraft-combined'] ======= Venue BID Date DISPLAY/ Durn Aircraft-combined 0 ANNAN 606 No Date DISPLAY 00:05:00 DS 1 COTTESMORE 757 No Date DISPLAY 00:05:00 DS 2 EAST FORTUNE 406 No Date DISPLAY 00:10:00 DHS 3 KEMBLE 316 2008-06-21 00:00:00 DISPLAY 00:15:00 DS 4 KEMBLE 316 No Date DISPLAY 00:20:00 DHS 5 PORTRUSH 76 2008-09-05 00:00:00 DISPLAY 00:15:00 DHS 6 PORTRUSH 76 2008-09-06 00:00:00 DISPLAY 00:15:00 DHS 7 SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY 00:05:00 DS 8 SHAWBURY 188 2008-07-24 00:00:00 DISPLAY 00:05:00 DS 9 SOUTHPORT 584 No Date DISPLAY 00:15:00 DHS 10 TARRANT RUSHDEN 132 No Date DISPLAY 00:15:00 DHS 11 WHITBY 125 2008-08-22 00:00:00 DISPLAY 00:15:00 DHS 12 WHITBY 125 2008-08-23 00:00:00 DISPLAY 00:15:00 DHS 13 WIGTON 144 2008-07-26 00:00:00 DISPLAY 00:15:00 DHS ======= changing datatype of column Durn-----> datetime64[ns] ======= applied... the condition to compare minutes for Durn column ======= reverted datatype of column Durn back to -----> object ======= *** FINAL RESULTSET *** Venue BID Date DISPLAY/ Durn Aircraft-combined Dakota And Fighters In The Same Display Slot 0 ANNAN 606 No Date DISPLAY 00:05:00 DS NO 1 COTTESMORE 757 No Date DISPLAY 00:05:00 DS NO 2 EAST FORTUNE 406 No Date DISPLAY 00:10:00 DHS YES 3 KEMBLE 316 2008-06-21 00:00:00 DISPLAY 00:15:00 DS YES 4 KEMBLE 316 No Date DISPLAY 00:20:00 DHS YES 5 PORTRUSH 76 2008-09-05 00:00:00 DISPLAY 00:15:00 DHS YES 6 PORTRUSH 76 2008-09-06 00:00:00 DISPLAY 00:15:00 DHS YES 7 SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY 00:05:00 DS NO 8 SHAWBURY 188 2008-07-24 00:00:00 DISPLAY 00:05:00 DS NO 9 SOUTHPORT 584 No Date DISPLAY 00:15:00 DHS YES 10 TARRANT RUSHDEN 132 No Date DISPLAY 00:15:00 DHS YES 11 WHITBY 125 2008-08-22 00:00:00 DISPLAY 00:15:00 DHS YES 12 WHITBY 125 2008-08-23 00:00:00 DISPLAY 00:15:00 DHS YES 13 WIGTON 144 2008-07-26 00:00:00 DISPLAY 00:15:00 DHS YES =======
Best Regards,
Sandeep

GANGA SANDEEP KUMAR
Reply
#15
Hi Sandeep,

Many thanks for your latest reply, to my post.

Could you please read my reply, before my last reply, i.e reply
number 12 on this page. And reply back accordingly, your help has been marvelous, it's very
much appreciated by me.

Regards

Eddie Winch
Reply
#16
Hi Sandeep,

Based on what you have read, in the 12th Post on this Page, aka mine,
could you help me, sort out what I want to achieve here ?

Have you installed, Jupyter Notebook yet, if you run the Python Codes especially mine, my posts will make more sense. Or could someone else, on this forum help me ?

Any help would be much appreciated.

Best Regards

Eddie Winch
Reply
#17
Can anyone help me ?

Regards

Eddie Winch
Reply
#18
Hi Sandeep,

I have sent you a private message, about this Thread, and the other Thread.

Best Regards

Eddie Winch
Reply
#19
Hi there,

This is the Python Code I have :-

import pandas as pd
       
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')   
data = pd.read_excel(xls, sheet_name="Sheet1")      
pd.options.display.max_rows = 1000   
pd.options.display.max_columns = 1000
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)   
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
  
  
#####################
#####################
#####################
#####################
  
  
df=pd.DataFrame(aircraft)
#print(df) #original df
    
   
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
  
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
    
 
   
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
 
 
 
#column names before alignment
#print("column names: before change  ", list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")  
  
 
#to fix column name alignment 
df.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
df = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Durn','Aircraft-combined'], header=0 )
 
 
#column names before alignment
#print("column names:  after change ",list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")
 
#convert df['Durn'] column to datetime foramt
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
#print("changing datatype of column Durn----->",df['Durn'].dtype)
#print("=======")
#compare minutes with extracting minutes from datetime64 column 
comparelist=[5,10,15,20]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})
#print(df)
#print("applied... the condition to compare minutes for Durn column")
#print("=======")
 
 
#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
#print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
#print("=======")
 
#print("\n\n*** FINAL RESULTSET ***\n\n")
 
pd.DataFrame(df)

#print(df)    #final resultset
#print("=======")
I would also Like the Rows showing, which don't show, due to none matching numerical values, and then have the Word 'NO' In the Dakota And Fighters In The Same Display Slot Column.

When the two Numerical Values, are 5 and 10 in the 'Durn' Column I.e. Display Slot Duration, or two other None matching Values, the Dakota and Fighters, are in different display slots. When they match, i.e. 15 and 15 they are in the same Display Slot. Those are the Venue Rows that show, in the current Output I get.

In the 'Venue' Column in the Excel Document, the Venue Name will occupy two Rows, and the Two Numbers in the 'Durn' Column i.e. :-

Output:
Venue Durn KEMBLE 15 KEMBLE 15
So the Row would show.

In The Case of WINDERMERE :-

Output:
Venue Durn WINDERMERE 5 WINDERMERE 10
The Numerical Values differ, so that Venue and others, with differing
Numerical Values, don't show.

Here is the Full Current Output I get :-

Output:
Venue BID Date DISPLAY/ Durn Aircraft-combined Dakota And Fighters In The Same Display Slot 0 ANNAN 606 No Date DISPLAY 00:05:00 DS YES 1 COTTESMORE 757 No Date DISPLAY 00:05:00 DS YES 2 EAST FORTUNE 406 No Date DISPLAY 00:10:00 DHS YES 3 KEMBLE 316 2008-06-21 00:00:00 DISPLAY 00:15:00 DS YES 4 KEMBLE 316 No Date DISPLAY 00:20:00 DHS YES 5 PORTRUSH 76 2008-09-05 00:00:00 DISPLAY 00:15:00 DHS YES 6 PORTRUSH 76 2008-09-06 00:00:00 DISPLAY 00:15:00 DHS YES 7 SCARBOROUGH 610 2008-05-25 00:00:00 DISPLAY 00:05:00 DS YES 8 SHAWBURY 188 2008-07-24 00:00:00 DISPLAY 00:05:00 DS YES 9 SOUTHPORT 584 No Date DISPLAY 00:15:00 DHS YES 10 TARRANT RUSHDEN 132 No Date DISPLAY 00:15:00 DHS YES 11 WHITBY 125 2008-08-22 00:00:00 DISPLAY 00:15:00 DHS YES 12 WHITBY 125 2008-08-23 00:00:00 DISPLAY 00:15:00 DHS YES 13 WIGTON 144 2008-07-26 00:00:00 DISPLAY 00:15:00 DHS YES
Could someone please sort out the Output Data, it's all unaligned ?

There are other Columns of course,

Could someone please tell me, coding I could type, to achieve that ?

Regards

Eddie Winch Smile

P.S.

Here is the Link, to the Excel .XLS File in Question :-

http://web.archive.org/web/2009080423493...6AA506.xls
Reply
#20
Hi there,

In the Python Code I have replaced :-

df["Date"].fillna("No Date", inplace = True)
With :-

df["Date"].fillna(method='ffill', inplace = True)
So now, the 'No Dates' showing in the 'Date' Column now have the correct Date, in the Date Format I want. As also, I have added the following two lines to the Code :-

df['Date']= pd.to_datetime(df['Date'],format='%Y-%m-%d')
df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')
Which were modified, from the lines of Code, you kindly supplied today Sandeep, in another Thread of mine.

The Output I get now is :-

Output:
Venue BID Date DISPLAY/ Durn Aircraft-combined \ 0 ANNAN 606 04-07-2008 DISPLAY 00:05:00 DS 1 COTTESMORE 757 24-07-2008 DISPLAY 00:05:00 DS 2 EAST FORTUNE 406 25-07-2008 DISPLAY 00:10:00 DHS 3 KEMBLE 316 20-06-2008 DISPLAY 00:20:00 DHS 4 KEMBLE 316 21-06-2008 DISPLAY 00:15:00 DS 5 PORTRUSH 76 05-09-2008 DISPLAY 00:15:00 DHS 6 PORTRUSH 76 06-09-2008 DISPLAY 00:15:00 DHS 7 ROMSEY 175 31-08-2008 DISPLAY 00:15:00 DHS 8 SCARBOROUGH 610 25-05-2008 DISPLAY 00:05:00 DS 9 SHAWBURY 188 24-07-2008 DISPLAY 00:05:00 DS 10 SOUTHPORT 584 26-09-2008 DISPLAY 00:15:00 DHS 11 TARRANT RUSHDEN 132 04-06-2008 DISPLAY 00:15:00 DHS 12 WHITBY 125 22-08-2008 DISPLAY 00:15:00 DHS 13 WHITBY 125 23-08-2008 DISPLAY 00:15:00 DHS 14 WIGTON 144 26-07-2008 DISPLAY 00:15:00 DHS 15 WOODSTOCK 362 13-09-2008 DISPLAY 00:15:00 DHS
Output:
Dakota And Fighters In The Same Display Slot 0 YES 1 YES 2 YES 3 YES 4 YES 5 YES 6 YES 7 YES 8 YES 9 YES 10 YES 11 YES 12 YES 13 YES 14 YES 15 YES
How do I change all 2008 in the Date Column to 2009 ? And how do I sort the Data, so that the Rows, go descending, earliest Date to the latest Date, in the 'Date' Column ? And what will I need to change, so that the Rows not showing, due to the non matching values in the 'Durn' Column, do show and have NO showing in the 'Dakota And Fighters In The Same Display Slot' Column, for those Rows ? For the matching values, in comparelist=[5,10,15,20,] if they match they occupy both rows, the same Value, so 5+5 = 10, 10+10 = 20, 15+15 = 30, 20+20 = 40.

Whereas for those that don't match, i.e. 5 in one row and 10 in the other, 5+10 = 15. How would I write Code, to achieve what I am getting at ? If you look at the .xls File document, you will see what I mean, the Link to that .xls File, is in my previous Post, i.e. number 19. Any help would be appreciated by you Sandeep, or someone else.

Regards

Eddie Winch
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Copy Paste excel files based on the first letters of the file name Viento 2 423 Feb-07-2024, 12:24 PM
Last Post: Viento
  Unexpected Output - Python Dataframes: Filtering based on Overlapping Dates Xensor 5 702 Nov-15-2023, 06:54 PM
Last Post: deanhystad
  Search Excel File with a list of values huzzug 4 1,216 Nov-03-2023, 05:35 PM
Last Post: huzzug
  What data types can I use for default values? Mark17 1 522 Oct-09-2023, 02:07 PM
Last Post: buran
  Copy data from Excel and paste into Discord (Midjourney) Joe_Wright 4 2,034 Jun-06-2023, 05:49 PM
Last Post: rajeshgk
  restrict user input to numerical values MCL169 2 907 Apr-08-2023, 05:40 PM
Last Post: MCL169
  Reading data from excel file –> process it >>then write to another excel output file Jennifer_Jone 0 1,089 Mar-14-2023, 07:59 PM
Last Post: Jennifer_Jone
  Grouping Data based on 30% bracket purnima1 4 1,191 Mar-10-2023, 07:38 PM
Last Post: deanhystad
Question Inserting Numerical Value to the Element in Optionlist and Printing it into Entry drbilgehanbakirhan 1 802 Jan-30-2023, 05:16 AM
Last Post: deanhystad
  How to properly format rows and columns in excel data from parsed .txt blocks jh67 7 1,874 Dec-12-2022, 08:22 PM
Last Post: jh67

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020