Filtering Excel Document Data Based On Numerical Values

sandeep_ganga · (This post was last modified: Jan-03-2020, 07:26 PM by sandeep_ganga.)

try add below line as needed and see if that helps,

df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove rows that match letter D in selected dataframe column
df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column

import pandas as pd
    
xls = pd.ExcelFile(r'C:\Users\testuser\Downloads\DL\book12.xls')
    
data = pd.read_excel(xls, sheet_name="Sheet1")
      
pd.options.display.max_rows = 1000
  
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID'])
 
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)
  
df['A/C'].unique().tolist()
  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))
  
def remove_duplicate_chars(s):
    return ''.join(set(s))
  
 
#added lines from here
print("****************")
 
 
df=pd.DataFrame(aircraft)
#print(df) #original df
 
print("*****************")
 
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
 
print("*****************")
 
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS'], value='DHS')
#print(df)#replaces the SD with DS and HSD/HDS with DHS
 
print("*********************")

df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
print(df) 

print("*********************")

Output:python test1.py

[nan, 'L', 'S', 'H', 'LHS', 'D', 'HS', 'P', 'LSSD', 'LSS', 'SS', 'LH', 'DH', 'DHS', 'SSSHH']
****************
*****************
*****************
*********************
                                                 Aircraft-combined
Venue           BID Date                DISPLAY/
ANNAN           606 No Date             DISPLAY                 DS
COTTESMORE      757 No Date             DISPLAY                 DS
EAST FORTUNE    406 No Date             DISPLAY                DHS
GUERNSEY        582 2008-09-10 00:00:00 DISPLAY                DHS
JERSEY          581 2008-09-10 00:00:00 DISPLAY                DHS
KEMBLE          316 2008-06-21 00:00:00 DISPLAY                 DS
                    No Date             DISPLAY                DHS
PORTRUSH        76  2008-09-05 00:00:00 DISPLAY                DHS
                    2008-09-06 00:00:00 DISPLAY                DHS
ROMSEY          175 No Date             DISPLAY                DHS
ROUGHAM         551 No Date             DISPLAY                 DL
SANICOLE        69  No Date             DISPLAY                DHS
SCARBOROUGH     610 2008-05-25 00:00:00 DISPLAY                 DS
SHAWBURY        188 2008-07-24 00:00:00 DISPLAY                 DS
SOUTHPORT       584 No Date             DISPLAY                DHS
SUNDERLAND      333 2008-07-25 00:00:00 DISPLAY                 DH
                    No Date             DISPLAY               SHSD
TARRANT RUSHDEN 132 No Date             DISPLAY                DHS
WHITBY          125 2008-08-22 00:00:00 DISPLAY                DHS
                    2008-08-23 00:00:00 DISPLAY                DHS
WIGTON          144 2008-07-26 00:00:00 DISPLAY                DHS
WINDERMERE      138 2008-07-25 00:00:00 DISPLAY                DHS
                    2008-07-26 00:00:00 DISPLAY                DHS
*********************

Best Regards,
Sandeep

GANGA SANDEEP KUMAR

eddywinch82 · (This post was last modified: Jan-05-2020, 01:37 AM by eddywinch82.)

Wow, thankyou so much Sandeep, for all your efforts, I very much appreciate your help.

How do I get that Data, shown in the Output part, of your last reply. Displayed in the Style and Font, the Data shown in the Photo is ? From the Website Link below ?

I.e. in that Font, with that same Data, rather than in the plane text ?

Is that possible ? i.e. in the proper DataFrame Font ?

Here is a link, to the picture of what I mean :-

https://www.flickr.com/photos/[email protected]/

Also what should be typed to change, all the 2008 part, of the dates that show, in the date column, to 2009 i.e. from

2008-07-24 to 2009-07-24. And all the dates, that show 1999 as the year, also changed to 2009 ? And how could they all be changed, to the order Day-Month-Year i.e. 24-07-2009 ?

I want that done, as the BBMF Schedule, is from 2009 not 2008.

And what should be typed in the Code, to get the Data, to display in the order it was in, in the original Excel File.

I say that because most if not all the Flypast appearances, don't show the date. So when the venues are shown, in alphabetical order, it makes it harder to know if the Flypast, was associated with a relevant display, i.e. when the aircraft were transiting to the display etc. If the Data is in the Original, non alphabetical order for the Venues, the relevant Flypasts should be underneath, I think, I may be wrong though.

To have data shown, for both Displays and Flypasts, I use the following Code :-

selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]

Update :-

My latest Code is the following :-

import pandas as pd
     
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')
     
data = pd.read_excel(xls, sheet_name="Sheet1")
       
pd.options.display.max_rows = 1000
   
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
  
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)
   
df['A/C'].unique().tolist()
   
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
   
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))
   
def remove_duplicate_chars(s):
    return ''.join(set(s))
   
  
#added lines from here
print("****************")
  
  
df=pd.DataFrame(aircraft)
#print(df) #original df
  
print("*****************")
  
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
  
print("*****************")
  
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
  
print("*********************")
 
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
print(df) 
 
print("*********************")

#pd.DataFrame(aircraft)

And I get the following Output :-

Output:Venue           BID Date                DISPLAY/ Durn                      
ANNAN           606 No Date             DISPLAY  00:05:00                DS
COTTESMORE      757 No Date             DISPLAY  00:05:00                DS
EAST FORTUNE    406 No Date             DISPLAY  00:10:00               DHS
KEMBLE          316 2008-06-21 00:00:00 DISPLAY  00:15:00                DS
                    No Date             DISPLAY  00:20:00               DHS
PORTRUSH        76  2008-09-05 00:00:00 DISPLAY  00:15:00               DHS
                    2008-09-06 00:00:00 DISPLAY  00:15:00               DHS
SCARBOROUGH     610 2008-05-25 00:00:00 DISPLAY  00:05:00                DS
SHAWBURY        188 2008-07-24 00:00:00 DISPLAY  00:05:00                DS
SOUTHPORT       584 No Date             DISPLAY  00:15:00               DHS
TARRANT RUSHDEN 132 No Date             DISPLAY  00:15:00               DHS
WHITBY          125 2008-08-22 00:00:00 DISPLAY  00:15:00               DHS
                    2008-08-23 00:00:00 DISPLAY  00:15:00               DHS
WIGTON          144 2008-07-26 00:00:00 DISPLAY  00:15:00               DHS

I want to now, have a new additional Column, called In The Same Display Slot. Because of the way, the data is collated, in the Source .xls File. Only the Venues, where the Aircraft are all in the same Display slot, Show here, I think because, when DAK and HS become DHS, as you will understand from the Code, and can see in the Original .XLS document when those values in the Duration Column i.e. Durn don't match i.e. 0:05 and 0:10, i.e. 5 and 10 minutes they don't show in the Output.

Only when both values, are the same do they show. I.e. when DHS are all in the same display slot, although for some reason for the Venue Woodstock, where both values are 0:15 The Row isn't showing, any ideas, why that is the case ?

The Woodstock Row, wasn't showing before either, can't work out why. All other Data Rows that are not showing now, but were before, with the others i.e :-

Output:Venue           BID Date                DISPLAY/                  
ANNAN           606 No Date             DISPLAY                 DS
COTTESMORE      757 No Date             DISPLAY                 DS
EAST FORTUNE    406 No Date             DISPLAY                DHS
GUERNSEY        582 2008-09-10 00:00:00 DISPLAY                DHS
JERSEY          581 2008-09-10 00:00:00 DISPLAY                DHS
KEMBLE          316 2008-06-21 00:00:00 DISPLAY                 DS
                    No Date             DISPLAY                DHS
PORTRUSH        76  2008-09-05 00:00:00 DISPLAY                DHS
                    2008-09-06 00:00:00 DISPLAY                DHS
ROMSEY          175 No Date             DISPLAY                DHS
SANICOLE        69  No Date             DISPLAY                DHS
SCARBOROUGH     610 2008-05-25 00:00:00 DISPLAY                 DS
SHAWBURY        188 2008-07-24 00:00:00 DISPLAY                 DS
SOUTHPORT       584 No Date             DISPLAY                DHS
SUNDERLAND      333 2008-07-25 00:00:00 DISPLAY                 DH
                    No Date             DISPLAY                DHS
TARRANT RUSHDEN 132 No Date             DISPLAY                DHS
WHITBY          125 2008-08-22 00:00:00 DISPLAY                DHS
                    2008-08-23 00:00:00 DISPLAY                DHS
WIGTON          144 2008-07-26 00:00:00 DISPLAY                DHS
WINDERMERE      138 2008-07-25 00:00:00 DISPLAY                DHS
                    2008-07-26 00:00:00 DISPLAY                DHS

Sanicole, Romsey Windermere etc, I would also like those Rows showing, with the new In The Same Display Slot Column, in addition to the ones in the first Output, which are the ones that are in the same display slot. So all those Rows Showing, but in the, "In The Same Display Slot" Column I would like all The Venue Rows, where they are in the same Display Slot saying YES and the ones that are not in the same display slot saying NO. Also I have noticed, that in the Date Column, it says No Date in the DataFrame Output. This is when the Date isn't present, in the 'Date' Column, in Rows Of Data in the .xls File. I.e. the Layout, of the Data in the .xls File being the issue.

Here is a Link to a Picture, of a sample of the .xls File Data. Is there a way, to have the Date populate, the empty Columns in the Rows with Data in, up to the right point in the Dataframe ?

Here is the link, to a Picture of what I mean :-

https://www.flickr.com/photos/[email protected]/

Kembles's DHS appearance on the 20th June, won't show a date in the DataFrame, as that Data is missing from that Row. The Date is higher up, but they have not included the Date for every Row, even though the relevant Data for the Date, is lower down. How could this be done for all Rows, giving the Correct Date ?

Any help would be appreciated

Best Wishes From

Eddie Winch

eddywinch82 · (This post was last modified: Jan-05-2020, 11:11 PM by eddywinch82.)

I have tried to attempt, based on a Pandas Video on Youtube, I watched earlier. To achieve the Column, 'Dakota And Fighters In Same Slot' With Either YES or No, referring to part, of what I need help with in the Code, mentioned in my last reply.

Here is the Code I typed :-

df['Dakota And Fighters In The Same Display Slot'] = ((df.DISPLAY/ == 'DISPLAY') & (df.Durn == 5|10|15|20)).map({True:'YES', False:'NO'})

I get an Invalid Syntax Error, when I add that Code, to the rest of the Code, The Error Pointer Arrow, points to the '==' in the :-

((df.DISPLAY/ == 'DISPLAY')

part of the code, what is causing that issue ?

The Minutes Format is :-

00:05:00 for 5 Minutes, 00:10:00 for 10 Minutes etc, as
can be seen, in one of the Outputs, in my previous reply.

Could that be an issue too ?

Regards

Eddie

sandeep_ganga · (This post was last modified: Jan-06-2020, 05:57 AM by sandeep_ganga.)

I believe the syntax you given is correct only, but when we firstly had the join, group by and other processings done before assigning to dataframe initially, the column names alignment turned bad and which is why the syntax error. the dataframe thinking column name is only 'Aircraft-combined' and header0 as [Venue ,BID,Date,DISPLAY/,Durn] and remaining as data row values i guess.

Output:#original dataframe head 3 rows
print(df.head(n=3))

                                                           Aircraft-combined   <<<<<<<<<<<<<<<<
Venue            BID Date                DISPLAY/ Durn                         <<<<<<<<<<<<<<
ANNAN            606 No Date             DISPLAY  00:05:00                DS
ARDINGLY(SUSSEX) 189 No Date             DISPLAY  00:10:00                HS
AUDLEM           706 2008-07-26 00:00:00 DISPLAY  00:05:00                 S

I workarounded this way for now, export to csv and read again with column names using read_csv.
And, in regard to compare minutes of Durn column, convert the dataframe['Durn'] column to datetime64 and then compare as needed and revert datatype to object.

import pandas as pd
      
xls = pd.ExcelFile(r'C:\Users\testuser\Downloads\DL\Book12.xls')   
data = pd.read_excel(xls, sheet_name="Sheet1")      
pd.options.display.max_rows = 1000   
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)   
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
 
 
#####################
#####################
#####################
#####################
 
 
df=pd.DataFrame(aircraft)
#print(df) #original df
   
  
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
 
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
   

  
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column



#column names before alignment
print("column names: before change  ", list(df.columns.values.tolist()))
print("=======")
print(df)
print("=======")  
 

#to fix column name alignment 
df.to_csv(r'C:\Users\testuser\Downloads\DL\tempfile.csv')
df = pd.read_csv(r'C:\Users\testuser\Downloads\DL\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Durn','Aircraft-combined'], header=0 )


#column names before alignment
print("column names:  after change ",list(df.columns.values.tolist()))
print("=======")
print(df)
print("=======")

#convert df['Durn'] column to datetime foramt
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
print("changing datatype of column Durn----->",df['Durn'].dtype)
print("=======")
#compare minutes with extracting minutes from datetime64 column 
comparelist=[10,15,20]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})
#print(df)
print("applied... the condition to compare minutes for Durn column")
print("=======")


#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
print("=======")

print("\n\n*** FINAL RESULTSET ***\n\n")




print(df)    #final resultset
print("=======")

Output:python test2.py



[nan, 'L', 'S', 'H', 'LHS', 'D', 'HS', 'P', 'LSSD', 'LSS', 'SS', 'LH', 'DH', 'DHS', 'SSSHH']
=======
column names: before change   ['Aircraft-combined']
=======
                                                          Aircraft-combined
Venue           BID Date                DISPLAY/ Durn
ANNAN           606 No Date             DISPLAY  00:05:00                DS
COTTESMORE      757 No Date             DISPLAY  00:05:00                DS
EAST FORTUNE    406 No Date             DISPLAY  00:10:00               DHS
KEMBLE          316 2008-06-21 00:00:00 DISPLAY  00:15:00                DS
                    No Date             DISPLAY  00:20:00               DHS
PORTRUSH        76  2008-09-05 00:00:00 DISPLAY  00:15:00               DHS
                    2008-09-06 00:00:00 DISPLAY  00:15:00               DHS
SCARBOROUGH     610 2008-05-25 00:00:00 DISPLAY  00:05:00                DS
SHAWBURY        188 2008-07-24 00:00:00 DISPLAY  00:05:00                DS
SOUTHPORT       584 No Date             DISPLAY  00:15:00               DHS
TARRANT RUSHDEN 132 No Date             DISPLAY  00:15:00               DHS
WHITBY          125 2008-08-22 00:00:00 DISPLAY  00:15:00               DHS
                    2008-08-23 00:00:00 DISPLAY  00:15:00               DHS
WIGTON          144 2008-07-26 00:00:00 DISPLAY  00:15:00               DHS
=======
column names:  after change  ['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn', 'Aircraft-combined']
=======
              Venue  BID                 Date DISPLAY/      Durn Aircraft-combined
0             ANNAN  606              No Date  DISPLAY  00:05:00                DS
1        COTTESMORE  757              No Date  DISPLAY  00:05:00                DS
2      EAST FORTUNE  406              No Date  DISPLAY  00:10:00               DHS
3            KEMBLE  316  2008-06-21 00:00:00  DISPLAY  00:15:00                DS
4            KEMBLE  316              No Date  DISPLAY  00:20:00               DHS
5          PORTRUSH   76  2008-09-05 00:00:00  DISPLAY  00:15:00               DHS
6          PORTRUSH   76  2008-09-06 00:00:00  DISPLAY  00:15:00               DHS
7       SCARBOROUGH  610  2008-05-25 00:00:00  DISPLAY  00:05:00                DS
8          SHAWBURY  188  2008-07-24 00:00:00  DISPLAY  00:05:00                DS
9         SOUTHPORT  584              No Date  DISPLAY  00:15:00               DHS
10  TARRANT RUSHDEN  132              No Date  DISPLAY  00:15:00               DHS
11           WHITBY  125  2008-08-22 00:00:00  DISPLAY  00:15:00               DHS
12           WHITBY  125  2008-08-23 00:00:00  DISPLAY  00:15:00               DHS
13           WIGTON  144  2008-07-26 00:00:00  DISPLAY  00:15:00               DHS
=======
changing datatype of column Durn-----> datetime64[ns]
=======
applied... the condition to compare minutes for Durn column
=======
reverted datatype of column Durn back to -----> object
=======


*** FINAL RESULTSET ***


              Venue  BID                 Date DISPLAY/      Durn Aircraft-combined Dakota And Fighters In The Same Display Slot
0             ANNAN  606              No Date  DISPLAY  00:05:00                DS                                           NO
1        COTTESMORE  757              No Date  DISPLAY  00:05:00                DS                                           NO
2      EAST FORTUNE  406              No Date  DISPLAY  00:10:00               DHS                                          YES
3            KEMBLE  316  2008-06-21 00:00:00  DISPLAY  00:15:00                DS                                          YES
4            KEMBLE  316              No Date  DISPLAY  00:20:00               DHS                                          YES
5          PORTRUSH   76  2008-09-05 00:00:00  DISPLAY  00:15:00               DHS                                          YES
6          PORTRUSH   76  2008-09-06 00:00:00  DISPLAY  00:15:00               DHS                                          YES
7       SCARBOROUGH  610  2008-05-25 00:00:00  DISPLAY  00:05:00                DS                                           NO
8          SHAWBURY  188  2008-07-24 00:00:00  DISPLAY  00:05:00                DS                                           NO
9         SOUTHPORT  584              No Date  DISPLAY  00:15:00               DHS                                          YES
10  TARRANT RUSHDEN  132              No Date  DISPLAY  00:15:00               DHS                                          YES
11           WHITBY  125  2008-08-22 00:00:00  DISPLAY  00:15:00               DHS                                          YES
12           WHITBY  125  2008-08-23 00:00:00  DISPLAY  00:15:00               DHS                                          YES
13           WIGTON  144  2008-07-26 00:00:00  DISPLAY  00:15:00               DHS                                          YES
=======

Best Regards,
Sandeep

GANGA SANDEEP KUMAR

eddywinch82 · Jan-06-2020, 11:42 AM

Hi Sandeep,

Many thanks for your latest reply, to my post.

Could you please read my reply, before my last reply, i.e reply
number 12 on this page. And reply back accordingly, your help has been marvelous, it's very
much appreciated by me.

Regards

Eddie Winch

eddywinch82 · (This post was last modified: Jan-11-2020, 02:57 PM by eddywinch82.)

Hi Sandeep,

Based on what you have read, in the 12th Post on this Page, aka mine,
could you help me, sort out what I want to achieve here ?

Have you installed, Jupyter Notebook yet, if you run the Python Codes especially mine, my posts will make more sense. Or could someone else, on this forum help me ?

Any help would be much appreciated.

Best Regards

Eddie Winch

eddywinch82 · Jan-17-2020, 05:13 PM

Can anyone help me ?

Regards

Eddie Winch

eddywinch82 · Jan-22-2020, 09:10 AM

Hi Sandeep,

I have sent you a private message, about this Thread, and the other Thread.

Best Regards

Eddie Winch

eddywinch82 · (This post was last modified: Jan-25-2020, 09:59 PM by eddywinch82.)

Hi there,

This is the Python Code I have :-

import pandas as pd
       
xls = pd.ExcelFile(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF Display Schedule 2009.xls')   
data = pd.read_excel(xls, sheet_name="Sheet1")      
pd.options.display.max_rows = 1000   
pd.options.display.max_columns = 1000
df = pd.DataFrame(data, columns= ['Venue','A/C','DISPLAY/','Date','BID','Durn'])
#df[(df['Venue'].str.contains('[a-zA-Z]') & (df['DISPLAY/'].str.contains('DISPLAY') & df['A/C'].str.contains("DHS|DAK|HS|SPIT")) & (df['A/C'] != 'LHS') & (df['A/C'] != 'LANC'))] 
df["Date"].fillna("No Date", inplace = True)   
df['A/C'].unique().tolist()  
rename_map = {
    'DAK': 'D',
    'SPIT': 'S',
    'LANC': 'L',
    'HURRI': 'H',
    'PARA': 'P'
}
df['A/C std'] = df['A/C'].replace(rename_map)
print(df['A/C std'].unique().tolist())
print("=======")  
selected = df.loc[df['DISPLAY/'] == 'DISPLAY']
#selected = df.loc[df['DISPLAY/'] == 'FLYPAST']
#selected = df.loc[df['DISPLAY/'].isin(['DISPLAY', 'FLYPAST'])]
#selected = df.loc[df['A/C'].isin(['DS', 'DH', 'DHS', 'SD', 'HSD'])]
groupby_venue_date = selected.groupby(['Venue', 'BID', 'Date', 'DISPLAY/', 'Durn']) 
aircraft = groupby_venue_date['A/C std'].apply(''.join).rename('Aircraft-combined')
def sorted_string(s):
    return ''.join(sorted(s))   
def remove_duplicate_chars(s):
    return ''.join(set(s))
  
  
#####################
#####################
#####################
#####################
  
  
df=pd.DataFrame(aircraft)
#print(df) #original df
    
   
indexNames = df[~df['Aircraft-combined'].str.contains('D',na=False)].index
df.drop(indexNames , inplace=True)
#print(df)#prints all rows having D and removes remaining rows
  
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace='SD', value='DS')
df['Aircraft-combined']=df['Aircraft-combined'].replace(to_replace=['HSD','HDS', 'SHSD'], value='DHS')
#print(df)#replaces the SD with DS and HSD,HDS and SHSD with DHS
    
 
   
df = df.drop(df[df['Aircraft-combined'] == 'D'].index)#remove value that match letter D
df = df.drop(df[df['Aircraft-combined'] == 'DD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'DL'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LD'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LDS'].index)
df = df.drop(df[df['Aircraft-combined'] == 'LSSD'].index)
#df = df.drop(df[df['Aircraft-combined'] != 'D'].index)#if need the rows that match only letter D in selected dataframe column
 
 
 
#column names before alignment
#print("column names: before change  ", list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")  
  
 
#to fix column name alignment 
df.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv')
df = pd.read_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\tempfile.csv',names=['Venue','BID','Date','DISPLAY/','Durn','Aircraft-combined'], header=0 )
 
 
#column names before alignment
#print("column names:  after change ",list(df.columns.values.tolist()))
#print("=======")
#print(df)
#print("=======")
 
#convert df['Durn'] column to datetime foramt
df['Durn']= pd.to_datetime(df['Durn'],format='%H:%M:%S').apply(pd.Timestamp)
#print("changing datatype of column Durn----->",df['Durn'].dtype)
#print("=======")
#compare minutes with extracting minutes from datetime64 column 
comparelist=[5,10,15,20]
df['Dakota And Fighters In The Same Display Slot'] = ((df['DISPLAY/'] == 'DISPLAY') & (df['Durn'].dt.minute.isin(comparelist))).map({True:'YES', False:'NO'})
#print(df)
#print("applied... the condition to compare minutes for Durn column")
#print("=======")
 
 
#return df['Durn'] column datetime64 format to object type
df['Durn'] = pd.to_datetime(df['Durn'], format='%H:%M:%S').dt.time
#print("reverted datatype of column Durn back to ----->",df['Durn'].dtype)
#print("=======")
 
#print("\n\n*** FINAL RESULTSET ***\n\n")
 
pd.DataFrame(df)

#print(df)    #final resultset
#print("=======")

I would also Like the Rows showing, which don't show, due to none matching numerical values, and then have the Word 'NO' In the Dakota And Fighters In The Same Display Slot Column.

When the two Numerical Values, are 5 and 10 in the 'Durn' Column I.e. Display Slot Duration, or two other None matching Values, the Dakota and Fighters, are in different display slots. When they match, i.e. 15 and 15 they are in the same Display Slot. Those are the Venue Rows that show, in the current Output I get.

In the 'Venue' Column in the Excel Document, the Venue Name will occupy two Rows, and the Two Numbers in the 'Durn' Column i.e. :-

Output:Venue       Durn

KEMBLE       15
KEMBLE       15

So the Row would show.

In The Case of WINDERMERE :-

Output:Venue          Durn

WINDERMERE      5
WINDERMERE      10

The Numerical Values differ, so that Venue and others, with differing
Numerical Values, don't show.

Here is the Full Current Output I get :-

Output:Venue  BID                 Date DISPLAY/      Durn Aircraft-combined Dakota And Fighters In The Same Display Slot
0             ANNAN  606              No Date  DISPLAY  00:05:00                DS                                          YES
1        COTTESMORE  757              No Date  DISPLAY  00:05:00                DS                                          YES
2      EAST FORTUNE  406              No Date  DISPLAY  00:10:00               DHS                                          YES
3            KEMBLE  316  2008-06-21 00:00:00  DISPLAY  00:15:00                DS                                          YES
4            KEMBLE  316              No Date  DISPLAY  00:20:00               DHS                                          YES
5          PORTRUSH   76  2008-09-05 00:00:00  DISPLAY  00:15:00               DHS                                          YES
6          PORTRUSH   76  2008-09-06 00:00:00  DISPLAY  00:15:00               DHS                                          YES
7       SCARBOROUGH  610  2008-05-25 00:00:00  DISPLAY  00:05:00                DS                                          YES
8          SHAWBURY  188  2008-07-24 00:00:00  DISPLAY  00:05:00                DS                                          YES
9         SOUTHPORT  584              No Date  DISPLAY  00:15:00               DHS                                          YES
10  TARRANT RUSHDEN  132              No Date  DISPLAY  00:15:00               DHS                                          YES
11           WHITBY  125  2008-08-22 00:00:00  DISPLAY  00:15:00               DHS                                          YES
12           WHITBY  125  2008-08-23 00:00:00  DISPLAY  00:15:00               DHS                                          YES
13           WIGTON  144  2008-07-26 00:00:00  DISPLAY  00:15:00               DHS                                          YES

Could someone please sort out the Output Data, it's all unaligned ?

There are other Columns of course,

Could someone please tell me, coding I could type, to achieve that ?

Regards

Eddie Winch Smile

P.S.

Here is the Link, to the Excel .XLS File in Question :-

http://web.archive.org/web/2009080423493...6AA506.xls

eddywinch82 · (This post was last modified: Jan-27-2020, 08:11 PM by eddywinch82.)

Hi there,

In the Python Code I have replaced :-

df["Date"].fillna("No Date", inplace = True)

With :-

df["Date"].fillna(method='ffill', inplace = True)

So now, the 'No Dates' showing in the 'Date' Column now have the correct Date, in the Date Format I want. As also, I have added the following two lines to the Code :-

df['Date']= pd.to_datetime(df['Date'],format='%Y-%m-%d')
df['Date']= pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')

Which were modified, from the lines of Code, you kindly supplied today Sandeep, in another Thread of mine.

The Output I get now is :-

Output:           Venue     BID        Date DISPLAY/      Durn Aircraft-combined  \
0             ANNAN  606  04-07-2008  DISPLAY  00:05:00                DS   
1        COTTESMORE  757  24-07-2008  DISPLAY  00:05:00                DS   
2      EAST FORTUNE  406  25-07-2008  DISPLAY  00:10:00               DHS   
3            KEMBLE  316  20-06-2008  DISPLAY  00:20:00               DHS   
4            KEMBLE  316  21-06-2008  DISPLAY  00:15:00                DS   
5          PORTRUSH   76  05-09-2008  DISPLAY  00:15:00               DHS   
6          PORTRUSH   76  06-09-2008  DISPLAY  00:15:00               DHS   
7            ROMSEY  175  31-08-2008  DISPLAY  00:15:00               DHS   
8       SCARBOROUGH  610  25-05-2008  DISPLAY  00:05:00                DS   
9          SHAWBURY  188  24-07-2008  DISPLAY  00:05:00                DS   
10        SOUTHPORT  584  26-09-2008  DISPLAY  00:15:00               DHS   
11  TARRANT RUSHDEN  132  04-06-2008  DISPLAY  00:15:00               DHS   
12           WHITBY  125  22-08-2008  DISPLAY  00:15:00               DHS   
13           WHITBY  125  23-08-2008  DISPLAY  00:15:00               DHS   
14           WIGTON  144  26-07-2008  DISPLAY  00:15:00               DHS   
15        WOODSTOCK  362  13-09-2008  DISPLAY  00:15:00               DHS

Output:Dakota And Fighters In The Same Display Slot  
0                                           YES  
1                                           YES  
2                                           YES  
3                                           YES  
4                                           YES  
5                                           YES  
6                                           YES  
7                                           YES  
8                                           YES  
9                                           YES  
10                                          YES  
11                                          YES  
12                                          YES  
13                                          YES  
14                                          YES  
15                                          YES

How do I change all 2008 in the Date Column to 2009 ? And how do I sort the Data, so that the Rows, go descending, earliest Date to the latest Date, in the 'Date' Column ? And what will I need to change, so that the Rows not showing, due to the non matching values in the 'Durn' Column, do show and have NO showing in the 'Dakota And Fighters In The Same Display Slot' Column, for those Rows ? For the matching values, in comparelist=[5,10,15,20,] if they match they occupy both rows, the same Value, so 5+5 = 10, 10+10 = 20, 15+15 = 30, 20+20 = 40.

Whereas for those that don't match, i.e. 5 in one row and 10 in the other, 5+10 = 15. How would I write Code, to achieve what I am getting at ? If you look at the .xls File document, you will see what I mean, the Link to that .xls File, is in my previous Post, i.e. number 19. Any help would be appreciated by you Sandeep, or someone else.

Regards

Eddie Winch

Possibly Related Threads…
Thread		Author	Replies	Views	Last Post
	Copy Paste excel files based on the first letters of the file name	Viento	2	423	Feb-07-2024, 12:24 PM Last Post: Viento
	Unexpected Output - Python Dataframes: Filtering based on Overlapping Dates	Xensor	5	702	Nov-15-2023, 06:54 PM Last Post: deanhystad
	Search Excel File with a list of values	huzzug	4	1,216	Nov-03-2023, 05:35 PM Last Post: huzzug
	What data types can I use for default values?	Mark17	1	522	Oct-09-2023, 02:07 PM Last Post: buran
	Copy data from Excel and paste into Discord (Midjourney)	Joe_Wright	4	2,034	Jun-06-2023, 05:49 PM Last Post: rajeshgk
	restrict user input to numerical values	MCL169	2	907	Apr-08-2023, 05:40 PM Last Post: MCL169
	Reading data from excel file –> process it >>then write to another excel output file	Jennifer_Jone	0	1,089	Mar-14-2023, 07:59 PM Last Post: Jennifer_Jone
	Grouping Data based on 30% bracket	purnima1	4	1,191	Mar-10-2023, 07:38 PM Last Post: deanhystad
	Inserting Numerical Value to the Element in Optionlist and Printing it into Entry	drbilgehanbakirhan	1	802	Jan-30-2023, 05:16 AM Last Post: deanhystad
	How to properly format rows and columns in excel data from parsed .txt blocks	jh67	7	1,874	Dec-12-2022, 08:22 PM Last Post: jh67

Filtering Excel Document Data Based On Numerical Values

User Panel Messages

Announcements