Python Forum
Regex Expression With Code Query In Pandas
Thread Rating:
  • 0 Vote(s) - 0 Average
  • 1
  • 2
  • 3
  • 4
  • 5
Regex Expression With Code Query In Pandas
#1
Hi there,

I have the following Part of a Python Code in Pandas :-

for i,row in temp_df.iterrows():
    text = row[1]
    text=text.replace('th','')
    text=text.replace('st','')
    text=text.replace('nd','')
    text=text.replace('rd','')
    temp_df.iat[i,1] = text
Could someone tell me, how I could include a Regex Expression, considering th- followed by a number i.e. th-2 and th- followed by a space, in my code it simply replacing th with nothing, messes things up. I need what I mentioned to be considered in replacing th with nothing. These are Dates I am referring to, i.e. 25th-26th for th-2 17th- for th- followed by space.

Any help would be much appreciated.

Best Regards

Eddie Winch )) Smile
Reply
#2
You should post a working sample of DataFrame,and then tell what you want to replace.
Can not make any sense with info has now,
and in most cases when use for loop in Pandas you are doing something not ideal.
Reply
#3
Hi snippsat,

Many thanks for your reply to my Thread.

Here is the full Python Code :-

import pandas as pd
import requests
from bs4 import BeautifulSoup
import xlrd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

res1 = requests.get("https://www.air-shows.org.uk/2022/03/airshow-news-battle-of-britain-memorial-flight-display-dates-2022/")     
soup1 = BeautifulSoup(res1.content,'lxml')

table1 = soup1.find_all('div',{'class' : 'post-content entry-content'})
lis1,lis2=[],[]
for table in table1:
    lis1 = table.find_all('ul')
    lis2 = table.find_all('h4')
    
new_lis1=[]
for l in lis1:
    temp=l.find_all('li')
    temp_x=[]
    for t in temp:
        temp_x.append(t.text)
    new_lis1.append(temp_x)
    
    new_lis2=[]
for l in lis2:
    new_lis2.append(l.text)
    
all_new_list = []
for lis in new_lis1:
    temp_x=[]
    for item in lis:
        item=item.replace('\xa0'," ")
        item=item.replace('\xa0'," ")
        item=item.replace('Flypast – ',"")
        item=item.replace('Flypasts – ',"")
        item=item.replace('Lancaster – ',"")
        item=item.replace('17th- ',"17th – ")
        item=item.replace('26th- ',"26th – ")
        temp_x.append(item.split(" – "))
    all_new_list.append(temp_x)
    
temp_df = pd.DataFrame(columns=['LOCATION','DATE','MONTH','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])

j=0
date,loc,month=[],[],[]
for i in range(1, len(new_lis2)):
    for item in all_new_list[i]:
        loc.append(item[1])
        date.append(item[0])
        month.append(new_lis2[i])
        
temp_df['LOCATION']=loc
temp_df['DATE'] = date
temp_df['MONTH'] = month


for i,row in temp_df.iterrows():
    text = row[0]
    if 'Lancaster Only' or 'Duxford' in text:
        temp_df.iat[i,3] = 'L'
        temp_df.iat[i,4] = '--'
        temp_df.iat[i,5] = '--'
        temp_df.iat[i,6] = '--'
    
for i,row in temp_df.iterrows():
    text = row[0]   
    if 'Spitfire Only' in text:    
        temp_df.iat[i,3] = '--'
        temp_df.iat[i,4] = 'S'
        temp_df.iat[i,5] = '--'
        temp_df.iat[i,6] = '--'
    
for i,row in temp_df.iterrows():
    text = row[0]    
    if 'Lancaster Only' not in text and 'Spitfire Only' not in text and 'Duxford' not in text:    
        temp_df.iat[i,3] = 'L'
        temp_df.iat[i,4] = 'S'
        temp_df.iat[i,5] = 'H'
        temp_df.iat[i,6] = '--'
    

for i,row in temp_df.iterrows():
    text = row[1]
    text=text.replace('th','')
    text=text.replace('st','')
    text=text.replace('nd','')
    text=text.replace('rd','')
    temp_df.iat[i,1] = text
    

for i,row in temp_df.iterrows():
    date = row[1]
    month = row[2]
    if '-' in date:
        if month == 'May' and 'June' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'May' and 'June' in date:
            date=date.replace(" June","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),32):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'June',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        
        if month == 'June' and 'July' not in date and 'SAT ONLY' not in row[0]:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'June' and 'July' not in date and 'SAT ONLY' in row[0]:
            date_list = date.split("-")
            dic = {"LOCATION":row[0],"DATE":str(date_list[0]),"MONTH":row[2],"LANCASTER":'L',"SPITFIRE":'--','HURRICANE':'--','DAKOTA':'--'}
            temp_df=temp_df.append(dic, ignore_index = True)
            dic = {"LOCATION":row[0],"DATE":str(date_list[1]),"MONTH":row[2],"LANCASTER":'--',"SPITFIRE":'S','HURRICANE':'H','DAKOTA':'--'}
            temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'June' and 'July' in date:
            date=date.replace(" July","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),31):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'July',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)

        if month == 'July' and 'August' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'July' and 'August' in date:
            date=date.replace(" August","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),32):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'August',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
            
            
        if month == 'August' and 'September' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'August' and 'September' in date:
            date=date.replace(" September","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),32):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'September',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)

        if month == 'September' and 'October' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'September' and 'October' in date:
            date=date.replace(" October","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),31):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'October',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
            

temp_df.reset_index(drop=True, inplace=True)

for i,row in temp_df.iterrows():
    date = row[1]
    date=date.replace(' (30 Reserve)',"")
    temp_df.iat[i,1]=date
    

for i,row in temp_df.iterrows():
    month = row[2]
    day = row[1]
    if month == 'June':
        date = '2022-'+str(5)+"-"+str(day)
    elif month == 'July':
        date = '2022-'+str(6)+"-"+str(day)
        
    elif month == 'August':
        date = '2022-'+str(7)+"-"+str(day)
    elif month == 'September':
        date = '2022-'+str(8)+"-"+str(day)
    temp_df.iat[i,1] = date
temp_df = temp_df.drop('MONTH',axis=1)

#for i,row in temp_df.iterrows():
    #loc = row[0]
    #if 'All available BBMF Aircraft' in loc:
        #temp_df.iat[i,2] = 'L'
        #temp_df.iat[i,3] = 'S'
        #temp_df.iat[i,4] = 'H'
        #temp_df.iat[i,5] = 'D'
        
for i,row in temp_df.iterrows():
    loc = row[0]
    day = row[1]
    if '24' in day and 'Headcorn' in loc or '25' in day and 'Headcorn' in loc:
        temp_df.iat[i,2] = '--'
        temp_df.iat[i,3] = 'S'
        temp_df.iat[i,4] = '--'

        
for i,row in temp_df.iterrows():
    loc = row[0]
    day = row[1]
    if '26' in day and 'Headcorn' in loc:
        temp_df.iat[i,2] = 'L'
        temp_df.iat[i,3] = 'S'
        temp_df.iat[i,4] = '--'

for i,row in temp_df.iterrows():
    loc = row[0]
    loc = loc.replace("Lancaster","")
    loc = loc.replace("Spitfire","")
    loc = loc.replace("Hurricane","")
    loc = loc.replace("Dakota","")
    loc = loc.replace("S x 2","")
    loc = loc.replace("S x 4","")
    loc = loc.replace("H x 2","")
    loc = loc.replace(" (, , , '","")
    loc = loc.replace(" (, , , )","")
    loc = loc.replace("  ()","")
    loc = loc.replace(" (, , )","")
    loc = loc.replace(" ()","")
    loc = loc.replace("(, , , '","")
    loc = loc.replace(" (, , , '","")
    loc = loc.replace(" (, )","")
    loc = loc.replace(" (SAT ONLY, , )","")
    
    loc = loc.replace(" (, , , ","")
    loc = loc.replace(" ()","")
    loc = loc.replace("All available BBMF Aircraft","")
    temp_df.iat[i,0]=loc
    
new_df = temp_df  
#display = temp_df [a-zA-Z]
#temp_df.head()

#display = new_df[(new_df['LOCATION'].str.contains('[a-zA-Z]')) & (new_df['LANCASTER'].str.contains('L')) & (new_df['SPITFIRE'].str.contains('S', na=True))]

display = new_df[(new_df['LOCATION'].str.contains('[a-zA-Z]'))]    

#display = new_df[(new_df['DATE'].str.contains('Sep'))] 

#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('-6-|-8-')) & (df3['LANCASTER'].str.contains('L')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['DAKOTA'].str.contains('D', na=True))]    

#Use the above Line of Code when filtering DataFrame by Month
                                   
#Months = May Jun Jul Aug Sep                #('-6-')) For Multiple Months use ('-6-|-8-'))
#Months = -5- -6- -7- -8- -9-
                                  
#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('-6$|-8$')) & (df3['LANCASTER'].str.contains('L')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['DAKOTA'].str.contains('D', na=True))]    

#df3['DATE'].str.contains('-6$')) or ('-6$|-8$')) For more than one Day. Use minus sign in front of the number when filtering the DataFrame by Days of Month.

                          #('-6$'))                 #('-6$|-8$'))

#print(display)
   
#display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')
#display=display.sort_values(by=['DATE'])
##added two lines above to convert date format
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')    

display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')

#display.drop([303], inplace=True) 
#display.rename(index={16: '314', 17: '315'}, inplace=True)
display=display.rename_axis('MyIdx')
display=display.sort_values(by=['DATE','MyIdx'])
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
#display.drop_duplicates(subset=['LOCATION', 'DATE'], keep='last', inplace=True)

#display.drop('DAKOTA', axis=1, inplace=True)
#display.drop('Date', axis=1, inplace=True)
#display=display.dropna(subset=['SPITFIRE', 'HURRICANE'], how='all')
display=display[['LOCATION','DATE','LANCASTER','HURRICANE','SPITFIRE','DAKOTA']]
#display=display.fillna('--')
#display = display.sort_values(by=['DATE'])
display = display.sort_values(by=['DATE','LOCATION'])
display.reset_index(drop=True, inplace=True)
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
display.to_csv("BBMF DISPLAY 2022.csv")
display
The lines of Code :-

item=item.replace('17th- ',"17th – ")
item=item.replace('26th- ',"26th – ")
sorted the initial problem I mentioned.

But now have a new problem, where the September Rows are not showing in the Output DataFrame, I have tried doing inspect Element, for September, on the Web page of which the Url is at the beginning of the Code. To see if anything is different for September, and i notice l, for Bournemouth Air Festiva whereas for others in the 'strong' part it starts with a single , or nothing. I tried replacing l, with a , or nothing and the September Rows still don't show.

With the following part of the Code :-

j=0
date,loc,month=[],[],[]
for i in range(1, len(new_lis2)):
    for item in all_new_list[i]:
        loc.append(item[1])
        date.append(item[0])
        month.append(new_lis2[i])
if I # out those two item.replace lines I posted before, and run the code :-

I get an
Error:
IndexError Traceback (most recent call last) <ipython-input-40-f67479ed6fd4> in <module> 49 for i in range(1, len(new_lis2)): 50 for item in all_new_list[i]: ---> 51 loc.append(item[1]) 52 date.append(item[0]) 53 month.append(new_lis2[i]) IndexError: list index out of range
And if I use them, I get a correct DataFrame Output, with rows from June to August, but not September.

And if I change to :-
j=0
date,loc,month=[],[],[]
for i in range(len(new_lis2)):
    for item in all_new_list[i]:
        loc.append(item[1])
        date.append(item[0])
        month.append(new_lis2[i])
Which was what I used initially, and run the code, I get the same error, when I use those two lines of item.replace Code.

What makes me think the September Rows are not showing because of an anomoly in the text, is that hashing out those two lines of Code
, produces an list index out of range error. I run the full Code in Jupyter Notebook.

Any help anyone could give me, would be very much appreciated.

Best Regards

Eddie Winch Smile
Reply
#4
Still very-very fuzzy for me - what is the objective. What is desired result? Lot of things in this code should/could be improved but I have a feeling that lot of things could be just skipped as they are not necessary. To determine whether my feeling is correct or not - could you spell your objective out in natural language.
I'm not 'in'-sane. Indeed, I am so far 'out' of sane that you appear a tiny blip on the distant coast of sanity. Bucky Katt, Get Fuzzy

Da Bishop: There's a dead bishop on the landing. I don't know who keeps bringing them in here. ....but society is to blame.
Reply
#5
I get dizzy looking at that code😵
The web-site simple then you scrape it and want to add some extra information,
but the code is a mess and not something i have time to dive into now.

Just to show something quick using your data(all_new_list) and getting into a DataFrame some clean up and adding some stuff.
The NoteBook look for the for loop as mention do you see any👀
Reply
#6
Hi snippsat,

Many thanks for your excellent Notebook example, certainly a good and more straightforward way, of sorting the Scraped Website Data.
Do you have any ideas, why the September rows are not showing in my DataFrame from my Code ? As as you can see, the Data is present
in all_new_list i.e. Bournemouth Air Festival, Causeway Airshow etc. And September is in new_lis2 ? Yet the September Rows are not showing.

Regards

Eddie Winch
Reply
#7
Hi snippsat,

I finally sorted out the issue I was having, I am busy at the moment, but I will tell you how I sorted it out later.

Regards

Eddie Winch Smile
Reply
#8
Hi snippsat,

Here is the amended Code, that now includes the September Rows, in the DataFrame Output :-

import pandas as pd
import requests
from bs4 import BeautifulSoup
import xlrd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

res1 = requests.get("https://www.air-shows.org.uk/2022/03/airshow-news-battle-of-britain-memorial-flight-display-dates-2022/")     
soup1 = BeautifulSoup(res1.content,'lxml')

table1 = soup1.find_all('div',{'class' : 'post-content entry-content'})
lis1,lis2=[],[]
for table in table1:
    lis1 = table.find_all('ul')
    lis2 = table.find_all('h4')
    
new_lis1=[]
for l in lis1:
    temp=l.find_all('li')
    temp_x=[]
    for t in temp:
        temp_x.append(t.text)
    new_lis1.append(temp_x)
    
    new_lis2=[]
for l in lis2:
    new_lis2.append(l.text)
    
new_lis2.insert(0, 'May')
    
all_new_list = []
for lis in new_lis1:
    temp_x=[]
    for item in lis:
        item=item.replace('\xa0'," ")
        item=item.replace('\xa0'," ")
        item=item.replace('Flypast – ',"")
        item=item.replace('Flypasts – ',"")
        item=item.replace('Lancaster – ',"")
        item=item.replace('17th- ',"17th – ")
        item=item.replace('26th- ',"26th – ")
        temp_x.append(item.split(" – "))
    all_new_list.append(temp_x)
    
temp_df = pd.DataFrame(columns=['LOCATION','DATE','MONTH','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])

j=0
date,loc,month=[],[],[]
for i in range(1,len(new_lis2)):
    for item in all_new_list[i]:
        loc.append(item[1])
        date.append(item[0])
        month.append(new_lis2[i])
        
temp_df['LOCATION']=loc
temp_df['DATE'] = date
temp_df['MONTH'] = month


for i,row in temp_df.iterrows():
    text = row[0]
    if 'Lancaster Only' or 'Duxford' in text:
        temp_df.iat[i,3] = 'L'
        temp_df.iat[i,4] = '--'
        temp_df.iat[i,5] = '--'
        temp_df.iat[i,6] = '--'
    
for i,row in temp_df.iterrows():
    text = row[0]   
    if 'Spitfire Only' in text:    
        temp_df.iat[i,3] = '--'
        temp_df.iat[i,4] = 'S'
        temp_df.iat[i,5] = '--'
        temp_df.iat[i,6] = '--'
    
for i,row in temp_df.iterrows():
    text = row[0]    
    if 'Lancaster Only' not in text and 'Spitfire Only' not in text and 'Duxford' not in text:    
        temp_df.iat[i,3] = 'L'
        temp_df.iat[i,4] = 'S'
        temp_df.iat[i,5] = 'H'
        temp_df.iat[i,6] = '--'
    

for i,row in temp_df.iterrows():
    text = row[1]
    text=text.replace('th','')
    text=text.replace('st','')
    text=text.replace('nd','')
    text=text.replace('rd','')
    temp_df.iat[i,1] = text
    

for i,row in temp_df.iterrows():
    date = row[1]
    month = row[2]
    if '-' in date:
        if month == 'May' and 'June' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'May' and 'June' in date:
            date=date.replace(" June","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),32):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'June',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        
        if month == 'June' and 'July' not in date and 'SAT ONLY' not in row[0]:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'June' and 'July' not in date and 'SAT ONLY' in row[0]:
            date_list = date.split("-")
            dic = {"LOCATION":row[0],"DATE":str(date_list[0]),"MONTH":row[2],"LANCASTER":'L',"SPITFIRE":'--','HURRICANE':'--','DAKOTA':'--'}
            temp_df=temp_df.append(dic, ignore_index = True)
            dic = {"LOCATION":row[0],"DATE":str(date_list[1]),"MONTH":row[2],"LANCASTER":'--',"SPITFIRE":'S','HURRICANE':'H','DAKOTA':'--'}
            temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'June' and 'July' in date:
            date=date.replace(" July","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),31):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'July',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)

        if month == 'July' and 'August' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'July' and 'August' in date:
            date=date.replace(" August","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),32):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'August',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
            
            
        if month == 'August' and 'September' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'August' and 'September' in date:
            date=date.replace(" September","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),32):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'September',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)

        if month == 'September' and 'October' not in date:
            date_list = date.split("-")
            for j in range(int(date_list[0]),int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
        if month == 'September' and 'October' in date:
            date=date.replace(" October","")
            date_list = date.split("-")
            for j in range(int(date_list[0]),31):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":row[2],"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            for j in range(1,int(date_list[1])+1):
                dic = {"LOCATION":row[0],"DATE":str(j),"MONTH":'October',"LANCASTER":row[3],"SPITFIRE":row[4],'HURRICANE':row[5],'DAKOTA':row[6]}
                temp_df=temp_df.append(dic, ignore_index = True)
            temp_df=temp_df.drop(temp_df[ temp_df['DATE'] == row[1] ].index)
            

temp_df.reset_index(drop=True, inplace=True)

for i,row in temp_df.iterrows():
    date = row[1]
    date=date.replace(' (30 Reserve)',"")
    temp_df.iat[i,1]=date
    

for i,row in temp_df.iterrows():
    month = row[2]
    day = row[1]
    if month == 'May':
        date = '2022-'+str(5)+"-"+str(day)
    elif month == 'June':
        date = '2022-'+str(6)+"-"+str(day)
    elif month == 'July':
        date = '2022-'+str(7)+"-"+str(day)
        
    elif month == 'August':
        date = '2022-'+str(8)+"-"+str(day)
    elif month == 'September':
        date = '2022-'+str(9)+"-"+str(day)
    temp_df.iat[i,1] = date
temp_df = temp_df.drop('MONTH',axis=1)

#for i,row in temp_df.iterrows():
    #loc = row[0]
    #if 'All available BBMF Aircraft' in loc:
        #temp_df.iat[i,2] = 'L'
        #temp_df.iat[i,3] = 'S'
        #temp_df.iat[i,4] = 'H'
        #temp_df.iat[i,5] = 'D'
        
for i,row in temp_df.iterrows():
    loc = row[0]
    day = row[1]
    if '24' in day and 'Headcorn' in loc or '25' in day and 'Headcorn' in loc:
        temp_df.iat[i,2] = '--'
        temp_df.iat[i,3] = 'S'
        temp_df.iat[i,4] = '--'

        
for i,row in temp_df.iterrows():
    loc = row[0]
    day = row[1]
    if '26' in day and 'Headcorn' in loc:
        temp_df.iat[i,2] = 'L'
        temp_df.iat[i,3] = 'S'
        temp_df.iat[i,4] = '--'

for i,row in temp_df.iterrows():
    loc = row[0]
    loc = loc.replace("Lancaster","")
    loc = loc.replace("Spitfire","")
    loc = loc.replace("Hurricane","")
    loc = loc.replace("Dakota","")
    loc = loc.replace("S x 2","")
    loc = loc.replace("S x 4","")
    loc = loc.replace("H x 2","")
    loc = loc.replace(" (, , , '","")
    loc = loc.replace(" (, , , )","")
    loc = loc.replace("  ()","")
    loc = loc.replace(" (, , )","")
    loc = loc.replace(" ()","")
    loc = loc.replace("(, , , '","")
    loc = loc.replace(" (, , , '","")
    loc = loc.replace(" (, )","")
    loc = loc.replace(" (SAT ONLY, , )","")
    
    loc = loc.replace(" (, , , ","")
    loc = loc.replace(" ()","")
    loc = loc.replace("All available BBMF Aircraft","")
    temp_df.iat[i,0]=loc
    
new_df = temp_df  
#display = temp_df [a-zA-Z]
#temp_df.head()

#display = new_df[(new_df['LOCATION'].str.contains('[a-zA-Z]')) & (new_df['LANCASTER'].str.contains('L')) & (new_df['SPITFIRE'].str.contains('S', na=True))]

display = new_df[(new_df['LOCATION'].str.contains('[a-zA-Z]'))]    

#display = new_df[(new_df['DATE'].str.contains('Sep'))] 

#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('-6-|-8-')) & (df3['LANCASTER'].str.contains('L')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['DAKOTA'].str.contains('D', na=True))]    

#Use the above Line of Code when filtering DataFrame by Month
                                   
#Months = May Jun Jul Aug Sep                #('-6-')) For Multiple Months use ('-6-|-8-'))
#Months = -5- -6- -7- -8- -9-
                                  
#display = df3[(df3['LOCATION'].str.contains('[a-zA-Z]')) & (df3['DATE'].str.contains('-6$|-8$')) & (df3['LANCASTER'].str.contains('L')) & (df3['SPITFIRE'].str.contains('S', na=True)) & (df3['DAKOTA'].str.contains('D', na=True))]    

#df3['DATE'].str.contains('-6$')) or ('-6$|-8$')) For more than one Day. Use minus sign in front of the number when filtering the DataFrame by Days of Month.

                          #('-6$'))                 #('-6$|-8$'))

#print(display)
   
#display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')
#display=display.sort_values(by=['DATE'])
##added two lines above to convert date format
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')    

display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%m-%d-%Y')

#display.drop([303], inplace=True) 
#display.rename(index={16: '314', 17: '315'}, inplace=True)
display=display.rename_axis('MyIdx')
display=display.sort_values(by=['DATE','MyIdx'])
#display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
#display.drop_duplicates(subset=['LOCATION', 'DATE'], keep='last', inplace=True)

#display.drop('DAKOTA', axis=1, inplace=True)
#display.drop('Date', axis=1, inplace=True)
#display=display.dropna(subset=['SPITFIRE', 'HURRICANE'], how='all')
display=display[['LOCATION','DATE','LANCASTER','HURRICANE','SPITFIRE','DAKOTA']]
#display=display.fillna('--')
#display = display.sort_values(by=['DATE'])
display = display.sort_values(by=['DATE','LOCATION'])
display.reset_index(drop=True, inplace=True)
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%b-%Y')
display.to_csv("BBMF DISPLAY 2022.csv")
display
The reason that the September Rows were not showing before in the DataFrame Output, was because the way the Code is written, for the 2018 BBMF Schedule from the same Website as for the 2022 Schedule. May was being looked for in the list new_lis2, and in the 2022 Schedule only June, July, August and September were present there.

So I Inserted 'May' into the first position, of the new_lis2 List, by adding this piece of Code :-

new_lis2.insert(0, 'May')
'0' is the first position.

And changing these lines of Code :-

for i,row in temp_df.iterrows():
    month = row[2]
    day = row[1]
    if month == 'June':
        date = '2022-'+str(5)+"-"+str(day)
    elif month == 'July':
        date = '2022-'+str(6)+"-"+str(day)
        
    elif month == 'August':
        date = '2022-'+str(7)+"-"+str(day)
    elif month == 'September':
        date = '2022-'+str(8)+"-"+str(day)
    temp_df.iat[i,1] = date
temp_df = temp_df.drop('MONTH',axis=1)
To :-

for i,row in temp_df.iterrows():
    month = row[2]
    day = row[1]
    if month == 'May':
        date = '2022-'+str(5)+"-"+str(day)
    elif month == 'June':
        date = '2022-'+str(6)+"-"+str(day)
    elif month == 'July':
        date = '2022-'+str(7)+"-"+str(day)
        
    elif month == 'August':
        date = '2022-'+str(8)+"-"+str(day)
    elif month == 'September':
        date = '2022-'+str(9)+"-"+str(day)
    temp_df.iat[i,1] = date
temp_df = temp_df.drop('MONTH',axis=1)
Using :-

j=0
date,loc,month=[],[],[]
for i in range(1,len(new_lis2)):
    for item in all_new_list[i]:
        loc.append(item[1])
        date.append(item[0])
        month.append(new_lis2[i])
When the 1 isn't in front of len, the Error I mentioned earlier on occurs, but I think putting 1 in front of len, works
because the dates start at June rather than May, i.e. position two i.e. 0,1 so 1.

Best Regards

Eddie Winch Smile
Reply
#9
Thx for update.
I think you did this task more difficult than it should be.
Clean up data before and when in Pandas work on whole DataFrame with vectorized and build in solutions.
So iterrows() should only be used when needed,in most cases not at all.
People think in Python way where loop is common to us,Pandas need a different approach.

Some more info in Optimizing Pandas
Quote:Perhaps the most important rule is to avoid using loops in Pandas code.
Looping over a Series or a DataFrame processes data one item or row/column at a time.
Instead, operations should be vectorized.
This means an operation should be performed on the entire Series or DataFrame row/column.
Developers should think of all operations as matrix computations that can be parallelized
Reply


Possibly Related Threads…
Thread Author Replies Views Last Post
  Python Alteryx QS-Passing pandas dataframe column inside SQL query where condition sanky1990 0 730 Dec-04-2023, 09:48 PM
Last Post: sanky1990
  [solved] Regex expression do not want to taken :/ SpongeB0B 2 744 Nov-06-2023, 02:43 PM
Last Post: SpongeB0B
  Regex Include and Exclude patterns in Same Expression starzar 2 785 May-23-2023, 09:12 AM
Last Post: Gribouillis
  Is the following code returning a generator expression? quazirfan 8 1,622 Apr-11-2023, 11:44 PM
Last Post: quazirfan
  Regular Expression search to comment lines of code Gman2233 5 1,663 Sep-08-2022, 06:57 AM
Last Post: ndc85430
  Query regarding regex Python_dumb 1 1,133 Jul-26-2022, 03:14 PM
Last Post: deanhystad
  List Creation and Position of Continue Statement In Regular Expression Code new_coder_231013 3 1,662 Jun-15-2022, 12:00 PM
Last Post: new_coder_231013
  Need help with my code (regular expression) shailc 5 1,920 Apr-04-2022, 07:34 PM
Last Post: shailc
  Problem in Regex Expression shantanu97 2 1,706 Sep-28-2021, 03:40 AM
Last Post: shantanu97
  Partial Matching Rows In Pandas DataFrame Query eddywinch82 1 2,366 Jul-08-2021, 06:32 PM
Last Post: eddywinch82

Forum Jump:

User Panel Messages

Announcements
Announcement #1 8/1/2020
Announcement #2 8/2/2020
Announcement #3 8/6/2020