Obtaining Correct Date In Pandas DataFrame

eddywinch82 · (This post was last modified: Jan-31-2020, 07:02 PM by eddywinch82.)

I have modified the code in this thread for the BBMF 2005 display schedule, which is split across separate URLs, one for each month. I am trying to get a single DataFrame output for the whole year.

Here is the modified code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/may05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/june05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/july05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/august05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/september05.html")

soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))

df = df[0]

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
 
 
#copy column 0 of df into a list
list=[]
for i in df[0]:
    list.append(i)
  
#reverse the list to make split to sublist easier
list.reverse()
  
#split list to sublist using condition len(val)> 2 
size = len(list) 
idx_list = [idx + 1 for idx, val in
            enumerate(list) if len(val) > 2] 
res = [list[i: j] for i, j in
        zip([0] + idx_list, idx_list + 
        ([size] if idx_list[-1] != size else []))] 
  
#convert month names to month numbers
for i in res:
    for j in range(len(i)):
        if i[j].upper()=='JUNE':
            i[j]='6'
        elif i[j].upper() =='MAY':
            i[j]='5'
        elif i[j].upper() == 'APRIL':
            i[j]='4'
        elif i[j].upper() =='JANUARY':
            i[j]='1'
        elif i[j].upper() == 'FEBRUARY':
            i[j]='2'
        elif i[j].upper() =='MARCH':
            i[j]='3'
        elif i[j].upper() == 'JULY':
            i[j]='7'        
        elif i[j].upper() =='AUGUST':
            i[j]='8'
        elif i[j].upper() == 'SEPTEMBER':
            i[j]='9'
        elif i[j].upper() =='OCTOBER':
            i[j]='10'
        elif i[j].upper() == 'NOVEMBER':
            i[j]='11'
        elif i[j].upper() =='DECEMBER':
            i[j]='12'       
  
  
#build 'YYYY-M-D' date strings in a new list
finallist=[]
for i in res:
    for j in range(len(i)):
        if j < len(i) - 1:
            #print(f'2005-{i[-1]}-{i[j]}')
            finallist.append(f'2005-{i[-1]}-{i[j]}')
#print(finallist)
finallist.reverse()
  
#print("\n=== ORIGINAL DF ===\n")
#print(df)
  
#convert dataframe to list
listtemp1=df.values.tolist()
  
#replace found below values with 0000_removable
removelist=['LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA','DATE','JUNE','JANUARY','FEBRUARY','MARCH','MAY','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER','APRIL']
for i in listtemp1:
    for j in range(len(i)):
        for place in removelist:
            if str(i[j]).upper()==place:
                i[j]='0000_removable'
            else:
                pass
  
                  
#drop rows in which every cell was marked '0000_removable'
dellist=['0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable']
res = [i for i in listtemp1 if i != dellist]
  
#assign back to dataframe DF3
df3=pd.DataFrame(res, columns=['Date','Location','Lancaster','Spitfire','Hurricane','Dakota'])
#print("\n=== AFTER REMOVE month and column names from DF, assigned to new as DF3 ===\n")
#print(df3)
  
  
#now assign that sorted date list to dataframe DF3
idx = 0
df3.insert(loc=idx, column='DATE', value=finallist)
pd.options.display.max_rows = 500

df3["DATE"] = df3["DATE"].ffill()

display = df3[(df3['Location'].str.contains('- Display', na=False)) & (df3['Dakota'].str.contains('D', na=False)) & (df3['Spitfire'].str.contains('S', na=True)) & (df3['Lancaster'] != 'L')].copy()
display

#parse the 'YYYY-M-D' strings and reformat them as DD-MM-YYYY
display['DATE'] = pd.to_datetime(display['DATE'], format='%Y-%m-%d').dt.strftime('%d-%m-%Y')

display.drop('Lancaster', axis=1, inplace=True)
display = display.dropna(subset=['Spitfire', 'Hurricane'], how='all')

#df[(df['Location'].str.contains('- Display'))

#df[(df['Dakota'].str.contains('D'))

#(df['Dakota'].str.contains('D'))

#(df['Spitfire'] == 'SSS')
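The long if/elif chain that turns month names into numbers can also be written as a dictionary lookup. Below is a minimal sketch, with hypothetical names MONTHS, MONTH_NUMBERS, month_to_number and date_strings, that keeps the same 'YYYY-M-D' string shape the finallist loop builds. Unlike the original chain, which calls .upper() on every cell and would fail on a NaN float, it passes non-string values through unchanged.

```python
MONTHS = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE",
          "JULY", "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]

# "JANUARY" -> "1", ..., "DECEMBER" -> "12" (strings, as in the original chain)
MONTH_NUMBERS = {name: str(number) for number, name in enumerate(MONTHS, start=1)}


def month_to_number(value):
    """Return the month number for a month name; leave anything else unchanged."""
    if isinstance(value, str):
        return MONTH_NUMBERS.get(value.strip().upper(), value)
    return value  # e.g. NaN floats from pd.read_html pass through untouched


def date_strings(chunk, year=2005):
    """Turn one chunk like ['6', '5', 'JUNE'] into ['2005-6-6', '2005-6-5'].

    The last element is the month name and the rest are day numbers,
    matching the f'2005-{i[-1]}-{i[j]}' loop in the code above.
    """
    month = month_to_number(chunk[-1])
    return [f"{year}-{month}-{day}" for day in chunk[:-1]]
```

Adding or renaming a month then means editing one list rather than another elif branch.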

I am trying to get a DataFrame output for the whole of 2005 from all of the URLs in the code.
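Each requests.get call in the code assigns to the same res variable, so only the last page (september05.html) is ever parsed. Below is a minimal sketch of fetching every monthly page and stacking the first table from each. It assumes, as the code above does, that the schedule is the first <table> on every page, and that the monthly tables share the same integer column labels so pd.concat lines them up. The names month_urls, first_table, fetch_html and load_schedule are hypothetical helpers, not part of any library.

```python
from io import StringIO

import pandas as pd

BASE_URL = "http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/"
MONTH_PAGES = ["may05.html", "june05.html", "july05.html",
               "august05.html", "september05.html"]


def month_urls(base=BASE_URL, pages=MONTH_PAGES):
    """Build the full URL for each monthly schedule page."""
    return [base + page for page in pages]


def first_table(html):
    """Parse the first <table> on a page into a DataFrame, as the original code does."""
    from bs4 import BeautifulSoup  # imported here so month_urls works without bs4
    soup = BeautifulSoup(html, "lxml")
    table = soup.find_all("table")[0]
    # StringIO avoids the warning newer pandas versions give for literal HTML strings
    return pd.read_html(StringIO(str(table)))[0]


def fetch_html(url):
    """Download one page and fail loudly on HTTP errors."""
    import requests  # imported here so month_urls works without requests
    res = requests.get(url, timeout=30)
    res.raise_for_status()
    return res.content


def load_schedule(urls, get_html=fetch_html):
    """Fetch every page and stack the tables into one DataFrame, one month after another."""
    frames = [first_table(get_html(url)) for url in urls]
    return pd.concat(frames, ignore_index=True)
```

With this, df = load_schedule(month_urls()) would stand in for the five requests.get lines and everything up to and including df = df[0].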

When I run the code in Jupyter Notebook, I get the following traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-ae00b7540e28> in <module>
     31 size = len(list)
     32 idx_list = [idx + 1 for idx, val in
---> 33             enumerate(list) if len(val) > 2] 
     34 res = [list[i: j] for i, j in
     35         zip([0] + idx_list, idx_list + 

<ipython-input-1-ae00b7540e28> in <listcomp>(.0)
     31 size = len(list)
     32 idx_list = [idx + 1 for idx, val in
---> 33             enumerate(list) if len(val) > 2] 
     34 res = [list[i: j] for i, j in
     35         zip([0] + idx_list, idx_list + 

TypeError: object of type 'float' has no len()
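The traceback points at len(val): pd.read_html fills empty table cells with NaN, and NaN is a float, so len() fails on the first blank cell in column 0. Below is a minimal sketch of the same split with a guard that only measures strings, assuming blank cells should simply not count as month headers. The function name split_on_month_headers is hypothetical.

```python
def split_on_month_headers(values):
    """Split a reversed first column into chunks that each end with a month name.

    Same idx_list / zip idea as the question, but len() is only called on
    strings, so NaN cells (which are floats) no longer raise TypeError.
    """
    # +1 so each slice ends just after the month-name cell
    header_ends = [idx + 1 for idx, val in enumerate(values)
                   if isinstance(val, str) and len(val) > 2]
    size = len(values)
    starts = [0] + header_ends
    # Add a final boundary only if the list does not already end on a header
    ends = header_ends + ([size] if not header_ends or header_ends[-1] != size else [])
    return [values[i:j] for i, j in zip(starts, ends)]
```

The NaN stays in its chunk rather than being dropped, so each value keeps its position; it would then come out as an entry like '2005-5-nan' in finallist, and whether such rows should be forward-filled or removed depends on what the blank cells mean on the original pages.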

I can't work out what is causing the error. Any ideas?

Any help would be appreciated.

Best Regards

Eddie Winch
