Jan-31-2020, 07:02 PM
(This post was last modified: Jan-31-2020, 07:02 PM by eddywinch82.)
I have modified the code from this thread for a BBMF 2005 display schedule, which is broken down into separate URLs, one for each month. I am trying to get a DataFrame output for the whole year.
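With one page per month, the pages have to be fetched one after another and their tables combined. Assigning every `requests.get` result to the same `res` variable keeps only the last response, so only September would ever be parsed. A minimal sketch, assuming the five archived pages named in the post share the same six-column layout: fetch each URL, keep its first `<table>` just as `soup.find_all('table')[0]` does, and stack the monthly frames with `pd.concat`. The helper names `month_urls`, `first_table` and `whole_year`, and the 30-second timeout, are hypothetical choices, not part of the original code.

```python
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

# URLs are the ones listed in the post; helper names below are hypothetical.
ARCHIVE_PREFIX = "http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/"
MONTH_PAGES = ["may05", "june05", "july05", "august05", "september05"]


def month_urls(pages=MONTH_PAGES, prefix=ARCHIVE_PREFIX):
    """Return one full URL per monthly page, in calendar order."""
    return [f"{prefix}{page}.html" for page in pages]


def first_table(url):
    """Download one page and parse its first <table> into a DataFrame."""
    res = requests.get(url, timeout=30)  # timeout value is an assumption
    res.raise_for_status()
    soup = BeautifulSoup(res.content, "lxml")
    table = soup.find_all("table")[0]
    # Wrapping the HTML in StringIO avoids the deprecation warning that
    # newer pandas versions give for literal HTML strings.
    return pd.read_html(StringIO(str(table)))[0]


def whole_year(urls):
    """Fetch every month and stack the tables, keeping page order."""
    frames = [first_table(url) for url in urls]
    return pd.concat(frames, ignore_index=True)
```

In the notebook, `df = whole_year(month_urls())` would take the place of the five `requests.get` lines and the `soup`, `table` and `read_html` lines that follow them, so the later steps see all five months rather than September alone.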
Here is the modified code:
```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/may05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/june05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/july05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/august05.html")
res = requests.get("http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/september05.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
df = df[0]

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

#make df[0] to list
list=[]
for i in df[0]:
    list.append(i)

#reverse the list to make split to sublist easier
list.reverse()

#split list to sublist using condition len(val)> 2
size = len(list)
idx_list = [idx + 1 for idx, val in
            enumerate(list) if len(val) > 2]
res = [list[i: j] for i, j in
       zip([0] + idx_list, idx_list +
           ([size] if idx_list[-1] != size else []))]

#make monthname to numbers and print
for i in res:
    for j in range(len(i)):
        if i[j].upper()=='JUNE':
            i[j]='6'
        elif i[j].upper() =='MAY':
            i[j]='5'
        elif i[j].upper() == 'APRIL':
            i[j]='4'
        elif i[j].upper() =='JANUARY':
            i[j]='1'
        elif i[j].upper() == 'FEBRUARY':
            i[j]='2'
        elif i[j].upper() =='MARCH':
            i[j]='3'
        elif i[j].upper() == 'JULY':
            i[j]='7'
        elif i[j].upper() =='AUGUST':
            i[j]='8'
        elif i[j].upper() == 'SEPTEMBER':
            i[j]='9'
        elif i[j].upper() =='OCTOBER':
            i[j]='10'
        elif i[j].upper() == 'NOVEMBER':
            i[j]='11'
        elif i[j].upper() =='DECEMBER':
            i[j]='12'

#append string and append to new list
finallist=[]
for i in res:
    for j in range(len(i)):
        if j < len(i) - 1:
            #print(f'2005-{i[-1]}-{i[j]}')
            finallist.append(f'2005-{i[-1]}-{i[j]}')
#print(finallist)
finallist.reverse()
#print("\n=== ORIGINAL DF ===\n")
#print(df)

#convert dataframe to list
listtemp1=df.values.tolist()

#replace found below values with 0000_removable
removelist=['LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA','DATE','JUNE','JANUARY','FEBRUARY','MARCH','MAY','JULY','AUGUST','SEPTEMBER','OCTOBER','NOVEMBER','DECEMBER','APRIL']
for i in listtemp1:
    for j in range(len(i)):
        for place in removelist:
            if str(i[j]).upper()==place:
                i[j]='0000_removable'
            else:
                pass

#remove sublists with the replaced values we redirected
dellist=['0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable', '0000_removable']
res = [i for i in listtemp1 if i != dellist]

#assign back to dataframe DF3
df3=pd.DataFrame()
df3=pd.DataFrame(res, columns=['Date','LOCATION','LANCASTER','SPITFIRE','HURRICANE','DAKOTA'])
#print("\n=== AFTER REMOVE month and column names from DF, assigned to new as DF3 ===\n")
#print(df3)

#now assign that sorted date list to dataframe DF3
idx = 0
df3.insert(loc=idx, column='DATE', value=finallist)

pd.options.display.max_rows = 500
df["DATE"].fillna(method='ffill', inplace = True)

display = df3[(df3['Location'].str.contains('- Display')) & (df3['Dakota'].str.contains('D')) & (df3['Spitfire'].str.contains('S', na=True)) & (df3['Lancaster'] != 'L')]
display

display['DATE']= pd.to_datetime(display['DATE'],format='%Y-%m-%d')
display['DATE']= pd.to_datetime(display['DATE']).dt.strftime('%d-%m-%Y')
##added two lines above to convert date format
display.drop('Lancaster', axis=1, inplace=True)
display.dropna(subset=['Spitfire', 'Hurricane'], how='all')

#df[(df['Location'].str.contains('- Display'))
#df[(df['Dakota'].str.contains('D'))
#(df['Dakota'].str.contains('D'))
#(df['Spitfire'] == 'SSS')
```

I am trying to get a DataFrame output for the whole of 2005 from all of the URLs in the code.
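Once the `TypeError` is fixed, the filtering step would run into the next problems: `df3` is built with upper-case column names, so `df3['Location']`, `df3['Dakota']` and the other mixed-case lookups raise `KeyError`; `df["DATE"]` refers to `df`, which has no `DATE` column; assigning to a boolean-filtered slice triggers pandas' `SettingWithCopyWarning`; and `dropna` returns a new frame, so the last line's result is discarded. A sketch of that step under those assumptions, using a hypothetical `select_displays` helper. Passing `na=False` to the `LOCATION` and `DAKOTA` tests is an added assumption (blank cells count as no match), since a mask containing `NaN` cannot be used for indexing. `Series.ffill()` replaces `fillna(method='ffill')`, which newer pandas versions deprecate, and naming the result `shown` instead of `display` avoids shadowing Jupyter's built-in `display` function.

```python
import pandas as pd


def select_displays(df3):
    """Hypothetical helper: return the Dakota display rows, dates as DD-MM-YYYY.

    Expects the upper-case columns df3 is built with in the post, plus a
    'DATE' column of year-month-day strings that may contain gaps.
    """
    df3 = df3.copy()
    # Fill gaps from the row above (the post calls fillna on df, not df3).
    df3["DATE"] = df3["DATE"].ffill()
    mask = (
        df3["LOCATION"].str.contains("- Display", na=False)  # na=False is an assumption
        & df3["DAKOTA"].str.contains("D", na=False)          # na=False is an assumption
        & df3["SPITFIRE"].str.contains("S", na=True)          # as in the post
        & (df3["LANCASTER"] != "L")
    )
    # .copy() gives an independent frame, so the assignments below do not
    # trigger SettingWithCopyWarning.
    shown = df3.loc[mask].copy()
    shown["DATE"] = pd.to_datetime(shown["DATE"], format="%Y-%m-%d").dt.strftime("%d-%m-%Y")
    shown = shown.drop(columns="LANCASTER")
    # dropna returns a new frame, so its result has to be kept.
    return shown.dropna(subset=["SPITFIRE", "HURRICANE"], how="all")
```

`select_displays(df3)` would then stand in for everything from the `df["DATE"].fillna(...)` line down to the `dropna` call.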
But I get the following traceback when I run the code in Jupyter Notebook:
```
TypeError Traceback (most recent call last)
<ipython-input-1-ae00b7540e28> in <module>
     31 size = len(list)
     32 idx_list = [idx + 1 for idx, val in
---> 33 enumerate(list) if len(val) > 2]
     34 res = [list[i: j] for i, j in
     35 zip([0] + idx_list, idx_list +

<ipython-input-1-ae00b7540e28> in <listcomp>(.0)
     31 size = len(list)
     32 idx_list = [idx + 1 for idx, val in
---> 33 enumerate(list) if len(val) > 2]
     34 res = [list[i: j] for i, j in
     35 zip([0] + idx_list, idx_list +

TypeError: object of type 'float' has no len()
```
I can't work out what is causing the error. Any ideas? Any help would be appreciated.
Best Regards
Eddie Winch
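The `TypeError` most likely comes from a blank cell in the first column: `pd.read_html` reads empty cells as `NaN`, which is a `float`, and `len()` is not defined for floats. The smallest change is to test the type first, `if isinstance(val, str) and len(val) > 2`, but the month-mapping loop calls `.upper()` on every element and would then fail on the same `NaN`. Converting with `len(str(val))` would not help either, because `str(float('nan'))` is `'nan'`, three characters long, so every blank would be mistaken for a month heading. The sketch below swaps the reverse-and-split approach for a single top-to-bottom pass that remembers the current month and uses a dictionary instead of the `if`/`elif` chain. It assumes the column holds the header text, month names and day numbers as strings, and it emits `None` for blank cells so the list keeps one entry per event row, ready for a forward fill. `MONTHS`, `is_missing` and `build_dates` are hypothetical names.

```python
import math

# Hypothetical lookup: month names as they appear in the first column.
MONTHS = {
    name: number
    for number, name in enumerate(
        ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
         "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"],
        start=1,
    )
}


def is_missing(value):
    """True for blank cells, which pandas reads as float NaN (or None)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_dates(first_column, year=2005):
    """Return one 'YYYY-M-D' string (or None) per day row of the column.

    Month rows set the current month and header text such as 'Date' is
    skipped; neither produces an entry, as in the original logic.
    """
    dates = []
    month = None
    for value in first_column:
        if is_missing(value):
            # Blank day cell: keep a placeholder so positions still line up
            # with the event rows; fill it later, e.g. with Series.ffill().
            dates.append(None)
            continue
        text = str(value).strip()
        if text.upper() in MONTHS:
            month = MONTHS[text.upper()]
        elif len(text) > 2:
            continue  # other long text, e.g. the 'Date' header cell
        else:
            dates.append(f"{year}-{month}-{text}")
    return dates
```

With `df3` built as in the post, `df3.insert(loc=0, column='DATE', value=build_dates(df[0]))` followed by `df3['DATE'] = df3['DATE'].ffill()` would take the place of the code from `#make df[0] to list` down to the `finallist` insert, assuming the blank cells line up with `df3` rows the same way `finallist` was meant to.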