Aug-22-2020, 05:45 PM
Hi there,
I have the following Python code, which I run in a Jupyter Notebook:

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup
    import numpy as np
    import datetime as dt

    class work:
        def __init__(self,link):
            self.link=link
            self.res=requests.get(self.link)
            self.soup=BeautifulSoup(self.res.content, "lxml")
            self.table = self.soup.find_all('table')[0]
            self.l = pd.read_html(str(self.table))

        def create(self):
            self.ll=[]
            for i in range(0,6):
                l1=self.l[1][0:1][i]
                l1=list(l1)
                self.ll.extend(l1)
            l2=self.l[1][2:]
            self.date=list(l2[0])
            self.location=list(l2[1])
            self.lancaster=list(l2[2])
            self.spitfire=list(l2[3])
            self.hurricane=list(l2[4])
            self.dakota=list(l2[5])

        def month(self):
            mm=self.l[1][1][1]
            if mm=='May':
                x=5
            elif mm=='June':
                x=6
            elif mm=='July':
                x=7
            elif mm=='August':
                x=8
            elif mm=='September':
                x=9
            else:
                x=0
            return x

        def refine(self):
            self.create()
            arr=np.asarray(self.date)
            temp=arr[0]
            for i in range(0,len(arr)):
                if arr[i]=='nan':
                    arr[i]=temp
                else:
                    temp=arr[i]
            self.y=list(arr)
            return self.y

        def convert(self):
            lx=[]
            x=self.refine()
            y=self.month()
            for i in range(0,len(x)):
                lx.append((dt.datetime(2006, y, int(x[i]))).strftime('%d-%b-%Y'))
            return lx

        def post(self):
            date=self.convert()
            dff = pd.DataFrame(list(zip(date,self.location,self.lancaster,self.spitfire,self.hurricane,self.dakota)), columns =self.ll)
            return dff

    #a=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/may05.html')
    #b=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/june05.html')
    #c=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/july05.html')
    #d=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/august05.html')
    #e=work('http://web.archive.org/web/20050726230748/http://www.raf.mod.uk/bbmf/september05.html')

    a=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/may06.html')
    b=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/june06.html')
    c=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/july06.html')
    d=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/august06.html')
    e=work('http://web.archive.org/web/20060811232523/http://www.deltaweb.co.uk/bbmf/september06.html')

    #a=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/may07.html')
    #b=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/june07.html')
    #c=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/july07.html')
    #d=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/august07.html')
    #e=work('http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/september07.html')

    #a=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/may08.html')
    #b=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/june08.html')
    #c=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/july08.html')
    #d=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/august08.html')
    #e=work('http://web.archive.org/web/20081116021904/http://www.bbmf.co.uk/september08.html')

    dff1=a.post()
    dff2=b.post()
    dff3=c.post()
    dff4=d.post()
    dff5=e.post()

    X = pd.concat([dff1, dff2], axis=0)
    Y = pd.concat([X, dff3], axis=0)
    Z = pd.concat([Y, dff4], axis=0)
    F = pd.concat([Z, dff5], axis=0)
    F=pd.DataFrame(F)

    #display = F[(F['Location'].str.contains('[a-zA-Z]')) & (F['Dakota'].str.contains('D')) & (F['Spitfire'].str.contains('S', na=True)) & (F['Lancaster'] != 'L')]
    #display = F[(F['Location'].str.contains('[a-zA-Z]')) & (F['Date'].str.contains('Jul')) & (F['Dakota'].str.contains('D')) & (F['Spitfire'].str.contains('S', na=True)) & (F['Lancaster'] != 'L')]
    #Use the above Line of Code when filtering DataFrame by Month
    #Months = May Jun Jul Aug Sep
    #('Jun')) For Multiple Months use ('Jun|Jul')) For example
    #Months = -05- -06- -07- -08- -09-

    display = F[(F['Location'].str.contains('[a-zA-Z]')) &
                (F['Date'].str.contains('10$|20$')) &
                (F['Dakota'].str.contains('D')) &
                (F['Spitfire'].str.contains('S', na=True)) &
                (F['Lancaster'] != 'L')]

    #df3['DATE'].str.contains('-6$')) or ('-6$|-8$')) For more than one Day. Use minus sign in front of the number when filtering the DataFrame by Days of Month.
    #('-6$'))
    #('-6$|-8$')) For example

    pd.options.display.max_rows = 1000
    pd.options.display.max_columns = 1000

    display.drop('Lancaster', axis=1, inplace=True)
    display=display.dropna(subset=['Spitfire', 'Hurricane'], how='all')
    #display=display[['Date','Location','Dakota','Hurricane','Spitfire']]
    display=display[['Location','Date','Dakota','Hurricane','Spitfire']]
    display=display.fillna('--')
    #display.reset_index(drop=True, inplace=True)

    display.to_csv(r'C:\Users\Edward\Desktop\BBMF Schedules And Master Forum Thread Texts\BBMF-2006-Code (Dakota With Fighters).csv')

    display['Date'] = pd.to_datetime(display['Date'])
    display = display.sort_values(by='Date', key=lambda col: 100 * col.dt.day + col.dt.month)
    display['Date']= pd.to_datetime(display['Date']).dt.strftime('%d-%b-%Y')
    display.reset_index(drop=True, inplace=True)
    display

I adapted this from another of my scripts, but when I run it, only the columns and no rows are shown in the DataFrame output.
I think the issue is that, in some lines of the code, 'display' should be 'F', or vice versa.
Could someone tell me where I need to make those changes? The aim is to get a DataFrame output like the one I was aiming for in my previous str.endswith thread, which I posted in this forum section a few days ago.
Any help would be much appreciated.
Best Regards
Eddie Winch
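
One thing that may be worth checking, offered as a minimal sketch rather than a confirmed diagnosis: convert() formats each date with '%d-%b-%Y', so every Date string ends in the year. A pattern anchored with $ such as '10$|20$' is therefore tested against the end of "2006" and would match nothing, which could explain an empty result. The sample rows below are hypothetical stand-ins for the scraped table, and the start-anchored pattern is an assumed alternative, not something taken from the code above.

```python
import pandas as pd

# Hypothetical sample rows; only the Date format ('%d-%b-%Y') comes from convert().
sample = pd.DataFrame({
    "Date": ["10-May-2006", "20-Jun-2006", "06-Jul-2006"],
    "Location": ["Place A", "Place B", "Place C"],  # placeholder names
})

# Pattern from the post: '$' anchors at the END of the string, which is the year.
end_anchored = sample[sample["Date"].str.contains("10$|20$")]

# Assumed alternative: anchor the day at the START of the string instead.
start_anchored = sample[sample["Date"].str.contains("^(?:10|20)-")]

print(len(end_anchored))                # 0
print(start_anchored["Date"].tolist())  # ['10-May-2006', '20-Jun-2006']
```

If the day filter turns out to be the cause, the 'display' and 'F' names may not need swapping at all; printing len(F) before the filter would show whether the scraped data itself is non-empty.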