Feb-09-2020, 05:24 AM
Hi All,
I am trying to adapt a program to scrape data from a Cisco phone web page. I found some code at https://srome.github.io/Parsing-HTML-Tab...nd-pandas/
that I felt would get me pretty close to what I needed.
However, when I run the code below, I get the errors shown after it. I'm thinking that perhaps I'm running a different version of Python or of one of the imported libraries, and the syntax has changed. I'm new to Python. Is this a common problem, or can someone spot some other error? I'm using the same URL as the demo at the link above, so I would expect the same results.
I want to get the demo running before I start modifying it much, but I haven't been able to reach that point.
Thanks for your help.
#from html.parser import HTMLParser
#from html.entities import name2codepoint
import pandas as pd
from bs4 import BeautifulSoup
import requests

url = "https://www.fantasypros.com/nfl/reports/leaders/qb.php?year=2015"
# response = requests.get(url2)
# response.text[:100] # Access the HTML with the text property
# print(response.text[:100])

class HTMLTableParser:

    def parse_url(self, url):
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'lxml')
        tables = soup.findAll("table")
        for table in tables:
            if table.findParent("table") is None:
                test = str(table)
        return [(table['id'],self.parse_html_table(table))\
                for table in tables]
        # for table in soup.find_All('table')]

    def parse_html_table(self, table):
        n_columns = 0
        n_rows=0
        column_names = []

        # Find number of rows and columns
        # we also find the column titles if we can
        for row in table.find_all('tr'):

            # Determine the number of rows in the table
            td_tags = row.find_all('td')
            if len(td_tags) > 0:
                n_rows+=1
                if n_columns == 0:
                    # Set the number of columns for our table
                    n_columns = len(td_tags)

            # Handle column names if we find them
            th_tags = row.find_all('th')
            if len(th_tags) > 0 and len(column_names) == 0:
                for th in th_tags:
                    column_names.append(th.get_text())

        # Safeguard on Column Titles
        if len(column_names) > 0 and len(column_names) != n_columns:
            raise Exception("Column titles do not match the number of columns")

        columns = column_names if len(column_names) > 0 else range(0,n_columns)
        df = pd.DataFrame(columns = columns,
                          index= range(0,n_rows))
        row_marker = 0
        for row in table.find_all('tr'):
            column_marker = 0
            columns = row.find_all('td')
            for column in columns:
                df.iat[row_marker,column_marker] = column.get_text()
                column_marker += 1
            if len(columns) > 0:
                row_marker += 1

        # Convert to float if possible
        for col in df:
            try:
                df[col] = df[col].astype(float)
            except ValueError:
                pass

        return df

hp = HTMLTableParser()
#table = hp.parse_url(url)[0][1] # Grabbing the table from the tuple
table = hp.parse_url(url)[0][1]
#table = hp.parse_html_table(htmstring)
table.head()

results in errors:
Error:
Traceback (most recent call last):
  File "C:\Users\t01136.POS\eclipsePython-workspace\HTMParse\HTMParse.py", line 80, in <module>
    table = hp.parse_url(url)[0][1]
  File "C:\Users\t01136.POS\eclipsePython-workspace\HTMParse\HTMParse.py", line 25, in parse_url
    for table in tables]
  File "C:\Users\t01136.POS\eclipsePython-workspace\HTMParse\HTMParse.py", line 25, in <listcomp>
    for table in tables]
  File "C:\Users\t01136.POS\AppData\Local\Programs\Python\Python36-32\lib\site-packages\bs4\element.py", line 1321, in __getitem__
    return self.attrs[key]
KeyError: 'id'
PS: I'm running in eclipsePython, with Python 3.6.7.
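
The last frame of the traceback is inside BeautifulSoup's `__getitem__`, which suggests that `table['id']` was evaluated on a `<table>` tag that has no `id` attribute, rather than a Python or library version problem. Below is a minimal sketch of that behaviour. It uses made-up inline HTML instead of the real page (an assumption, since the page's markup isn't shown here), the built-in `html.parser` instead of `lxml`, and a hypothetical fallback name of the form `table_N` for tables without an id:

```python
from bs4 import BeautifulSoup

# Made-up HTML for illustration only: one table has an id, the other does not.
html = """
<table id="stats"><tr><td>1</td></tr></table>
<table><tr><td>2</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
tables = soup.find_all("table")

# tag['id'] raises KeyError as soon as a tag lacks that attribute.
try:
    ids_strict = [t["id"] for t in tables]
except KeyError as exc:
    ids_strict = None
    missing = exc.args[0]  # name of the missing attribute

# tag.get('id', default) returns the default instead of raising.
# "table_N" is a hypothetical placeholder name, not anything from the page.
ids_safe = [t.get("id", "table_{}".format(n)) for n, t in enumerate(tables)]
print(ids_safe)
```

If this is indeed the cause, replacing `table['id']` with `table.get('id')` in `parse_url` would probably let the list comprehension finish, at the cost of some entries being labelled `None`.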