Apr-13-2023, 04:20 AM
Hello,
I am new to Python, but I have been hobby coding off and on for many years. I am working on a project to scrape sports data (NFL) and I am running into an issue while using the package html_table_parser_python3. The table on the page I am scraping has 33 rows (according to the shape[0] of my pandas DataFrame object). I am trying to access information on the 30th row (index 29), but I am getting a KeyError for 29, caused by a ValueError saying 29 is not in range. I have provided the relevant code snippet below along with the error message. I am not sure if the issue is with html_table_parser_python3, pandas, or something else. I appreciate any help. Thanks.
def get_rb_data(home_rbs, away_rbs, home, away):
    rb_html = get_table_from_url('https://www.teamrankings.com/nfl/stat/rushing-attempts-per-game').decode('utf-8')
    parsed_rb_html = HTMLTableParser()
    parsed_rb_html.feed(rb_html)
    rb_data_frame = pd.DataFrame(parsed_rb_html.tables[0])
    home_rushes = 0
    away_rushes = 0
    print(rb_data_frame.shape[0])  #This is for troubleshooting to see how many rows are in the table. Prints 33.
    for x in range(rb_data_frame.shape[0]):
        if rb_data_frame.loc[x][1] == home:
            home_rushes = float(rb_data_frame[x][2]) * 17
    for x in range(rb_data_frame.shape[0]):
        if rb_data_frame.loc[x][1] == away:
            away_rushes = float(rb_data_frame[x][2]) * 17

if __name__ == '__main__':
    texans_rbs = ['Dameon Pierce', 'Rex Burkhead']
    broncos_rbs = ['Latavius Murray', 'Chase Edmonds']
    #Error is thrown on this line. See below for full traceback.
    texans_rb_attributes, broncos_rb_attributes = get_rb_data(texans_rbs, broncos_rbs, 'Houston', 'Denver')
Error:
Traceback (most recent call last):
  File "/Users/aaronlott/PycharmProjects/Scraping/venv/lib/python3.9/site-packages/pandas/core/indexes/range.py", line 345, in get_loc
    return self._range.index(new_key)
ValueError: 29 is not in range

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/aaronlott/PycharmProjects/Scraping/main.py", line 326, in <module>
    texans_rb_butes, broncos_rb_butes = get_rb_data(texans_rbs, broncos_rbs, 'Houston', 'Denver')
  File "/Users/aaronlott/PycharmProjects/Scraping/main.py", line 94, in get_rb_data
    home_rushes = float(rb_data_frame[x][2]) * 17
  File "/Users/aaronlott/PycharmProjects/Scraping/venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3760, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/aaronlott/PycharmProjects/Scraping/venv/lib/python3.9/site-packages/pandas/core/indexes/range.py", line 347, in get_loc
    raise KeyError(key) from err
KeyError: 29
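The traceback passes through `self.columns.get_loc(key)`, which suggests that `rb_data_frame[x]` is being treated as a column lookup, while the `if` test on the line above it uses `rb_data_frame.loc[x]`, a row lookup. A minimal sketch that reproduces the same kind of error without any scraping is below. The `rows` list is a made-up stand-in for `parsed_rb_html.tables[0]` (the real table's contents and column count are not shown in this post), and the last line shows one possible way to read the cell by row and column label.

```python
import pandas as pd

# Hypothetical stand-in for parsed_rb_html.tables[0]: a list of rows,
# each row a list of cell strings (rank, team name, rushes per game).
rows = [[str(i + 1), f"Team {i}", "25.0"] for i in range(33)]
df = pd.DataFrame(rows)

print(df.shape)  # 33 rows, 3 columns in this made-up table

# df.loc[x] selects a ROW by label, so 29 is valid (row labels are 0..32).
team_via_loc = df.loc[29][1]

# df[x] selects a COLUMN by label, so 29 fails (column labels are 0..2).
try:
    df[29]
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Reading the cell with an explicit row label and column label.
value = float(df.loc[29, 2]) * 17
```

In the original loop, the failing line only runs when the `if` test matches, which would explain why the error first shows up at index 29 rather than at index 0.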