Thread: html_table_parser_python3 KeyError odd behavior (/thread-39778.html), Python Forum (https://python-forum.io), General Coding Help
html_table_parser_python3 KeyError odd behavior - idratherbecoding - Apr-13-2023

Hello, I am new to Python, but I have been hobby coding off and on for many years. I am working on a project to scrape sports data (NFL) and I am running into an issue while using the package html_table_parser_python3.

The table on the page I am scraping has 33 rows (according to the shape[0] of my pandas DataFrame object). I am trying to access information on the 30th row (index 29), but I am getting a KeyError thrown saying 29 is not in range. I have provided the relevant code snippet below along with the error message. I am not sure if the issue is with html_parser, pandas, or something else. I appreciate any help. Thanks.

    def get_rb_data(home_rbs, away_rbs, home, away):
        rb_html = get_table_from_url('https://www.teamrankings.com/nfl/stat/rushing-attempts-per-game').decode('utf-8')
        parsed_rb_html = HTMLTableParser()
        parsed_rb_html.feed(rb_html)
        rb_data_frame = pd.DataFrame(parsed_rb_html.tables[0])
        home_rushes = 0
        away_rushes = 0
        print(rb_data_frame.shape[0])  #This is for troubleshooting to see how many rows are in the table. Prints 33.
        for x in range(rb_data_frame.shape[0]):
            if rb_data_frame.loc[x][1] == home:
                home_rushes = float(rb_data_frame[x][2]) * 17
        for x in range(rb_data_frame.shape[0]):
            if rb_data_frame.loc[x][1] == away:
                away_rushes = float(rb_data_frame[x][2]) * 17

    if __name__ == '__main__':
        texans_rbs = ['Dameon Pierce', 'Rex Burkhead']
        broncos_rbs = ['Latavius Murray', 'Chase Edmonds']
        #Error is thrown on this line. See below for full traceback.
        texans_rb_attributes, broncos_rb_attributes = get_rb_data(texans_rbs, broncos_rbs, 'Houston', 'Denver')
RE: html_table_parser_python3 KeyError odd behavior - idratherbecoding - Apr-13-2023

I’ve been doing some searching on my own since I posted. I think the problem is that I was not indexing properly and using loc when I should be using iloc. Instead of df.iloc[row, col], which appears to be what I am trying to do, I was using df.loc[row][col], which uses labels rather than integer indices, hence the error. I will have to wait until I get home from work to verify this solves my problem, but if anyone wants to confirm for me before then, that would be appreciated. Thanks.

RE: html_table_parser_python3 KeyError odd behavior - snippsat - Apr-13-2023

I see some problems here, so here are some tips. Pandas can parse the table itself:

    import pandas as pd

    df = pd.read_html('https://www.teamrankings.com/nfl/stat/rushing-attempts-per-game')[0]

    >>> df.head()
       Rank          Team  2022  Last 3  Last 1  Home  Away  2021
    0     1  Philadelphia  33.2    40.0    32.0  33.9  32.3  31.5
    1     2       Atlanta  32.9    34.0    35.0  35.0  30.5  23.1
    2     3       Chicago  32.8    24.3    22.0  32.8  32.9  27.9
    3     4    Washington  31.6    37.0    41.0  30.7  32.8  28.1
    4     5     Cleveland  31.3    28.7    22.0  33.6  29.2  28.5

    # Always look at the types
    >>> df.dtypes
    Rank        int64
    Team       object
    2022      float64
    Last 3    float64
    Last 1    float64
    Home      float64
    Away      float64
    2021      float64
    dtype: object

So the table and types look ok.
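The type check matters more when a table arrives as raw strings, which is likely what happens when a DataFrame is built directly from a parser's list of rows, as in the original post. The following is only a sketch on made-up data: `rows` is a hypothetical stand-in that assumes the parser returns a header row followed by rows of string cells, and the names `raw` and `typed` are illustrative.

```python
import pandas as pd

# Hypothetical stand-in for parser output (assumption: header row first,
# every cell a string). Values are copied from the head of the table above.
rows = [
    ["Rank", "Team", "2022"],
    ["1", "Philadelphia", "33.2"],
    ["2", "Atlanta", "32.9"],
    ["3", "Chicago", "32.8"],
]

# Passing the rows straight to DataFrame keeps the header as data row 0,
# labels the columns 0, 1, 2 and leaves every cell as a string.
raw = pd.DataFrame(rows)

# Use the first row as column names, then convert the numeric columns.
typed = pd.DataFrame(rows[1:], columns=rows[0])
typed["Rank"] = typed["Rank"].astype("int64")
typed["2022"] = typed["2022"].astype("float64")

print(raw.shape, list(raw.columns))
print(typed.dtypes)
```

With the header promoted and the columns converted, the frame can be addressed by column name and compared numerically without calling float() on each cell.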
When you write a regular Python loop over a DataFrame like you do, it is almost guaranteed to be the wrong approach in Pandas. To give an example similar to what you are trying to do: let's say that where Last 1 has values over 30, we multiply Home by 17.

    import pandas as pd

    df = pd.read_html('https://www.teamrankings.com/nfl/stat/rushing-attempts-per-game')[0]
    mask = df["Last 1"] > 30
    df.loc[mask, "Home"] = df.loc[mask, "Home"] * 17

    >>> df.head(8)
       Rank          Team  2022  Last 3  Last 1   Home  Away  2021
    0     1  Philadelphia  33.2    40.0    32.0  576.3  32.3  31.5
    1     2       Atlanta  32.9    34.0    35.0  595.0  30.5  23.1
    2     3       Chicago  32.8    24.3    22.0   32.8  32.9  27.9
    3     4    Washington  31.6    37.0    41.0  521.9  32.8  28.1
    4     5     Cleveland  31.3    28.7    22.0   33.6  29.2  28.5
    5     6     Baltimore  31.2    30.0    35.0  532.1  31.1  30.4
    6     7        Dallas  30.9    28.0    22.0   30.0  31.8  27.4
    7     8     NY Giants  30.0    23.7    20.0   33.0  27.3  24.6

As you can see, there is no loop. The built-in operations work on the whole DataFrame, which is also a lot faster.

RE: html_table_parser_python3 KeyError odd behavior - idratherbecoding - Apr-14-2023

Thank you so much for taking the time to write such a detailed response. This really helped me clean up my code quite a bit. You are right, that is much easier and faster. Thanks!
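For later readers, here is a sketch that applies the same mask idea to the lookup from the original post (one team's per-game attempts scaled by 17). It runs offline on a small hand-built frame using values from the head output earlier in the thread. `season_rushes` is a hypothetical helper, not part of pandas or of the original code. The last part shows what most likely caused the KeyError: `frame[x]` selects a column by label, so `frame[29]` fails when no column carries that label, whereas `iloc[row, col]` addresses cells by position.

```python
import pandas as pd

# Hand-built stand-in for the scraped table (values from the head shown earlier).
df = pd.DataFrame(
    {
        "Rank": [1, 2, 3, 4, 5],
        "Team": ["Philadelphia", "Atlanta", "Chicago", "Washington", "Cleveland"],
        "2022": [33.2, 32.9, 32.8, 31.6, 31.3],
    }
)


def season_rushes(table, team, games=17):
    """Return the team's per-game attempts times `games`, or 0.0 if absent."""
    # A boolean mask selects the matching row(s); no explicit loop is needed.
    per_game = table.loc[table["Team"] == team, "2022"]
    if per_game.empty:
        return 0.0
    return float(per_game.iloc[0]) * games


print(season_rushes(df, "Atlanta"))
print(season_rushes(df, "Houston"))  # not in this small sample, so the default

# df[x] looks up a COLUMN label, not a row, so an unknown label raises KeyError.
try:
    df[29]
except KeyError as err:
    print("KeyError:", err)

# Positional access uses iloc[row, col]: row 1, column 2 here.
print(df.iloc[1, 2])
```

Returning 0.0 for a missing team mirrors the `home_rushes = 0` default in the original snippet; raising an error instead may be preferable if a missing team should never happen.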