I don't think pandas has a magic command that will pick out one HTML table from a page that loads a lot of other content.
I would extract the HTML table first and then give it to pandas.
E.g.
If you use a Jupyter notebook, you get a good-looking DataFrame like this.
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
table = soup.find('div', id="mw-content-text")
table = table.find('table')
with open('table.html', 'w', encoding='utf-8') as f:
    f.write(str(table))

states = pd.read_html('table.html', header=0)
print(states[0][:5])
Output:
   Abbreviation  State Name      Capital     Became a State
0            AL     Alabama   Montgomery  December 14, 1819
1            AK      Alaska       Juneau    January 3, 1959
2            AZ     Arizona      Phoenix  February 14, 1912
3            AR    Arkansas  Little Rock      June 15, 1836
4            CA  California   Sacramento  September 9, 1850
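
If you'd rather not write a temporary file, I think you can also hand the extracted table to pandas in memory by wrapping the markup in StringIO. This is only a rough sketch: SAMPLE_HTML and the helper name first_table_to_frame are made up for illustration, and the small inline page stands in for the real download so it runs without a network connection.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page: one table among other markup.
SAMPLE_HTML = """
<html><body>
<div id="sidebar"><p>Unrelated content</p></div>
<div id="mw-content-text">
  <p>Intro text</p>
  <table>
    <tr><th>Abbreviation</th><th>State Name</th><th>Capital</th></tr>
    <tr><td>AL</td><td>Alabama</td><td>Montgomery</td></tr>
    <tr><td>AK</td><td>Alaska</td><td>Juneau</td></tr>
  </table>
</div>
</body></html>
"""


def first_table_to_frame(html, container_id):
    """Return the first table inside the div with the given id as a DataFrame.

    Illustrative helper (not part of pandas or Beautiful Soup).
    """
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        raise ValueError(f"no div with id={container_id!r}")
    table = container.find("table")
    if table is None:
        raise ValueError("no table inside the container")
    # read_html returns a list of DataFrames; StringIO lets it read the
    # markup directly instead of going through a file on disk.
    return pd.read_html(StringIO(str(table)), header=0)[0]


states = first_table_to_frame(SAMPLE_HTML, "mw-content-text")
print(states.head())
```

For the real page you would pass url_get.content (or url_get.text) in place of SAMPLE_HTML. The explicit None checks are there because soup.find returns None when nothing matches, which otherwise shows up as a confusing AttributeError.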