Python Forum
Pandas reading html returning NaT
#2
I don't think there is a magic command in pandas that will find an HTML table in a page that has a lot of other stuff loading.
I would take the HTML table out first and then give it to pandas.
E.g.
If you use Jupyter Notebook, you get a good-looking DataFrame like this.
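import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states'
url_get = requests.get(url)
url_get.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(url_get.content, 'html.parser')

# Narrow the search to the article body, then take its first table
table = soup.find('div', id="mw-content-text")
table = table.find('table')

# Save only the table markup and let pandas parse that file
with open('table.html', 'w', encoding='utf-8') as f:
    f.write(str(table))

states = pd.read_html('table.html', header=0)  # returns a list of DataFrames
print(states[0][:5])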
Output:
  Abbreviation  State Name      Capital     Became a State
0           AL     Alabama   Montgomery  December 14, 1819
1           AK      Alaska       Juneau    January 3, 1959
2           AZ     Arizona      Phoenix  February 14, 1912
3           AR    Arkansas  Little Rock      June 15, 1836
4           CA  California   Sacramento  September 9, 1850
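A variation on the same idea that skips the temporary file, sketched under a few assumptions: the HTML below is a made-up sample (not the live Wikipedia page), `SAMPLE_HTML` and `table_to_frame` are hypothetical names, and `pd.read_html` still needs lxml (or BeautifulSoup plus html5lib) installed. Since the thread is about NaT, it also converts the "Became a State" strings with `pd.to_datetime`. With an explicit format and `errors="coerce"`, any value that doesn't match the format becomes NaT instead of raising, which is one common way NaT shows up.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical sample page: a table inside the same kind of
# "mw-content-text" div used above, plus one deliberately bad date.
SAMPLE_HTML = """
<html><body>
<div id="mw-content-text">
  <p>Some other page content.</p>
  <table>
    <thead>
      <tr><th>Abbreviation</th><th>State Name</th><th>Became a State</th></tr>
    </thead>
    <tbody>
      <tr><td>AL</td><td>Alabama</td><td>December 14, 1819</td></tr>
      <tr><td>AK</td><td>Alaska</td><td>January 3, 1959</td></tr>
      <tr><td>XX</td><td>Example</td><td>not a date</td></tr>
    </tbody>
  </table>
</div>
</body></html>
"""


def table_to_frame(html: str) -> pd.DataFrame:
    """Pull the first table out of the content div and let pandas parse it."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", id="mw-content-text")
    table = content.find("table") if content is not None else None
    if table is None:
        raise ValueError("no table found inside div#mw-content-text")
    # Wrapping the markup in StringIO avoids the temporary file; recent
    # pandas versions warn when a literal HTML string is passed to read_html.
    return pd.read_html(io.StringIO(str(table)), header=0)[0]


states = table_to_frame(SAMPLE_HTML)

# The dates arrive as plain strings, so convert them explicitly.
# An explicit format avoids guessing, and errors="coerce" turns any value
# that does not match into NaT instead of raising an exception.
states["Became a State"] = pd.to_datetime(
    states["Became a State"], format="%B %d, %Y", errors="coerce"
)
print(states)
```

If a whole date column comes back as NaT, comparing the format string against a few of the raw values from the table is usually the quickest check.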


Messages In This Thread
Pandas reading html returning NaT - by iFunKtion - Nov-09-2016, 02:42 PM
RE: Pandas reading html returning NaT - by snippsat - Nov-09-2016, 05:31 PM

