 Pandas reading html returning NaT
Hi there,

I am trying to get pandas to read a Wikipedia page that contains a table of US state abbreviations. However, the column in the table that I actually want comes back as NaT. I understand that NaT ("Not a Time") is pandas' datetime counterpart of NaN, meaning the value is missing. The thing is, the data is there; I can see it on the page. Is there a way to get pandas to read this column? It reads almost every other column in the table, whether it holds strings or dates, so I can't work out why a two-character string is harder to read than any other string.

The code I am using to get this is:
import pandas as pd
states = pd.read_html('')

Kind regards
I don't think there is a magic command in pandas that will pick out the right HTML table from a page that loads a lot of other content.
I would extract the HTML table first and then give it to pandas.
If you use a Jupyter notebook, you get a nicely formatted DataFrame like this:
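import requests
from bs4 import BeautifulSoup
import pandas as pd

url = ''  # URL of the Wikipedia page (left out of the original post)
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
# Narrow the search to the article body, then take the first table inside it
table = soup.find('div', id="mw-content-text")
table = table.find('table')
# Save only that table, so pandas sees nothing but the table markup
with open('table.html', 'w', encoding='utf-8') as f:
    f.write(str(table))

# read_html returns a list of DataFrames; header=0 uses the first row as column names
states = pd.read_html('table.html', header=0)
states[0].head()  # Jupyter renders this as a table; in a plain script wrap it in print()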
  Abbreviation  State Name      Capital     Became a State
0           AL     Alabama   Montgomery  December 14, 1819
1           AK      Alaska       Juneau    January 3, 1959
2           AZ     Arizona      Phoenix  February 14, 1912
3           AR    Arkansas  Little Rock      June 15, 1836
4           CA  California   Sacramento  September 9, 1850
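Writing table.html to disk is not strictly necessary: pd.read_html also accepts a file-like object, so the extracted table can be handed over in memory through io.StringIO. Below is a minimal sketch of the same "extract the table, then give it to pandas" idea. It assumes a small hard-coded page (SAMPLE_PAGE) in place of the downloaded Wikipedia page, and that pandas has an HTML parser backend such as lxml (or bs4 with html5lib) installed. SAMPLE_PAGE and extract_first_table are hypothetical names used only for illustration.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page: unrelated markup
# plus the table we actually want inside the article body.
SAMPLE_PAGE = """
<html><body>
<div id="mw-content-text">
  <p>Some introductory text.</p>
  <table>
    <tr><th>Abbreviation</th><th>State Name</th><th>Capital</th></tr>
    <tr><td>AL</td><td>Alabama</td><td>Montgomery</td></tr>
    <tr><td>AK</td><td>Alaska</td><td>Juneau</td></tr>
  </table>
</div>
</body></html>
"""


def extract_first_table(page_html):
    """Return the first <table> inside div#mw-content-text as a DataFrame."""
    soup = BeautifulSoup(page_html, 'html.parser')
    content = soup.find('div', id='mw-content-text')
    table = content.find('table')
    # read_html returns a list of DataFrames, one per parsed <table>;
    # StringIO passes the markup in memory instead of via a temporary file.
    frames = pd.read_html(StringIO(str(table)), header=0)
    return frames[0]


states = extract_first_table(SAMPLE_PAGE)
print(states)
```

For the real page you would pass url_get.content instead of SAMPLE_PAGE. Wrapping the markup in StringIO rather than passing the raw string also avoids the deprecation warning that newer pandas versions emit for literal HTML strings.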
