Python Forum
Pandas reading html returning NaT


Pandas reading html returning NaT - iFunKtion - Nov-09-2016

Hi there,

I am trying to get pandas to read a Wikipedia page that contains a table of US state abbreviations, but the column I actually want is returned as NaT. I understand that NaT is the pandas variant of NaN, meaning that the data is unavailable. The thing is, the data is available; it's right there in front of me. Is there a way to get pandas to read this column? It reads almost every other column in the table, whether string or date, and I can't work out why a 2-character string is harder to read than a regular string.

The code I am using is:
import pandas as pd

states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')

print(states)

Kind regards
iFunc
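
For readers unfamiliar with NaT: it is the missing-value marker pandas uses in datetime columns ("Not a Time"). The sketch below shows one way it can appear, namely when text that is not a date gets coerced to a datetime. It is only an illustration; the thread does not confirm that this is what happened inside the read_html call above, and the sample values and the date format string are assumptions chosen to match the table.

```python
import pandas as pd

# Sample values (assumed for illustration): one abbreviation, one real date.
values = pd.Series(["AL", "December 14, 1819"])

# Coercing with an explicit date format: anything that does not match
# the format becomes NaT instead of raising an error.
parsed = pd.to_datetime(values, format="%B %d, %Y", errors="coerce")

print(parsed)
print(parsed.isna().tolist())
```

The abbreviation comes back as NaT while the real date parses, so a whole column of 2-character codes pushed through date parsing would look "unavailable" even though the text is present in the page.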


RE: Pandas reading html returning NaT - snippsat - Nov-09-2016

I don't think pandas has a magic command that reliably finds one HTML table in a page that loads a lot of other content.
I would extract the HTML table first and then give it to pandas.
E.g.
If you use a Jupyter notebook, you get a good-looking DataFrame from code like this.
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
# Narrow the page down to the article body, then take its first table
table = soup.find('div', id="mw-content-text")
table = table.find('table')
# Save only the table markup so pandas has nothing else to parse
with open('table.html', 'w', encoding='utf-8') as f:
    f.write(str(table))

# header=0 makes the first table row the column names
states = pd.read_html('table.html', header=0)
print(states[0][:5])
Output:
  Abbreviation  State Name      Capital     Became a State
0           AL     Alabama   Montgomery  December 14, 1819
1           AK      Alaska       Juneau    January 3, 1959
2           AZ     Arizona      Phoenix  February 14, 1912
3           AR    Arkansas  Little Rock      June 15, 1836
4           CA  California   Sacramento  September 9, 1850
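
A possible variation on the approach above, as a rough sketch rather than a drop-in replacement: once BeautifulSoup has isolated the table, the cells can be read as plain text and handed straight to a DataFrame, which skips the temporary file and keeps every value as a string so nothing is type-converted. The helper name table_to_frame and the inline SAMPLE_HTML (a stand-in for the downloaded page, using two rows from the output above) are assumptions for illustration. It also assumes a simple table where every row has the same number of cells and no rowspan/colspan.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for the HTML returned by requests.get(url).content
SAMPLE_HTML = """
<div id="mw-content-text">
  <table>
    <tr><th>Abbreviation</th><th>State Name</th><th>Capital</th><th>Became a State</th></tr>
    <tr><td>AL</td><td>Alabama</td><td>Montgomery</td><td>December 14, 1819</td></tr>
    <tr><td>AK</td><td>Alaska</td><td>Juneau</td><td>January 3, 1959</td></tr>
  </table>
</div>
"""


def table_to_frame(html):
    """Return the first table inside #mw-content-text as a DataFrame of strings."""
    soup = BeautifulSoup(html, "html.parser")
    content = soup.find("div", id="mw-content-text")
    table = content.find("table") if content else None
    if table is None:
        raise ValueError("no table found inside #mw-content-text")
    # One list of cell texts per row; header cells (th) and data cells (td) alike
    rows = [
        [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        for tr in table.find_all("tr")
    ]
    header, *body = rows  # first row supplies the column names
    return pd.DataFrame(body, columns=header)


states = table_to_frame(SAMPLE_HTML)
print(states)
```

Because the cells stay as text, the date column would still need an explicit conversion afterwards if real datetimes are wanted.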