Python Forum
Pandas reading html returning NaT
#1
Hi there,

I am trying to get pandas to read a Wikipedia page that contains a table of US state abbreviations. However, the column I actually want is returned as NaT. I understand that this is pandas' variant of NaN, meaning that the data is unavailable. The thing is, it is available; it's right there in front of me. Is there a way to get pandas to read this column? It reads almost every other column in the table, be it a string or a date, and I can't work out why a 2-character string is harder to read than a regular string.

The code I am using to get this is:
import pandas as pd

# read_html returns a list of DataFrames, one per table found on the page
states = pd.read_html('https://simple.wikipedia.org/wiki/List_of_U.S._states')

print(states)
Kind regards
iFunc
#2
I don't think there is a magic command in pandas that will find an HTML table in a page that has a lot of other stuff loading.
I would extract the HTML table first and then give it to pandas.
For example, the code below does that. If you use a Jupyter notebook, you get a nicely formatted DataFrame.
import requests
from bs4 import BeautifulSoup
import pandas as pd

url = 'https://simple.wikipedia.org/wiki/List_of_U.S._states'
url_get = requests.get(url)
soup = BeautifulSoup(url_get.content, 'html.parser')
# Narrow down to the article body, then take the first table inside it
table = soup.find('div', id="mw-content-text")
table = table.find('table')
# Save just that table so pandas only sees the markup we care about
with open('table.html', 'w', encoding='utf-8') as f:
    f.write(str(table))

# header=0 uses the first row of the table as the column names
states = pd.read_html('table.html', header=0)
print(states[0][:5])
Output:
  Abbreviation  State Name      Capital     Became a State
0           AL     Alabama   Montgomery  December 14, 1819
1           AK      Alaska       Juneau    January 3, 1959
2           AZ     Arizona      Phoenix  February 14, 1912
3           AR    Arkansas  Little Rock      June 15, 1836
4           CA  California   Sacramento  September 9, 1850
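As a possible variation on the approach above, here is a minimal sketch that skips the intermediate file and lets pandas pick the table by its text, using the `match` and `header` arguments of `pd.read_html`. The `SAMPLE_HTML` string is a made-up stand-in for the downloaded page (so the snippet runs offline), not the real Wikipedia markup, and this is not claimed to explain the original NaT result. It assumes a parser that pandas can use (lxml, or bs4 with html5lib) is installed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the page content; the real page has far more markup.
SAMPLE_HTML = """
<p>Some text around the table.</p>
<table>
  <tr><th>Abbreviation</th><th>State Name</th><th>Capital</th><th>Became a State</th></tr>
  <tr><td>AL</td><td>Alabama</td><td>Montgomery</td><td>December 14, 1819</td></tr>
  <tr><td>AK</td><td>Alaska</td><td>Juneau</td><td>January 3, 1959</td></tr>
</table>
<table>
  <tr><th>Other</th></tr>
  <tr><td>ignored</td></tr>
</table>
"""

# match keeps only tables whose text matches the given string/regex;
# header=0 uses the first row as column names.
tables = pd.read_html(io.StringIO(SAMPLE_HTML), match="Abbreviation", header=0)
states = tables[0]

# Inspect how each column was typed before trusting the values.
print(states.dtypes)
print(states["Abbreviation"].tolist())
```

Checking `states.dtypes` is a quick way to see whether a column came back as text or as a datetime type, which is where NaT values would show up.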