Jan-01-2019, 11:04 PM
This script reads a text file containing an HTML table, uses pandas to parse that table into a dataframe, and then writes that dataframe as a CSV to another text file.
I've tried:
import pandas as pd
f=open('input_table.html','r')
html = f.read()
df = pd.read_html(html)
df[0].to_csv('output.csv',index=False,header=False)

The problem is that some of the cells of the table contain URL-encoded HTML, seemingly at random. So for example <h1>My Heading</h1> will be %3Ch1%3EMy%20Heading%3C%2Fh1%3E
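To make the encoding concrete, here is a minimal sketch, assuming Python 3 (where the function lives in urllib.parse), that decodes the example string above and encodes it back. The variable names are only for illustration.

```python
from urllib.parse import quote, unquote

# The encoded cell value from the example above.
encoded = "%3Ch1%3EMy%20Heading%3C%2Fh1%3E"

# unquote turns each %XX escape back into the character it stands for.
decoded = unquote(encoded)

# quote with safe="" escapes "/" as well, so the round trip matches the original.
re_encoded = quote(decoded, safe="")
```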
I've also tried:
import pandas as pd
import urllib
f=open('input_table.html','r')
html = f.read()
html=urllib.unquote(html).decode('utf8')
df = pd.read_html(html)
df[0].to_csv('output.csv',index=False,header=False)

But that produces scrambled output in the CSV.
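A possible reason for the scrambled output is that decoding the whole file before parsing turns the encoded cell contents into real tags, which the parser may then treat as part of the table markup. One alternative is to parse first and decode each cell afterwards. The following is only a sketch under assumptions: it assumes Python 3, the helper name decode_cells is hypothetical, and the small dataframe stands in for what pd.read_html would return.

```python
from urllib.parse import unquote

import pandas as pd


def decode_cells(df):
    """Return a copy of df with URL-encoded string cells percent-decoded."""
    def decode(value):
        # Only strings can carry percent-encoding; leave numbers and NaN alone.
        return unquote(value) if isinstance(value, str) else value

    # Series.map applies decode to each cell, one column at a time.
    return df.apply(lambda col: col.map(decode))


# Stand-in for the parsed table (hypothetical data, not from the real file).
table = pd.DataFrame({
    0: ["%3Ch1%3EMy%20Heading%3C%2Fh1%3E", "plain text"],
    1: [1, 2],
})
decoded_table = decode_cells(table)
```

In the script above, this would mean calling decode_cells(df[0]) after pd.read_html and before to_csv. Cells that contain a literal percent sign followed by two hex digits would also be decoded, so this may need adjusting for the real data.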