You have to be careful when downloading HTML to disk and then reading it back again, so the encoding doesn't get messed up.
You should always use Requests, since it gives you the encoding back (response.encoding) and you can save the file with that same encoding.
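To illustrate what "messed up" means here, a small sketch using only the standard library. The sample text and file name are made up for the demo and not taken from any real page; the point is only that the encoding used for reading has to match the one used for writing.

```python
import tempfile
from pathlib import Path

# Made-up sample text with non-ASCII characters (an assumption for the demo)
html = "<td>Café Müller</td>"

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.html"  # hypothetical file name

    # Write the text as ISO-8859-1 ...
    path.write_text(html, encoding="ISO-8859-1")

    # ... reading with the same encoding gives the text back unchanged.
    same = path.read_text(encoding="ISO-8859-1")

    # Reading the same bytes as UTF-8 does not work for this text.
    try:
        other = path.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        other = f"decode error: {exc.reason}"

# The opposite mismatch (UTF-8 bytes decoded as ISO-8859-1) raises no error,
# it silently garbles the non-ASCII characters instead.
garbled = html.encode("utf-8").decode("ISO-8859-1")

print(same == html)
print(other)
print(garbled)
```

The silent case is the nasty one, because nothing fails and the broken characters only show up later in the data.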
Example:
>>> import requests
>>>
>>> response = requests.get('https://www.contextures.com/xlSampleData01.html')
>>> response.status_code
200
>>> response.encoding
'ISO-8859-1'

To disk:
import requests

response = requests.get('https://www.contextures.com/xlSampleData01.html')
html = response.text
# Write with the same encoding that response.encoding reported
with open('html_raw.html', 'w', encoding='ISO-8859-1') as f_out:
    f_out.write(html)

Read the saved data with pandas:
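A minimal sketch of that read-back step, assuming the file was written with ISO-8859-1 as shown and that an HTML parser pandas can use (for example lxml) is installed. The helper name read_saved_tables is hypothetical; pd.read_html itself accepts a file path and an encoding argument.

```python
import pandas as pd


def read_saved_tables(path, encoding):
    """Parse a saved HTML file and return its tables as a list of DataFrames.

    `encoding` should be the encoding the file was written with,
    otherwise non-ASCII characters may come back garbled.
    """
    # header=0 uses the first row of each table as the column names
    return pd.read_html(path, encoding=encoding, header=0)
```

It could then be called as read_saved_tables('html_raw.html', 'ISO-8859-1'); the result is a list, so the table of interest still has to be picked out by index.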
I could of course also read the URL directly, without saving to disk.

import pandas as pd

df = pd.read_html('http://www.contextures.com/xlSampleData01.html', header=0)  # returns a list of DataFrames, one per table