Python Forum

Full Version: Pandas df.read_html dropping duplicate tables in html : Urgent Guidance

import pandas as pd
# read_html returns a list with one DataFrame per table it parses
tables = pd.read_html(r'D:\myhtml.html', header=0)
print(tables)
I have a local HTML file with multiple tables. Sometimes two tables have exactly the same content, including the headers. When I read the file with pandas, I noticed that if two tables are identical, the second one is dropped as if it were not there. If I change a single <td>Value</td> cell in the second table, pandas reads and displays the second table as well.

How can I stop pandas from doing that, so that it reads every table?
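To check whether tables really are being lost, one option is to count the <table> tags in the file independently of pandas and compare that number with what read_html returns. The sketch below is only a diagnostic, not a confirmed fix: TableCounter, count_tables, tables_per_flavor and SAMPLE are hypothetical names made up for illustration. It relies on the flavor parameter of pd.read_html ("lxml" or "bs4") to see whether the choice of parser changes the result, and it assumes those parser libraries are installed.

```python
from html.parser import HTMLParser


class TableCounter(HTMLParser):
    """Counts every <table> start tag, including nested tables."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag == "table":
            self.count += 1


def count_tables(html_text):
    """Return the number of <table> elements in an HTML string."""
    counter = TableCounter()
    counter.feed(html_text)
    counter.close()
    return counter.count


def tables_per_flavor(path, flavors=("lxml", "bs4")):
    """Return {flavor: number of DataFrames read_html produced, or the error}."""
    import pandas as pd  # imported here so count_tables works without pandas

    results = {}
    for flavor in flavors:
        try:
            results[flavor] = len(pd.read_html(path, header=0, flavor=flavor))
        except (ImportError, ValueError) as exc:
            # ImportError: parser library missing; ValueError: no tables found
            results[flavor] = repr(exc)
    return results


# Two byte-for-byte identical tables, mirroring the situation described above
ONE_TABLE = "<table><tr><th>Name</th></tr><tr><td>Value</td></tr></table>"
SAMPLE = "<html><body>" + ONE_TABLE + ONE_TABLE + "</body></html>"

print(count_tables(SAMPLE))
```

For the real file, something like count_tables(open(r'D:\myhtml.html', encoding='utf-8').read()) next to tables_per_flavor(r'D:\myhtml.html') would show whether one parser returns fewer tables than the file contains (the encoding is a guess). The counts can also differ for harmless reasons, for example nested tables or tables with no rows, so a mismatch is a hint rather than proof.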

I am attaching the exact HTML file. As you can see, it contains 4 tables, but I only get the values of 3. The two big tables contain exactly the same data, and only the first one is returned.

https://drive.google.com/file/d/0B5HhBthFvDrtMWNtRDBhS1lQcGM/view?usp=sharing
Does anybody have a clue about the above? I am still struggling with this simple, stupid thing. Did I hit a bug in pandas? :)