Aug-04-2017, 06:01 AM
I have a local HTML file with multiple tables. Sometimes two tables have exactly the same content, including the headers. While reading the file with pandas, I noticed that when two tables are identical, the second one is dropped as if it were not there. If I change a single <td>Value</td> cell in the second table, pandas reads and displays the second table as well.
How can I stop pandas from doing that and make it read every table?
I am attaching the exact HTML file. It contains 4 tables, but I only get the values of 3. The two big tables have exactly the same data, and only the first one is returned.
https://drive.google.com/file/d/0B5HhBthFvDrtMWNtRDBhS1lQcGM/view?usp=sharing
import pandas as pd
tables = pd.read_html('D:\\myhtml.html', header=0)
print(tables)
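To check that the file really contains every table, including the identical ones, the tables can be collected without pandas' own parser. The following is only a minimal sketch using the standard library's html.parser. It does not explain or change what pd.read_html does. The names TableCollector, read_all_tables and sample are hypothetical, the sample HTML is made up for illustration, and the sketch assumes the tables are not nested.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect the cell text of every <table>, identical tables included.

    Assumption: tables are not nested inside each other.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def _close_cell(self):
        # Store the finished cell in the current row, if there is one.
        if self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
        self._cell = None

    def _close_row(self):
        # Store the finished row in the most recent table, if there is one.
        self._close_cell()
        if self._row is not None and self.tables:
            self.tables[-1].append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self.tables.append([])  # every <table> gets its own entry
        elif tag == "tr":
            self._close_row()
            if self.tables:
                self._row = []
        elif tag in ("td", "th"):
            self._close_cell()
            if self._row is not None:
                self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def read_all_tables(html):
    """Return a list of tables, each a list of rows, each a list of cell strings."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    return parser.tables


# Made-up sample: two tables with exactly the same headers and data.
one_table = (
    "<table><tr><th>Name</th><th>Value</th></tr>"
    "<tr><td>a</td><td>1</td></tr></table>"
)
sample = one_table + one_table

tables = read_all_tables(sample)
print(len(tables))
```

If pandas is still wanted for the rest of the work, each collected table could be turned into a DataFrame with something like pd.DataFrame(rows[1:], columns=rows[0]), assuming the first row holds the headers.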