(Jan-13-2022, 04:56 AM) BrandonKastning wrote: Thank you again for this forum! How do I determine the tables[#]? Is it a guessing game, or is there an attribute or property within the browser code that could aid me in finding the correct tables[#]?

A web site can have many tables, so you have to look at the page (count the tables) or try tables[0], tables[1], tables[6], ... and see which one gives the result you want.
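Counting the tables by hand can be automated. Below is a rough stdlib-only sketch (the class and function names are made up, and this is not how pandas works internally) that numbers the <table> elements in the page source in document order and returns the indices whose text contains a string or regex. The numbering should usually line up with the list read_html returns, but pandas may skip tables it cannot turn into data (for example empty ones), so treat the index as a starting point to check.

```python
import re
from html.parser import HTMLParser


class TableTextCollector(HTMLParser):
    """Number <table> elements in document order and collect their text.

    Text inside a nested table is also credited to every enclosing table,
    because it is a descendant of the outer table as well.
    """

    def __init__(self):
        super().__init__()
        self.texts = []  # texts[i] holds the text nodes found inside table i
        self._open = []  # stack of indices of tables that are currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.texts.append([])
            self._open.append(len(self.texts) - 1)

    def handle_endtag(self, tag):
        if tag == "table" and self._open:
            self._open.pop()

    def handle_data(self, data):
        # Text outside any table is ignored; text inside goes to all open tables
        for i in self._open:
            self.texts[i].append(data)


def matching_table_indices(html, match):
    """Return indices of tables with a text node matching `match` (str or regex)."""
    pattern = re.compile(match)
    collector = TableTextCollector()
    collector.feed(html)
    collector.close()
    return [i for i, nodes in enumerate(collector.texts)
            if any(pattern.search(text) for text in nodes)]


if __name__ == "__main__":
    # Hypothetical page with two tables and some text in between
    page = (
        "<table><tr><td>Sather</td></tr></table>"
        "<p>Some text</p>"
        "<table><tr><td>Python</td><td>Guido van Rossum</td></tr></table>"
    )
    print(matching_table_indices(page, "Guido van Rossum"))  # [1]
    print(matching_table_indices(page, r"Sat\w+"))  # [0]
```

To try it on a real page, fetch the HTML first (for example with urllib.request.urlopen(url).read().decode()) and pass that string in.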
There is a match parameter in pandas.read_html that takes a string or regex and returns only the tables containing matching text. For example, on the Wikipedia page Timeline of programming languages, let's say we want the table that lists Python: we can match on the name Guido van Rossum.
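import pandas as pd

# read_html returns a list of DataFrames; with match, only the tables
# containing text that matches 'Guido van Rossum' are kept
df = pd.read_html('https://en.wikipedia.org/wiki/Timeline_of_programming_languages', match='Guido van Rossum')
print(df[0].head(13))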
Output:
    Year  Name           Chief developer, company                           Predecessor(s)
 0  1990  Sather         Steve Omohundro                                    Eiffel
 1  1990  AMOS BASIC     François Lionet and Constantin Sotiropoulos        STOS BASIC
 2  1990  AMPL           Robert Fourer, David Gay and Brian Kernighan a...  NaN
 3  1990  Object Oberon  H Mössenböck, J Templ, R Griesemer                 Oberon
 4  1990  J              Kenneth E. Iverson, Roger Hui at Iverson Software  APL, FP
 5  1990  Haskell        NaN                                                Miranda
 6  1990  EuLisp         NaN                                                Common Lisp, Scheme
 7  1990  Z Shell (zsh)  Paul Falstad at Princeton University               ksh
 8  1991  GNU E          David J. DeWitt, Michael J. Carey                  C++
 9  1991  Oberon-2       Hanspeter Mössenböck, Wirth                        Object Oberon
10  1991  Oz             Gert Smolka and his students                       Prolog
11  1991  Q              Albert Gräf                                        NaN
12  1991  Python         Guido van Rossum                                   ABC, C
So with match, the wanted table will be df[0] (it is the first, and here the only, table containing that text). Without match, it would be table 9: df[9].head(13)
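If you would rather not guess the unfiltered index (9 here), a small helper can search every DataFrame that read_html returned. This is a hypothetical sketch, not part of pandas: it does a plain substring search (no regex) over cell values and column labels and returns every index that contains the text.

```python
import pandas as pd


def find_tables(tables, text):
    """Return the indices of DataFrames whose cells or column labels contain `text`.

    `tables` is the list returned by pd.read_html(url) without match, so the
    indices found here are the ones to use as tables[i].
    Plain substring search; no regex.
    """
    hits = []
    for i, df in enumerate(tables):
        # str() each value so numbers and NaN can be searched as text too
        in_cells = any(text in str(value) for value in df.to_numpy().ravel())
        # Multi-row headers give tuple labels; str() covers those as well
        in_labels = any(text in str(label) for label in df.columns)
        if in_cells or in_labels:
            hits.append(i)
    return hits


if __name__ == "__main__":
    # Hypothetical stand-ins for the list pd.read_html(url) returns
    tables = [
        pd.DataFrame({"Year": [1990], "Name": ["Sather"]}),
        pd.DataFrame({"Year": [1991], "Name": ["Python"],
                      "Chief developer, company": ["Guido van Rossum"]}),
    ]
    print(find_tables(tables, "Guido van Rossum"))  # [1]
```

With the real page you would call tables = pd.read_html(url) and then find_tables(tables, 'Guido van Rossum'). At the time of the post that pointed at table 9, but the index can shift whenever the Wikipedia page gains or loses tables.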