I have this piece of code to extract a table from a Wikipedia article.
import urllib.request
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/List_of_London_Underground_stations"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "lxml")
all_tables = soup.find_all("table")
all_tables
right_table = soup.find('table', class_='wikitable sortable plainrowheaders jquery-tablesorter')
print(right_table)
The print call outputs 'None'. What am I doing wrong?
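The class you are searching for is not in the HTML the server sends. 'jquery-tablesorter' is added by JavaScript after the page loads in a browser, so the downloaded markup only has class="wikitable sortable plainrowheaders" (compare the output below). Also, a class_ string containing several space-separated classes only matches that exact attribute value. You can select the table with a CSS selector instead: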
import requests
from bs4 import BeautifulSoup


def get_stops():
    url = 'https://en.wikipedia.org/wiki/List_of_London_Underground_stations'
    response = requests.get(url)
    if response.status_code == 200:
        page = response.content
        soup = BeautifulSoup(page, 'lxml')
        # nth-child(11) picks the table by its position among its siblings,
        # so this breaks if the article's layout changes.
        right_table = soup.select('table.wikitable:nth-child(11)')[0]
        print(right_table)


if __name__ == '__main__':
    get_stops()
Partial output:
<table class="wikitable sortable plainrowheaders" style="width: 100%; text-align: center;">
<tbody><tr>
<th scope="col">Station
</th>
<th class="unsortable" scope="col">Photograph
</th>
<th class="sortable" scope="col">Line(s)<sup class="reference" id="ref_note01^"><a href="#endnote_note01^">[*]</a></sup>
</th>
<th scope="col">Local authority
</th>
<th scope="col">Zone(s)<sup class="reference" id="ref_note02^"><a href="#endnote_note02^">[†]</a></sup>
...
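To see why the original find call returned None, here is a small offline sketch. SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the markup the server sends (modelled on the output above), parsed with the built-in "html.parser" so lxml is not needed. It also shows one position-independent alternative to nth-child(11): taking the first wikitable whose first header cell reads "Station". The helper find_table_by_header is made up for illustration, and on the real page another table could in principle start with the same header.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server-side HTML (assumption, not the real page).
# There is no "jquery-tablesorter" class here: the browser's JavaScript adds it later.
SAMPLE_HTML = """
<table class="wikitable"><tr><th>Other table</th></tr></table>
<table class="wikitable sortable plainrowheaders">
  <tr><th scope="col">Station</th><th scope="col">Line(s)</th></tr>
  <tr><th scope="row">Acton Town</th><td>District</td></tr>
</table>
"""


def find_table_by_header(soup, header_text):
    """Return the first wikitable whose first <th> text equals header_text, else None."""
    for table in soup.select("table.wikitable"):
        first_th = table.find("th")
        if first_th is not None and first_th.get_text(strip=True) == header_text:
            return table
    return None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# A multi-word class_ string is compared against the whole attribute value,
# so the extra "jquery-tablesorter" class makes this lookup fail.
missing = soup.find("table", class_="wikitable sortable plainrowheaders jquery-tablesorter")

# Matching the attribute exactly as served, or using a CSS class selector, works.
exact = soup.find("table", class_="wikitable sortable plainrowheaders")
by_css = soup.select_one("table.wikitable.sortable")
by_header = find_table_by_header(soup, "Station")

print(missing)                       # None
print(exact is by_css is by_header)  # True
```

The CSS selector "table.wikitable.sortable" does not care about class order or extra classes, which makes it less brittle than an exact class_ string.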
I am usually lazy: I go for a quick-and-dirty solution first and deal with any problems it causes later.
So I would use pandas, whose read_html function parses every <table> on a page into a list of DataFrames:
>>> import pandas as pd
>>> df = pd.read_html('https://en.wikipedia.org/wiki/List_of_London_Underground_stations')[0].drop(columns='Photograph')
>>> df
            Station                     Line(s)[*]  ...                              Other name(s)[note 2]  Usage[5]
0        Acton Town             DistrictPiccadilly  ...                          Mill Hill Park: 1879–1910      6.04
1           Aldgate          Metropolitan[a]Circle  ...                                                NaN      8.85
2      Aldgate East  Hammersmith & City[d]District  ...           Commercial Road: Proposed before opening     14.00
3          Alperton                  Piccadilly[h]  ...                         Perivale-Alperton: 1903–10      3.05
4          Amersham                   Metropolitan  ...  Amersham: 1892–1922Amersham & Chesham Bois: 19...      2.32
..              ...                            ...  ...                                                ...       ...
265  Wimbledon Park                       District  ...                                                NaN      2.18
266      Wood Green                     Piccadilly  ...             Lordship Lane: Proposed before opening     12.89
267       Wood Lane       Hammersmith & CityCircle  ...                                                NaN      4.00
268        Woodford                        Central  ...                                                NaN      5.98
269   Woodside Park                       Northern  ...  Torrington Park, Woodside: 1872–82Woodside Par...      3.54
[270 rows x 8 columns]
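Indexing with [0] relies on the station table being the first one pandas finds. A hedged alternative is read_html's match argument, which keeps only tables whose text matches a string or regex. The sketch below runs offline on a hypothetical HTML snippet wrapped in io.StringIO (recent pandas versions deprecate passing literal HTML strings to read_html) and also strips bracketed footnote markers such as "[*]" and "[5]" from the column names. It assumes lxml, or BeautifulSoup plus html5lib, is installed, since read_html needs one of them; on the live page you would pass the URL instead of the StringIO object.

```python
from io import StringIO

import pandas as pd

# Hypothetical HTML with two tables (assumption); only the second one is wanted.
SAMPLE_HTML = """
<table><tr><th>Other</th></tr><tr><td>x</td></tr></table>
<table class="wikitable">
  <tr><th>Station</th><th>Line(s)[*]</th><th>Usage[5]</th></tr>
  <tr><td>Acton Town</td><td>DistrictPiccadilly</td><td>6.04</td></tr>
  <tr><td>Aldgate</td><td>Metropolitan[a]Circle</td><td>8.85</td></tr>
</table>
"""

# match= keeps only the tables whose text matches the given string/regex.
tables = pd.read_html(StringIO(SAMPLE_HTML), match="Acton Town")
df = tables[0]

# Drop footnote markers such as "[*]" or "[5]" from the column names.
df.columns = df.columns.str.replace(r"\[.*?\]", "", regex=True)
print(df)
```

The same regex could be applied to the cell values (for example "Metropolitan[a]Circle") with df[column].str.replace if the footnotes get in the way later.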