Python Forum

Import HTML table to array
I have this piece of code to extract a table from a Wikipedia article.

import urllib.request
url = "https://en.wikipedia.org/wiki/List_of_London_Underground_stations"
page = urllib.request.urlopen(url)

from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")

all_tables=soup.find_all("table")
all_tables

right_table=soup.find('table', class_='wikitable sortable plainrowheaders jquery-tablesorter')
print(right_table)
After print I get 'None' as output. What am I doing wrong?
You can use requests with a CSS selector instead:
import requests
from bs4 import BeautifulSoup


def get_stops():
    url = 'https://en.wikipedia.org/wiki/List_of_London_Underground_stations'
    response = requests.get(url)
    # Raise on a non-200 response instead of leaving `page` undefined
    response.raise_for_status()
    page = response.content
    soup = BeautifulSoup(page, 'lxml')
    right_table = soup.select('table.wikitable:nth-child(11)')[0]
    print(right_table)

if __name__ == '__main__':
    get_stops()
Partial output:
<table class="wikitable sortable plainrowheaders" style="width: 100%; text-align: center;"> <tbody><tr> <th scope="col">Station </th> <th class="unsortable" scope="col">Photograph </th> <th class="sortable" scope="col">Line(s)<sup class="reference" id="ref_note01^"><a href="#endnote_note01^">[*]</a></sup> </th> <th scope="col">Local authority </th> <th scope="col">Zone(s)<sup class="reference" id="ref_note02^"><a href="#endnote_note02^">[†]</a></sup> ...
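Note that the class attribute in this output has no jquery-tablesorter entry. That class is most likely added by JavaScript in the browser, so it is not in the HTML that urllib or requests downloads, and a search for the full four-class string finds nothing. A minimal sketch of the difference, using a small made-up HTML snippet in place of the real page (the snippet and the variable names are only for illustration), which also turns the rows into a nested list:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page, with the same class attribute
# as in the output above (no jquery-tablesorter).
html = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Station</th><th>Zone(s)</th></tr>
  <tr><td>Acton Town</td><td>3</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A multi-class string passed to class_ is compared against the exact
# value of the class attribute, so the extra class makes this fail.
missing = soup.find(
    "table", class_="wikitable sortable plainrowheaders jquery-tablesorter"
)

# A CSS selector only requires that the listed classes are present.
table = soup.select_one("table.wikitable.sortable.plainrowheaders")

# Collect the text of every header and data cell, row by row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]

print(missing)
print(rows)
```

Selecting by class is also less fragile than nth-child(11), which stops working as soon as the page layout changes.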
I am usually lazy: I go for a quick and dirty solution first and deal with any problems arising from it later.

So I would use pandas.

>>> import pandas as pd
>>> df = pd.read_html('https://en.wikipedia.org/wiki/List_of_London_Underground_stations')[0].drop(columns='Photograph')
>>> df
            Station                     Line(s)[*]  ...                              Other name(s)[note 2] Usage[5]
0        Acton Town             DistrictPiccadilly  ...                          Mill Hill Park: 1879–1910     6.04
1           Aldgate          Metropolitan[a]Circle  ...                                                NaN     8.85
2      Aldgate East  Hammersmith & City[d]District  ...           Commercial Road: Proposed before opening    14.00
3          Alperton                  Piccadilly[h]  ...                         Perivale-Alperton: 1903–10     3.05
4          Amersham                   Metropolitan  ...  Amersham: 1892–1922Amersham & Chesham Bois: 19...     2.32
..              ...                            ...  ...                                                ...      ...
265  Wimbledon Park                       District  ...                                                NaN     2.18
266      Wood Green                     Piccadilly  ...             Lordship Lane: Proposed before opening    12.89
267       Wood Lane       Hammersmith & CityCircle  ...                                                NaN     4.00
268        Woodford                        Central  ...                                                NaN     5.98
269   Woodside Park                       Northern  ...  Torrington Park, Woodside: 1872–82Woodside Par...     3.54

[270 rows x 8 columns]
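Since the question asks for an array, the DataFrame can be converted with to_numpy(). Below is a rough sketch on a small made-up table instead of the live page (the HTML snippet and its placeholder cells are assumptions for illustration). The match argument is used here to pick tables by their text rather than relying on position [0]:

```python
from io import StringIO

import pandas as pd

# Made-up stand-in for the Wikipedia page; only the column names
# "Station" and "Photograph" are taken from the real table.
html = """
<table class="wikitable">
  <tr><th>Station</th><th>Photograph</th><th>Zone(s)</th></tr>
  <tr><td>Acton Town</td><td>photo</td><td>3</td></tr>
  <tr><td>Aldgate</td><td>photo</td><td>1</td></tr>
</table>
"""

# read_html returns a list of DataFrames; match keeps only the tables
# whose text contains the given string.
tables = pd.read_html(StringIO(html), match="Station")
df = tables[0].drop(columns="Photograph")

# Convert the remaining columns to a 2-D NumPy array (one row per station).
stations = df.to_numpy()
print(stations)
```

If a plain nested list is preferred over a NumPy array, stations.tolist() gives one.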