Python Forum
Import HTML table to array
#1
I have this piece of code to extract a table from a Wikipedia article.

import urllib.request
url = "https://en.wikipedia.org/wiki/List_of_London_Underground_stations"
page = urllib.request.urlopen(url)

from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "lxml")

all_tables=soup.find_all("table")
all_tables

right_table=soup.find('table', class_='wikitable sortable plainrowheaders jquery-tablesorter')
print(right_table)
When I print right_table, I get 'None' as output. What am I doing wrong?
#2
You can use:
import requests
from bs4 import BeautifulSoup


def get_stops():
    url = 'https://en.wikipedia.org/wiki/List_of_London_Underground_stations'
    response = requests.get(url)
    # Raise on HTTP errors; otherwise the page would be undefined below
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'lxml')
    # Position-based selector: it depends on the current page layout
    right_table = soup.select('table.wikitable:nth-child(11)')[0]
    print(right_table)

if __name__ == '__main__':
    get_stops()
Partial output:
<table class="wikitable sortable plainrowheaders" style="width: 100%; text-align: center;"> <tbody><tr> <th scope="col">Station </th> <th class="unsortable" scope="col">Photograph </th> <th class="sortable" scope="col">Line(s)<sup class="reference" id="ref_note01^"><a href="#endnote_note01^">[*]</a></sup> </th> <th scope="col">Local authority </th> <th scope="col">Zone(s)<sup class="reference" id="ref_note02^"><a href="#endnote_note02^">[†]</a></sup> ...
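As for why the original find() returned None: the output above shows the table's class attribute as "wikitable sortable plainrowheaders", with no jquery-tablesorter. That class is most likely added by JavaScript in the browser, so it is not in the HTML that urllib or requests download. When class_ is given a string with several classes, BeautifulSoup compares it against the exact attribute value, so the extra class makes the match fail. Here is a minimal sketch of this, using a small made-up HTML snippet (SAMPLE_HTML is a stand-in, not the real page) and the built-in html.parser so it runs without a network connection:

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server actually sends
SAMPLE_HTML = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Station</th><th>Zone(s)</th></tr>
  <tr><td>Acton Town</td><td>3</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Exact-string match on the class attribute: fails because of the extra class
missing = soup.find(
    "table", class_="wikitable sortable plainrowheaders jquery-tablesorter"
)

# CSS selector: matches any table that has both classes, in any order
found = soup.select_one("table.wikitable.sortable")

print(missing)
print(found is not None)
```

A class-based selector like this is usually less fragile than nth-child, though on the real page it may match more than one table, so check what it returns.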
#3
I am usually lazy: I go for a quick and dirty solution first and deal with any problems that arise from it later.

So I would use pandas.

>>> import pandas as pd
>>> df = pd.read_html('https://en.wikipedia.org/wiki/List_of_London_Underground_stations')[0].drop(columns='Photograph')
>>> df
            Station                     Line(s)[*]  ...                              Other name(s)[note 2] Usage[5]
0        Acton Town             DistrictPiccadilly  ...                          Mill Hill Park: 1879–1910     6.04
1           Aldgate          Metropolitan[a]Circle  ...                                                NaN     8.85
2      Aldgate East  Hammersmith & City[d]District  ...           Commercial Road: Proposed before opening    14.00
3          Alperton                  Piccadilly[h]  ...                         Perivale-Alperton: 1903–10     3.05
4          Amersham                   Metropolitan  ...  Amersham: 1892–1922Amersham & Chesham Bois: 19...     2.32
..              ...                            ...  ...                                                ...      ...
265  Wimbledon Park                       District  ...                                                NaN     2.18
266      Wood Green                     Piccadilly  ...             Lordship Lane: Proposed before opening    12.89
267       Wood Lane       Hammersmith & CityCircle  ...                                                NaN     4.00
268        Woodford                        Central  ...                                                NaN     5.98
269   Woodside Park                       Northern  ...  Torrington Park, Woodside: 1872–82Woodside Par...     3.54

[270 rows x 8 columns] 
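If you want a plain Python array (a list of rows) rather than a DataFrame, one option is to walk the table rows with BeautifulSoup. This is only a rough sketch: table_to_rows is a helper name I made up, SAMPLE_HTML is an invented snippet standing in for the real page, and it ignores rowspan/colspan, which the real Wikipedia table may use.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page
SAMPLE_HTML = """
<table class="wikitable sortable plainrowheaders">
  <tr><th>Station</th><th>Zone(s)</th></tr>
  <tr><th>Acton Town</th><td>3</td></tr>
  <tr><th>Aldgate</th><td>1</td></tr>
</table>
"""


def table_to_rows(table):
    """Return the table as a list of rows, each a list of cell strings."""
    rows = []
    for tr in table.find_all("tr"):
        # Header cells (th) and data cells (td) are both kept, in document order
        cells = tr.find_all(["th", "td"])
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = table_to_rows(soup.select_one("table.wikitable"))
for row in rows:
    print(row)
```

With the pandas approach above, df.values.tolist() should give a similar list of rows from the DataFrame.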