Python Forum
table from wikipedia
#1
Hi,
I’m new to programming and web scraping.

I got this assignment:

Quote: Wikipedia website: link

From the link above, transform the table "Sovereign states and dependencies by population" into a pandas DataFrame with the following columns (choose the corresponding data type for each and take care to set the right index!)

• Rank (index) - int
• Country name - object
• Population - int
• Date - datetime
• % of world population - int
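
For the data-type part of the assignment, here is a minimal sketch of the conversions, assuming the scraped cells arrive as plain strings. The sample rows are invented placeholders (not real figures from the table), and the column names and the date format `%d %b %Y` are assumptions that would need adjusting to what the page actually contains. The percentage is parsed as a float in this sketch, since it normally has decimals; it could be rounded afterwards if an int is really required.

```python
import pandas as pd

# Placeholder rows shaped like scraped text; the values are made up.
raw = pd.DataFrame({
    "Rank": ["1", "2"],
    "Country name": ["Country A", "Country B"],
    "Population": ["1,000,000", "500,000"],
    "Date": ["27 Jun 2019", "1 Jan 2019"],
    "% of world population": ["12.5%", "6.25%"],
})

df = raw.copy()
# Rank becomes an int and is used as the index.
df["Rank"] = df["Rank"].astype(int)
# Strip thousands separators before converting to int.
df["Population"] = df["Population"].str.replace(",", "", regex=False).astype(int)
# Assumed date format: day, abbreviated month, year.
df["Date"] = pd.to_datetime(df["Date"], format="%d %b %Y")
# Drop the trailing percent sign and convert to a number.
df["% of world population"] = df["% of world population"].str.rstrip("%").astype(float)
df = df.set_index("Rank")

print(df.dtypes)
```

Running `df.dtypes` after each step is a quick way to confirm that every column ended up with the type the assignment asks for.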

What I’ve done so far:

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Download the page and parse the HTML
url_cntr = 'https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population'
t = requests.get(url_cntr)
html_content = t.text
html_soup = BeautifulSoup(html_content, 'html.parser')

# Empty list that should end up holding one dict per table row
sover = []
len(sover)
Output: 0

import requests
url_cntr = 'https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population'
t = requests.get(url_cntr)
t.text
html_content = t.text 
html_soup = BeautifulSoup(html_content, 'html.parser')
sover = []
sov_tables = html_soup.find_all('table', class_='jquery-tablesorter')
for table in sov_tables[0]:
    headers = []
    rows = table.find_all('tr')
    for header in table.find('tr').find_all('th'):
        headers.append(header.text)
    for row in rows[1:]:
        values = []
        for col in row.find_all(['th', 'td']):
            values.append(col.text)
        if values:
            cntr_dict = {headers[i]: values[i] for i in range(len(values))}
            cntr.append(cntr_dict)
I got this error:
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-25-b2259dadb770> in <module>
----> 1 for table in sov_tables[0]:
      2     headers = []
      3     rows = table.find_all('tr')
      4     for header in table.find('tr').find_all('th'):
      5         headers.append(header.text)

IndexError: list index out of range
What am I doing wrong?

Thanks in advance for your help!
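
A likely cause, sketched under assumptions: the `jquery-tablesorter` class is added by JavaScript in the browser, so it is probably absent from the HTML that `requests` downloads. In that case `find_all` returns an empty list and `sov_tables[0]` raises the `IndexError`. The snippet below uses a small inline HTML stand-in (invented rows, and the class `wikitable sortable` is an assumption about the served markup) so it runs without network access. It also avoids looping over `sov_tables[0]`, which would iterate over the children of a single table, and appends to `sover`, since `cntr` is never defined in the code above.

```python
from bs4 import BeautifulSoup

# Stand-in for the raw HTML that requests would download (made-up rows).
# Assumption: the served page marks the table as "wikitable sortable".
html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Country</th><th>Population</th></tr>
  <tr><td>1</td><td>Country A</td><td>1,000,000</td></tr>
  <tr><td>2</td><td>Country B</td><td>500,000</td></tr>
</table>
"""

html_soup = BeautifulSoup(html, "html.parser")

# The class used in the question is not in the markup, so nothing matches.
missing = html_soup.find_all("table", class_="jquery-tablesorter")
print(len(missing))  # 0

# class_ matches any one of a tag's classes, so "wikitable" is enough.
sov_tables = html_soup.find_all("table", class_="wikitable")
table = sov_tables[0]  # a single table Tag, used directly

rows = table.find_all("tr")
headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]

sover = []
for row in rows[1:]:
    values = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
    if values:
        # Pair each header with the value in the same position.
        sover.append(dict(zip(headers, values)))

print(sover)
```

Checking `len(sov_tables)` before indexing is a cheap way to see whether the selector matched anything in the downloaded page.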
Messages In This Thread
table from wikipedia - by flow50 - Jun-27-2019, 10:31 AM
RE: table from wikipedia - by snippsat - Jun-27-2019, 12:47 PM
RE: table from wikipedia - by flow50 - Jun-28-2019, 03:02 PM
RE: table from wikipedia - by snippsat - Jun-28-2019, 05:22 PM
RE: table from wikipedia - by flow50 - Jul-01-2019, 12:16 PM
RE: table from wikipedia - by snippsat - Jul-01-2019, 07:12 PM
