Jun-27-2019, 10:31 AM
Hi,
I’m a newbie in programming and web scraping.
I got this assignment:
Quote: Wikipedia web site: link
From the link above, transform the table "Sovereign states and dependencies by population" into a pandas DataFrame with the following columns (choose the corresponding data type and be careful to use the right index!):
• Rank: (Index) - int
• Country name: - object
• Population - int
• Date - Datetime
• % of world population - int
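For reference, the target layout in the list above could look roughly like the sketch below. The two rows are made-up placeholders, not values from the Wikipedia page, and the date format is an assumption about how the table writes dates. The assignment asks for int in the percentage column, so rounding is used here as one possible reading.

```python
import pandas as pd

# Made-up placeholder rows, only to illustrate the target dtypes.
raw = pd.DataFrame(
    {
        "Rank": ["1", "2"],
        "Country name": ["Country A", "Country B"],
        "Population": ["1,000,000", "250,000"],
        "Date": ["June 27, 2019", "July 15, 2018"],
        "% of world population": ["18.2%", "4.3%"],
    }
)

df = raw.copy()
df["Rank"] = df["Rank"].astype(int)
# Drop thousands separators before converting to int.
df["Population"] = df["Population"].str.replace(",", "", regex=False).astype(int)
# Assumed date format: full month name, day, four-digit year.
df["Date"] = pd.to_datetime(df["Date"], format="%B %d, %Y")
# Strip the percent sign, then round to get the int the assignment asks for.
df["% of world population"] = (
    df["% of world population"].str.rstrip("%").astype(float).round().astype(int)
)
# Rank becomes the index, as the assignment requires.
df = df.set_index("Rank")
```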
What I’ve done so far:
import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

import requests
url_cntr = 'https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population'
t = requests.get(url_cntr)
t.text

html_content = t.text
html_soup = BeautifulSoup(html_content, 'html.parser')
html_soup.text

sover = []
len(sover)

Output is: 0
import requests
url_cntr = 'https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population'
t = requests.get(url_cntr)
t.text
html_content = t.text
html_soup = BeautifulSoup(html_content, 'html.parser')
sover = []
sov_tables = html_soup.find_all('table', class_='jquery-tablesorter')

for table in sov_tables[0]:
    headers = []
    rows = table.find_all('tr')
    for header in table.find('tr').find_all('th'):
        headers.append(header.text)
    for row in rows[1:]:
        values = []
        for col in row.find_all(['th', 'td']):
            values.append(col.text)
        if values:
            cntr_dict = {headers[i]: values[i] for i in range(len(values))}
            cntr.append(cntr_dict)

I got this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-25-b2259dadb770> in <module>
----> 1 for table in sov_tables[0]:
      2     headers = []
      3     rows = table.find_all('tr')
      4     for header in table.find('tr').find_all('th'):
      5         headers.append(header.text)

IndexError: list index out of range

What am I doing wrong?
Thanks in advance for your help!
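
A likely cause, offered as an assumption rather than a confirmed diagnosis: the jquery-tablesorter class is added by JavaScript in the browser, so it is probably absent from the HTML that requests downloads. find_all then returns an empty list, and sov_tables[0] raises the IndexError. Two further points in the loop: sov_tables[0] is a single table, so iterating over it walks its child nodes rather than a list of tables, and the rows are appended to cntr although the list is named sover. The sketch below shows the idea on a hypothetical inline HTML sample with placeholder values; it assumes the static markup carries the wikitable class, which is worth checking against the actual page source.

```python
from bs4 import BeautifulSoup

# Hypothetical, heavily shortened stand-in for the HTML that requests
# receives; the rows are placeholders, not real data.
sample_html = """
<table class="wikitable sortable">
  <tr><th>Rank</th><th>Country</th><th>Population</th></tr>
  <tr><td>1</td><td>Country A</td><td>1,000,000</td></tr>
  <tr><td>2</td><td>Country B</td><td>250,000</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# No match: this class is not present in the static markup.
missing = soup.find_all("table", class_="jquery-tablesorter")

# Match on a class that is present in the markup (assumed: wikitable).
tables = soup.find_all("table", class_="wikitable")

sover = []
if tables:  # guard against indexing an empty result
    table = tables[0]  # one table, so no loop over it
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    for row in rows[1:]:
        values = [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        if values:
            # Pair each header with the value in the same position.
            sover.append(dict(zip(headers, values)))
```

From there, pd.DataFrame(sover) would give a frame of strings whose columns still need converting to the requested types.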