fetching, parsing data from Wikipedia

apollo · (This post was last modified: May-06-2021, 05:32 PM by apollo.)

Hello dear Python experts, good day.

This scraper fetches Wikipedia pages.

It is a nice little scraper - it ...:

from urllib.request import urlopen

from bs4 import BeautifulSoup

# Download the page and parse the HTML
url = 'https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

These few lines fetch the page, but I guess I need more.

I am going to add the following call, which scans the entire document for <table> tags:

find_all('table')

and then store the result:

tables = soup.find_all('table')

Can I do it like this?
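For what it's worth, find_all('table') does return every <table> element in the document as a list-like result, so the call itself works. A minimal sketch of inspecting what comes back, assuming a small made-up HTML snippet so it runs without a network connection (`sample_html` and the `summarize_tables` helper are hypothetical names, not taken from the Wikipedia page):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page: two tables, one with a caption.
sample_html = """
<html><body>
<table class="infobox"><tr><td>navigation</td></tr></table>
<table class="wikitable sortable">
<caption>Sunshine hours for selected cities in Africa</caption>
<tr><th>Country</th><th>City</th></tr>
<tr><td>Ivory Coast</td><td>Gagnoa</td></tr>
</table>
</body></html>
"""


def summarize_tables(html):
    """Return (index, CSS classes, caption text) for every <table> in html."""
    soup = BeautifulSoup(html, 'html.parser')
    summary = []
    for index, table in enumerate(soup.find_all('table')):
        caption = table.find('caption')
        caption_text = caption.get_text(strip=True) if caption else ''
        summary.append((index, table.get('class', []), caption_text))
    return summary


for row in summarize_tables(sample_html):
    print(row)
```

Listing the classes and captions like this is one way to see which entry of the result is the table you actually want before indexing into it.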

apollo · May-06-2021, 07:29 AM

Sorry, BROY,

but this makes no sense to me. It has nothing to do with the question.

I guess this is a form of spam.

***snippsat*** · (This post was last modified: May-06-2021, 08:09 PM by snippsat.)

(May-05-2021, 06:12 PM)apollo Wrote: and the following
tables = soup.find_all('table')
can i do this like so?

Yes, but you need to find the right table. Here are a couple of ways, including a better way with Pandas.
Here is a Notebook; see that it gets the table, and you can start working with it right away, as it's now a DataFrame.
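The Notebook itself is not reproduced here, so the following is only a rough sketch of what the Pandas route might look like, not its actual contents. It assumes `pandas.read_html` with the `match` argument to pick the table by its caption text, and it uses a heavily shortened, hypothetical copy of the table markup (`sample_html`) so it runs offline; `load_table` is an illustrative helper name.

```python
from io import StringIO

import pandas as pd

# Hypothetical, heavily shortened copy of the table markup so the sketch
# runs offline. The single data row uses values shown in this thread.
sample_html = """
<table class="wikitable plainrowheaders sortable">
<caption>Sunshine hours for selected cities in Africa</caption>
<tbody>
<tr><th>Country</th><th>City</th><th>Jan</th><th>Feb</th><th>Mar</th></tr>
<tr><td>Ivory Coast</td><td>Gagnoa</td><td>183.0</td><td>180.0</td><td>196.0</td></tr>
</tbody>
</table>
"""


def load_table(html, match):
    """Return the first table whose text matches `match` as a DataFrame."""
    # read_html returns a list of DataFrames, one per matching <table>.
    tables = pd.read_html(StringIO(html), match=match)
    return tables[0]


df = load_table(sample_html, 'Sunshine hours')
print(df)
# Numeric columns are parsed as numbers, so arithmetic works right away.
print(df[['Jan', 'Feb', 'Mar']].mean(axis=1))
```

With the live page, the same helper could be fed `response.text` from a `requests.get(url)` call instead of `sample_html`. Note that `read_html` needs an HTML parser installed (lxml, or BeautifulSoup together with html5lib).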

Here is a standard way.
As you can see, it gets the table, but it still takes a lot of work to extract the data if you want to do something useful with it.
Do not use urllib.

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration'
# Download the page with Requests and parse it with the lxml parser
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
# Page title
print(soup.find('title').text)
# Pick one table by CSS selector (the position depends on the page layout)
print(soup.select_one('#mw-content-text > div.mw-parser-output > table:nth-child(9)'))

Output:
List of cities by sunshine duration - Wikipedia
<table class="wikitable plainrowheaders sortable" style="text-align:right;">
<caption>Sunshine hours for selected cities in Africa
</caption>
<tbody><tr style="vertical-align:top">
<th>Country
</th>
<th>City
</th>
<th>Jan
</th>
<th>Feb
</th>
<th>Mar
</th>
<th>Apr
</th>
<th>May
</th>
<th>Jun
</th>
<th>Jul
</th>
<th>Aug
</th>
<th>Sep
</th>
<th>Oct
</th>
<th>Nov
</th>
<th>Dec
</th>
<th>Year
</th>
<th>Ref.
</th></tr>
<tr>
<td style="text-align:left;"><a href="/wiki/Ivory_Coast" title="Ivory Coast">Ivory Coast</a>
</td>
<td style="text-align:left;"><a href="/wiki/Gagnoa" title="Gagnoa">Gagnoa</a>
</td>
<td style="background: #D5D500; color:#000000;;">183.0
</td>
<td style="background: #D4D400; color:#000000;;">180.0
</td>
<td style="background: #D8D800; color:#000000;;">196.0
</td>
..... etc.
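To give an idea of the extra work the raw markup still needs, here is a minimal sketch that turns such a table into a list of dicts with BeautifulSoup alone. It assumes the first row holds the headers, and it runs on a shortened, hypothetical copy of the output above (`sample_html`); `table_to_records` is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical shortened copy of the markup printed above (first three months only).
sample_html = """
<table class="wikitable plainrowheaders sortable" style="text-align:right;">
<caption>Sunshine hours for selected cities in Africa
</caption>
<tbody><tr style="vertical-align:top">
<th>Country
</th>
<th>City
</th>
<th>Jan
</th>
<th>Feb
</th>
<th>Mar
</th></tr>
<tr>
<td style="text-align:left;"><a href="/wiki/Ivory_Coast" title="Ivory Coast">Ivory Coast</a>
</td>
<td style="text-align:left;"><a href="/wiki/Gagnoa" title="Gagnoa">Gagnoa</a>
</td>
<td>183.0
</td>
<td>180.0
</td>
<td>196.0
</td></tr>
</tbody></table>
"""


def table_to_records(table):
    """Convert a bs4 <table> Tag into a list of dicts keyed by header text."""
    rows = table.find_all('tr')
    # Assumption: the first row contains only header cells.
    headers = [th.get_text(strip=True) for th in rows[0].find_all('th')]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(['td', 'th'])]
        # Skip rows that do not line up with the header (e.g. merged cells).
        if len(cells) == len(headers):
            records.append(dict(zip(headers, cells)))
    return records


soup = BeautifulSoup(sample_html, 'html.parser')
records = table_to_records(soup.find('table', class_='wikitable'))
print(records)
```

The values are still strings at this point, so a further step such as `float(record['Jan'])` would be needed before doing any calculations.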
