Python Forum
fetching, parsing data from Wikipedia
#1
Hello dear Python experts, good day.

This scraper fetches Wikipedia pages.

It is a nice little scraper - it ...:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from urllib.request import urlopen
url = 'https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration'
html = urlopen(url) 
soup = BeautifulSoup(html, 'html.parser')
These few lines fetch the data, but I guess that I need more.

I am going to add some:

find_all('table')  # scans the entire document for every <table> tag
and the following:

tables = soup.find_all('table')
Can I do it like this?
#2
Sorry, BROY,

but this makes no sense to me. It has nothing to do with the question.

I guess this is a form of spam.
#3
(May-05-2021, 06:12 PM)apollo Wrote: and the following
tables = soup.find_all('table')
can i do this like so?
Yes, but you need to find the right table. Here are a couple of ways, including a better way with Pandas.
Here is a Notebook; see that it gets the table, and you can start working with it right away, as it's now a DataFrame.
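
The Notebook itself is not reproduced in this thread, so the following is only a minimal sketch of what the Pandas route could look like, not the code from the Notebook. It assumes `pandas.read_html` with its `match` argument to keep only tables containing a given text. The helper name `tables_matching` and the `SAMPLE_HTML` snippet are made up for illustration; the first sample table reuses the header and values visible in the output further down, and the second is a pure placeholder.

```python
from io import StringIO

import pandas as pd

# Small stand-in for the downloaded page (hypothetical sample, not the real page).
SAMPLE_HTML = """
<table>
  <caption>Sunshine hours for selected cities in Africa</caption>
  <tr><th>Country</th><th>City</th><th>Jan</th><th>Feb</th></tr>
  <tr><td>Ivory Coast</td><td>Gagnoa</td><td>183.0</td><td>180.0</td></tr>
</table>
<table>
  <caption>Placeholder table</caption>
  <tr><th>A</th><th>B</th></tr>
  <tr><td>1</td><td>2</td></tr>
</table>
"""


def tables_matching(html, text):
    """Return a list of DataFrames, one per table in `html` containing `text`."""
    # read_html parses every <table>; `match` keeps only the tables whose
    # text matches the given string or regex. StringIO avoids passing a
    # literal HTML string, which newer pandas versions warn about.
    return pd.read_html(StringIO(html), match=text)


africa = tables_matching(SAMPLE_HTML, "Africa")[0]
print(africa)
```

For the live page, the same helper would presumably be called with `requests.get(url).text` instead of `SAMPLE_HTML`. Note that `read_html` needs an HTML parser backend installed (lxml, or BeautifulSoup together with html5lib).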

Here is a standard way.
As you can see, it gets the table, but there is still a lot of work left to extract the data if you want to do something useful with it.
Do not use urllib.
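import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration'
response = requests.get(url)
# Parse the raw bytes with the lxml parser (requires the lxml package)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
# CSS selector for one specific table; the nth-child position depends on the page layout
print(soup.select_one('#mw-content-text > div.mw-parser-output > table:nth-child(9)'))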
Output:
List of cities by sunshine duration - Wikipedia
<table class="wikitable plainrowheaders sortable" style="text-align:right;">
<caption>Sunshine hours for selected cities in Africa </caption>
<tbody><tr style="vertical-align:top">
<th>Country </th>
<th>City </th>
<th>Jan </th>
<th>Feb </th>
<th>Mar </th>
<th>Apr </th>
<th>May </th>
<th>Jun </th>
<th>Jul </th>
<th>Aug </th>
<th>Sep </th>
<th>Oct </th>
<th>Nov </th>
<th>Dec </th>
<th>Year </th>
<th>Ref. </th></tr>
<tr>
<td style="text-align:left;"><a href="/wiki/Ivory_Coast" title="Ivory Coast">Ivory Coast</a> </td>
<td style="text-align:left;"><a href="/wiki/Gagnoa" title="Gagnoa">Gagnoa</a> </td>
<td style="background: #D5D500; color:#000000;;">183.0 </td>
<td style="background: #D4D400; color:#000000;;">180.0 </td>
<td style="background: #D8D800; color:#000000;;">196.0 </td>
..... etc.
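
To give an idea of the "lot of work" that remains with plain BeautifulSoup, here is a rough sketch that turns such a table into a list of dicts. It is an illustration under assumptions, not the method used above: it picks the table by its `<caption>` text instead of the `nth-child(9)` selector, and it uses the built-in `html.parser` rather than `lxml` so that it runs without extra packages. The helper names `find_table_by_caption` and `table_to_records` and the `SAMPLE_HTML` snippet are hypothetical; the sample row reuses the values from the output above.

```python
from bs4 import BeautifulSoup

# Trimmed-down stand-in for the table printed above (hypothetical sample).
SAMPLE_HTML = """
<table class="wikitable">
<caption>Sunshine hours for selected cities in Africa </caption>
<tr><th>Country </th><th>City </th><th>Jan </th><th>Feb </th><th>Mar </th></tr>
<tr><td>Ivory Coast </td><td>Gagnoa </td><td>183.0 </td><td>180.0 </td><td>196.0 </td></tr>
</table>
"""


def find_table_by_caption(soup, text):
    """Return the first <table> whose caption contains `text`, or None."""
    for table in soup.find_all("table"):
        caption = table.find("caption")
        if caption and text in caption.get_text():
            return table
    return None


def table_to_records(table):
    """Convert a table into a list of dicts keyed by the header row."""
    rows = table.find_all("tr")
    # The first row is assumed to hold the column names in <th> cells.
    headers = [th.get_text(strip=True) for th in rows[0].find_all("th")]
    records = []
    for row in rows[1:]:
        cells = [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        if cells:
            records.append(dict(zip(headers, cells)))
    return records


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = find_table_by_caption(soup, "Africa")
for record in table_to_records(table):
    print(record)
```

All values come out as strings here, so converting the monthly columns to numbers is one more step. That is part of why the Pandas route tends to be less work for this kind of page.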