Python Forum
to scrape wiki-page: getting back the results - can i use pandas also
#1
Dear community, fellow Python experts,


I've been trying to scrape a table on Wikipedia using BeautifulSoup, but I ran into some problems.

The first step, I guess, is to inspect the table on the wiki page.
Its classes are wikitable and collapsible, and it is collapsed via mw-collapsible.
There is no sortable class in there, so we need to find the matching table element.

The question is: how do I correctly point towards that table?

I need to hook onto some unique identifier, such as the id of the element.

I have had a look at the DOM tree and checked the table's parents to see whether any of them has a unique identifier.
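
One way to point at a table that has no id is to combine its classes in a CSS selector. The following is only a minimal sketch: SAMPLE_HTML is a made-up stand-in for the real page, find_table is a hypothetical helper name, and the class combination in the selector is an assumption that should be checked against the live markup.

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for the wiki page: two tables share the
# "wikitable" class, but only one also carries "mw-collapsible".
SAMPLE_HTML = """
<table class="wikitable sortable"><tr><th>Other</th></tr></table>
<table class="wikitable collapsible mw-collapsible">
  <tr><th>State</th><th>Head of state</th></tr>
  <tr><td>Algeria</td><td>President</td></tr>
</table>
"""


def find_table(html, selector):
    """Return the first element matching the CSS selector, or None."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(selector)


# Require both classes so the sortable table is not picked up.
table = find_table(SAMPLE_HTML, "table.wikitable.mw-collapsible")
headers = [th.get_text(strip=True) for th in table.find_all("th")]
print(headers)
```

If no table matches, select_one returns None, so it is worth checking the result before calling find_all on it.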

If I do it like so:
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government"

res = requests.get(URL).text
soup = BeautifulSoup(res, 'lxml')
# Skip the header row, then walk the remaining rows of the first wikitable.
for items in soup.find('table', class_='wikitable').find_all('tr')[1:]:
    data = items.find_all(['th', 'td'])
    try:
        country = data[0].a.text
        title = data[1].a.text
        name = data[1].a.find_next_sibling().text
    except (IndexError, AttributeError):
        # The row lacks the expected cells or links: skip it rather than
        # print stale values left over from the previous row.
        continue
    print("{}|{}|{}".format(country, title, name))
This works, and it leads to results like these:

Algeria|President|Abdelaziz Bouteflika
Andorra|Episcopal Co-Prince|Joan Enric Vives Sicília
Angola|President|João Lourenço
This is one way, but I think it would be much smarter to use pandas and put the data into a DataFrame.
I am asking because I am not very familiar with pandas.
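
For what it's worth, pandas can parse HTML tables directly with read_html, which returns a list of DataFrames. This is just a sketch under some assumptions: SAMPLE_HTML is invented data shaped like the output above, first_table is a hypothetical helper, and read_html needs an HTML parser such as lxml installed.

```python
import io

import pandas as pd

# Invented sample shaped like the wiki table; for the real page the
# text of requests.get(URL) could be passed in the same way.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>State</th><th>Head of state</th></tr>
  <tr><td>Algeria</td><td>President Abdelaziz Bouteflika</td></tr>
  <tr><td>Angola</td><td>President João Lourenço</td></tr>
</table>
"""


def first_table(html, match=".+"):
    """Return the first table whose text matches `match` as a DataFrame."""
    tables = pd.read_html(io.StringIO(html), match=match)
    return tables[0]


# `match` keeps only tables that contain the given text.
df = first_table(SAMPLE_HTML, match="Head of state")
print(df)
```

The real table has merged cells and footnote markers, so the resulting columns will probably need some cleaning afterwards.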


I look forward to hearing from you.
#2
That code is for Python 2 💀, which you should not use at all now.
It will give an error message if you use Python 3.
# Python 3.9
>>> import urllib2
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ModuleNotFoundError: No module named 'urllib2'
You should use Requests for this anyway.
Look at Web-Scraping part 1 and part 2.
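
As background to the error above: in Python 3 the urllib2 functionality lives in the standard library modules urllib.request and urllib.error. The snippet below is only a sketch that builds a request object without sending it; build_request and the User-Agent string are made-up examples, and Requests remains the easier choice.

```python
from urllib.request import Request

URL = "https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government"


def build_request(url):
    """Build (but do not send) a GET request with a custom User-Agent."""
    # The User-Agent value here is an arbitrary example string.
    return Request(url, headers={"User-Agent": "example-scraper/0.1"})


req = build_request(URL)
print(req.full_url)
# Request stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

Passing the object to urllib.request.urlopen would actually fetch the page.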
#3
Hello dear Snippsat,

First of all, many thanks for the reply and all the hints. I will switch to Python 3, and besides that I will
have a closer look at the linked manuals.


As always, your tips and hints are great.

Have a great day.

Regards,
Apollo