to scrape wiki-page: getting back the results - can i use pandas also - Printable Version
+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: to scrape wiki-page: getting back the results - can i use pandas also (/thread-32431.html)

to scrape wiki-page: getting back the results - can i use pandas also - apollo - Feb-09-2021

Dear community, fellow Python experts,

I've been trying to scrape a table on Wikipedia using BeautifulSoup, but encountered some problems.

The very first step, I guess, is to check the table on the wiki page. The classes are wikitable collapsible, and the tables that are collapsed have mw-collapsible. There's no sortable class in there.

We need to find the matching table element. The question is: how do I correctly point towards that table? I need to hook up to some unique identifier, such as an id of the element. I have had a look at the DOM tree and checked its parents to see if there is any unique identifier.

If I do it like so:

```python
import requests
from bs4 import BeautifulSoup

URL = "https://en.wikipedia.org/wiki/List_of_current_heads_of_state_and_government"

res = requests.get(URL).text
soup = BeautifulSoup(res, 'lxml')

# Walk the rows of the first wikitable, skipping the header row
for items in soup.find('table', class_='wikitable').find_all('tr')[1:]:
    data = items.find_all(['th', 'td'])
    try:
        country = data[0].a.text
        title = data[1].a.text
        name = data[1].a.find_next_sibling().text
    except (IndexError, AttributeError):
        # Row lacks the expected cells or links: skip it rather than
        # print values left over from the previous row
        continue
    print("{}|{}|{}".format(country, title, name))
```

Well, this is a way, and it leads to the results seen here:

```
Algeria|President|Abdelaziz Bouteflika
Andorra|Episcopal Co-Prince|Joan Enric Vives Sicília
Angola|President|João Lourenço
```

This is one way, but I think it is much smarter to use pandas and put the data into a DataFrame. I am asking this since I am not very familiar with pandas.

Looking forward to hearing from you.

RE: its all about the logic: findall posts & the corresponding threads - on baord - snippsat - Feb-09-2021

That code is for Python 2💀, which you should not use at all now. It will give an error message if you use Python 3.
```
# Python 3.9
>>> import urllib2
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ModuleNotFoundError: No module named 'urllib2'
```

You should use Requests for this anyway. Look at Web-Scraping part-1 and part-2.

RE: its all about the logic: findall posts & the corresponding threads - on baord - apollo - Feb-09-2021

Hello dear Snippsat,

first of all: many thanks for the reply and all the hints. I will switch to Python 3, and besides that I will have a closer look at the linked manuals.

As always, your tips and hints are great.

Have a great day.

Regards,
Apollo
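
On the pandas question from the first post: below is a minimal sketch, assuming the table keeps the shape the BeautifulSoup loop expects (a link for the country in the first cell, then a title link followed by a name link in the second). The function name heads_of_state_frame and the SAMPLE_HTML markup are made up for illustration; the sample is not copied from Wikipedia. The sketch collects one dict per row and hands the list to pandas.DataFrame, and it uses the built-in html.parser so that lxml is not required.

```python
import pandas as pd
from bs4 import BeautifulSoup

COLUMNS = ["country", "title", "name"]


def heads_of_state_frame(html: str) -> pd.DataFrame:
    """Collect (country, title, name) rows from the first wikitable in html."""
    # "html.parser" ships with Python; 'lxml' as in the first post also works if installed
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="wikitable")
    rows = []
    if table is None:
        # No matching table: return an empty frame with the expected columns
        return pd.DataFrame(rows, columns=COLUMNS)
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = tr.find_all(["th", "td"])
        if len(cells) < 2 or cells[0].a is None or cells[1].a is None:
            continue  # row lacks the expected cells or links
        person = cells[1].a.find_next_sibling("a")  # next <a> after the title link
        if person is None:
            continue
        rows.append({
            "country": cells[0].a.get_text(strip=True),
            "title": cells[1].a.get_text(strip=True),
            "name": person.get_text(strip=True),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


# Hypothetical sample markup, shaped like the rows the loop in the first post expects
SAMPLE_HTML = """
<table class="wikitable collapsible">
  <tr><th>State</th><th>Head of state</th></tr>
  <tr><th><a href="#">Angola</a></th>
      <td><a href="#">President</a> - <a href="#">João Lourenço</a></td></tr>
  <tr><td colspan="2">footnote row without links</td></tr>
</table>
"""

df = heads_of_state_frame(SAMPLE_HTML)
print(df)
```

To run it against the live page, pass requests.get(URL).text from the first post instead of SAMPLE_HTML. pandas also has read_html, which returns a list of DataFrames for the tables it finds (it needs lxml, or bs4 with html5lib, installed), but the columns then follow the page layout rather than country/title/name.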