fetching, parsing data from Wikipedia - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: fetching, parsing data from Wikipedia (/thread-33560.html)
fetching, parsing data from Wikipedia - apollo - May-05-2021

hello dear python-experts, good day. This scraper fetches Wikipedia pages - it is a nice little scraper:

import requests
import urllib.request
import time
from bs4 import BeautifulSoup
import numpy as np
import pandas as pd
from urllib.request import urlopen

url = 'https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration'
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

These few lines fetch the data, but I guess I need more. I am going to add find_all('table'), which helps me scan the entire document for every <table> tag:

tables = soup.find_all('table')

Can I do it like this?

RE: fetching, parsing and writing into CSV - but only 1 percent of the whole dataset - apollo - May-06-2021

Sorry, BROY, but this makes no sense to me. It has nothing to do with the question; I guess this is a form of spam.

RE: fetching, parsing data from Wikipedia - snippsat - May-06-2021

(May-05-2021, 06:12 PM)apollo Wrote: and the following

Yes, but you need to find the right table. Here are a couple of ways, and a better way with Pandas: a Notebook that gets the table and can be worked with right away, as it's then already a DataFrame. Here is the standard way - as you see, you get the table, but it still needs a lot of work before the data is useful. Do not use urllib.

import requests
from bs4 import BeautifulSoup

url = 'https://en.wikipedia.org/wiki/List_of_cities_by_sunshine_duration'
response = requests.get(url)
soup = BeautifulSoup(response.content, 'lxml')
print(soup.find('title').text)
print(soup.select_one('#mw-content-text > div.mw-parser-output > table:nth-child(9)'))
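A minimal sketch of the Pandas approach snippsat mentions: pandas.read_html returns every table on a page as a DataFrame. For illustration (and so it runs offline) this parses a small inline HTML table standing in for the Wikipedia sunshine-duration table; against the live page you would pass the page's HTML (e.g. response.text from requests) instead. The city names and numbers here are made-up sample values, not the real dataset.

```python
from io import StringIO

import pandas as pd

# Inline stand-in for the Wikipedia table; with the live page you would
# wrap response.text in StringIO the same way.
html = """
<table>
  <tr><th>City</th><th>Jan</th><th>Year</th></tr>
  <tr><td>Cairo</td><td>213</td><td>3542</td></tr>
  <tr><td>London</td><td>62</td><td>1633</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one per <table> found
tables = pd.read_html(StringIO(html))
df = tables[0]

print(df)
# The numeric columns are already parsed, so analysis works immediately:
print(df.loc[df['Year'].idxmax(), 'City'])  # Cairo
```

This is why snippsat calls it the better way: the <th> row becomes the header and numeric cells become real numbers, so there is no per-cell extraction work as with the BeautifulSoup route.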
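One caveat with snippsat's standard way: a positional selector like table:nth-child(9) breaks as soon as the page layout shifts. A more robust sketch is to loop over find_all('table') and match the table's caption text. Again this uses a small inline HTML snippet so it runs offline; the caption phrase 'Sunshine' is an assumption for illustration, and on the live page you would pass response.content from requests.

```python
from bs4 import BeautifulSoup

# Inline stand-in for the Wikipedia page, with a decoy table first
html = """
<html><body>
<table><caption>Some other table</caption></table>
<table class="wikitable">
  <caption>Sunshine duration by city</caption>
  <tr><th>City</th><th>Year</th></tr>
  <tr><td>Cairo</td><td>3542</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(html, 'html.parser')

# Pick the table whose caption mentions the data we want, instead of
# relying on a brittle positional selector
target = None
for table in soup.find_all('table'):
    caption = table.find('caption')
    if caption and 'Sunshine' in caption.get_text():
        target = table
        break

print(target.get('class'))  # ['wikitable']

# Flatten each row into a list of cell texts
rows = [[cell.get_text() for cell in tr.find_all(['th', 'td'])]
        for tr in target.find_all('tr')]
print(rows)  # [['City', 'Year'], ['Cairo', '3542']]
```

Matching on the caption (or on class='wikitable' plus a known header) survives Wikipedia edits that insert or remove elements before the table, which is exactly what invalidates nth-child(9).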