Python Forum

Full Version: *Beginner* web scraping/Beautiful Soup help
Hello all!

I am trying to scrape a table of reviews from an album’s Wikipedia page, using Beautiful Soup and requests. I have become stuck trying to visualise this.

It is the "Critical Reception" table on the page for the Ed Sheeran 2017 album "÷". When I inspect it, it says it is a 'wikitable floatright', but I cannot understand what kind of data the words are. https://en.wikipedia.org/wiki/%C3%B7_(album)


My code so far has been


import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text)

url1 = “÷ (album) - Wikipedia”
s = requests.Session()
response = s.get(url1, timeout = 10)
response


right_table = soup.find(‘table’, {“class”: ‘wikitablefloatright’})


header = [th.text.rstrip() for th in right_table [0].find_all(‘th’)]
print(header)
print(’------’)
print(len(header))
The final cell raises the error: ‘NoneType’ object is not subscriptable.

Here is the inspection for the table. Let me know if anything is unclear - I am a beginner.

Many thanks,
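There are multiple issues with the code you posted, to the extent that it will never run as written, let alone produce that error. For example: soup is created before response exists, url1 is a page title rather than a URL, the curly quotes are not valid Python, the class string is missing the space in 'wikitable floatright', and find() returns a single tag, so it should not be indexed with [0]. A working version: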

import requests
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/%C3%B7_(album)"
response = requests.get(url, timeout = 10)
soup = BeautifulSoup(response.text, 'html.parser')
right_table = soup.find('table', {'class': 'wikitable floatright'})
header = [th.text.rstrip() for th in right_table.find_all('th')]
print(header)
print('------')
print(len(header))
Output:
['Aggregate scores', 'Source', 'Rating', 'Review scores', 'Source', 'Rating']
------
6
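
As a side note on where the 'NoneType' error comes from: find() returns None when nothing matches, and 'wikitablefloatright' (no space) is not a class on any table. The sketch below shows this on a small, made-up HTML snippet (SAMPLE_HTML and find_headers are hypothetical names for illustration, not the real page markup), so it runs without a network request. It also shows a CSS selector as one alternative that does not depend on the order of the classes.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the table markup on the page.
SAMPLE_HTML = """
<table class="wikitable floatright">
  <tr><th>Source</th><th>Rating</th></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# No table has the single class "wikitablefloatright", so find() returns None.
# Indexing or calling methods on that None is what raises the error.
missing = soup.find("table", {"class": "wikitablefloatright"})
print(missing)

# The exact class string, with the space, matches the table.
exact = soup.find("table", {"class": "wikitable floatright"})

# A CSS selector requires both classes but ignores their order.
by_selector = soup.select_one("table.wikitable.floatright")


def find_headers(parsed):
    """Return the header texts of the table, or an empty list if it is absent."""
    table = parsed.select_one("table.wikitable.floatright")
    if table is None:
        return []
    return [th.get_text(strip=True) for th in table.find_all("th")]


print(find_headers(soup))
```

Checking for None before using the result gives a clearer failure than the subscript error when the page layout changes.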
(Jan-28-2021, 10:28 AM) 7ken8 wrote: Hello all! [...]

Hi Buran,

Thank you for your help. I was just wondering how I can extract the td values for the scores in that table. Some publications' ratings are shown as images (img), but on inspection the element's description text does show the stars out of five. How can I present these as numbers?

PS thank you for the tag notes.
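
One possible approach to the question above, as a rough sketch only: read each row's td cells, use the cell text when there is any, and otherwise fall back to a title or alt attribute on an element inside the cell. The markup in SAMPLE_HTML is an assumption made for illustration (the real page may structure its star ratings differently, so inspect it first), and rating_text, parse_score and extract_ratings are hypothetical helper names.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: one rating as a star image, one as text, one as a letter grade.
SAMPLE_HTML = """
<table class="wikitable floatright">
  <tr><th>Source</th><th>Rating</th></tr>
  <tr><td>Publication A</td>
      <td><span title="4/5 stars"><img alt="4/5 stars" src="star.png"></span></td></tr>
  <tr><td>Publication B</td><td>6/10</td></tr>
  <tr><td>Publication C</td><td>B+</td></tr>
</table>
"""

# Matches scores such as "4/5" or "3.5/5".
SCORE = re.compile(r"(\d+(?:\.\d+)?)\s*/\s*(\d+)")


def rating_text(cell):
    """Return the visible text of a cell, or a title/alt description if it has none."""
    text = cell.get_text(" ", strip=True)
    if text:
        return text
    for tag in cell.find_all(True):
        for attr in ("title", "alt"):
            value = tag.get(attr)
            if value:
                return value
    return ""


def parse_score(text):
    """Return (score, maximum) as floats, or None if the text is not of the form x/y."""
    match = SCORE.search(text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def extract_ratings(html):
    """Return a list of (source, raw rating text, parsed score) tuples."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.wikitable.floatright")
    if table is None:
        return []
    results = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 2:
            continue  # skip header rows and rows with a different layout
        source = cells[0].get_text(" ", strip=True)
        raw = rating_text(cells[1])
        results.append((source, raw, parse_score(raw)))
    return results


for source, raw, score in extract_ratings(SAMPLE_HTML):
    print(source, raw, score)
```

Ratings that are not of the form x/y, such as letter grades, come back with None as the parsed score, so they can be handled separately.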