Python Forum
*Beginner* web scraping/Beautiful Soup help
#1
Hello all!

I am trying to scrape a table of reviews from an album’s Wikipedia page, using Beautiful Soup and requests. I have become stuck trying to visualise this.

It is the "Critical Reception" table on the page for the Ed Sheeran 2017 album "÷". When I inspect it, it says it is a 'wikitable floatright', but I cannot understand what kind of data the words are. https://en.wikipedia.org/wiki/%C3%B7_(album)


My code so far has been


import requests
from bs4 import BeautifulSoup

soup = BeautifulSoup(response.text)

url1 = “÷ (album) - Wikipedia”
s = requests.Session()
response = s.get(url1, timeout = 10)
response


right_table = soup.find(‘table’, {“class”: ‘wikitablefloatright’})


header = [th.text.rstrip() for th in right_table [0].find_all(‘th’)]
print(header)
print(’------’)
print(len(header))
The final cell fails with: ‘NoneType’ object is not subscriptable.

Here is the inspection for the table. Let me know if anything is unclear - I am a beginner.

Many thanks,
buran wrote Jan-28-2021, 11:50 AM:
Please use proper tags when posting code, tracebacks, output, etc. This time I have added the tags for you.
See BBcode help for more info.
#2
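There are multiple issues with the code you posted, to the extent that it will never run as posted, let alone produce the error you describe: soup is built from response before response exists, url1 holds the page title rather than the URL, the string quotes are curly rather than straight, the class name is missing the space in 'wikitable floatright', and find() returns a single element, so the [0] does not belong there. A corrected version: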

import requests
from bs4 import BeautifulSoup
url = "https://en.wikipedia.org/wiki/%C3%B7_(album)"
response = requests.get(url, timeout = 10)
soup = BeautifulSoup(response.text, 'html.parser')
right_table = soup.find('table', {'class': 'wikitable floatright'})
header = [th.text.rstrip() for th in right_table.find_all('th')]
print(header)
print('------')
print(len(header))
Output:
['Aggregate scores', 'Source', 'Rating', 'Review scores', 'Source', 'Rating']
------
6
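
On the 'NoneType' error itself: find() returns a single element, or None when nothing matches, so a misspelled class name followed by [0] fails in exactly that way. Below is a minimal sketch of the same lookup with a guard. The inline HTML is a hypothetical stand-in for the downloaded page (so it runs offline), and the helper name table_headers is an assumption for illustration, not something from the thread.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so the sketch runs offline.
SAMPLE_HTML = """
<table class="wikitable floatright">
  <tr><th>Aggregate scores</th></tr>
  <tr><th>Source</th><th>Rating</th></tr>
  <tr><td>Some aggregator</td><td>70/100</td></tr>
</table>
"""


def table_headers(html, css_selector="table.wikitable.floatright"):
    """Return the stripped text of every <th> in the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    # select_one() matches each class separately, so the order of the
    # classes (or extra classes on the table) does not matter.
    table = soup.select_one(css_selector)
    if table is None:
        # Nothing matched: fail with a clear message instead of letting a
        # later [0] or .find_all() blow up on None.
        raise ValueError(f"no element matches {css_selector!r}")
    return [th.get_text(strip=True) for th in table.find_all("th")]


print(table_headers(SAMPLE_HTML))
```

Calling table_headers(SAMPLE_HTML, "table.wikitablefloatright") raises the ValueError, which is the guarded equivalent of the original failure.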
#3
(Jan-28-2021, 10:28 AM)7ken8 Wrote: Hello all!


Hi Buran,

Thank you for your help. I was just wondering how I can also pull out the td values for the scores in that table. Some publications' ratings are shown as images (img), but on inspection the stars out of five do appear in the image's description text. How can I present these as numbers? Thanks

PS thank you for the tag notes.
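
One possible approach to the question above, as a rough sketch only: walk the rows of the table, take the visible text of each rating cell, and fall back to the image's alt text (or a title attribute) when the cell has no text. The sample markup, the publication names, and the helper names cell_rating, as_numbers and source_ratings are all assumptions for illustration; the live page may be structured differently, so inspect it and adjust the lookups.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the real cells may differ, so check the live HTML.
SAMPLE_HTML = """
<table class="wikitable floatright">
  <tr><th>Source</th><th>Rating</th></tr>
  <tr><td>Publication A</td>
      <td><span title="4/5 stars"><img alt="4/5 stars" src="star.png"></span></td></tr>
  <tr><td>Publication B</td><td>6/10</td></tr>
</table>
"""


def cell_rating(td):
    """Prefer visible text; fall back to an img alt, then any title attribute."""
    text = td.get_text(strip=True)
    if text:
        return text
    img = td.find("img")
    if img is not None and img.get("alt"):
        return img["alt"]
    titled = td.find(attrs={"title": True})
    return titled["title"] if titled is not None else None


def as_numbers(rating):
    """Turn a string such as '4/5 stars' into (4.0, 5); None if no 'x/y' found."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/\s*(\d+)", rating or "")
    return (float(match.group(1)), int(match.group(2))) if match else None


def source_ratings(html):
    """Return (source, rating) pairs from the first matching table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("table.wikitable.floatright")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")
        if len(cells) == 2:  # header rows contain only <th>, so they are skipped
            rows.append((cells[0].get_text(strip=True), cell_rating(cells[1])))
    return rows


for source, rating in source_ratings(SAMPLE_HTML):
    print(source, rating, as_numbers(rating))
```

Ratings that are not in an "x/y" form (letter grades, for example) come back from as_numbers as None, so they would need their own handling.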