Python Forum
Beautiful Soup can't find a specific section on the page
#1
Hello,

Here is my code to explore this page:
Artificial Intelligence to Solve Pervasive Internet of Things Issues

import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,*/*',
    'Accept-Language': 'bg,en-US;q=0.7,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive'}

isbn = 9780128185766
book_web_page = f'http://www.amazon.com/s?k={isbn}&ref=nb_sb_noss'
response = requests.get(book_web_page, headers=headers)
print("status code:\t", response.status_code)
page = BeautifulSoup(response.text, 'html.parser')

link_section = page.find('span', attrs={'id': 'productTitle'})
print("link_section type:\t", type(link_section))
Here is the output:
Output:
status code: 200
link_section type: <class 'NoneType'>
And yet this section is indeed there:

[Image: Screenshot-from-2021-01-18-12-34-10.png]

... and the page is fully available, since the status code is 200, which means OK.
Any comments?

Thanks
#2
This is what we discussed in the previous post about Selenium/API.
A lot of the content on Amazon is generated by JavaScript,
so Requests/BS will not work, as they cannot read or render JavaScript.
Just search what print(page) returns and you will see that there is no tag with id="productTitle".
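
The check described above can be tried offline first. This is only a minimal sketch: it uses the standard library's html.parser instead of bs4 so it runs without extra packages, and the names IdCollector, has_element_id, static_page and js_page are made up for illustration. The two HTML strings are hypothetical stand-ins for response.text, not real Amazon markup.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the id attribute of every tag found in static HTML."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def has_element_id(html, element_id):
    """Return True if some tag in the raw HTML carries the given id."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return element_id in collector.ids


# Hypothetical stand-ins for response.text (assumption, not real pages):
# one page ships the element in its HTML, the other only ships a script
# that would create the element once a browser runs it.
static_page = '<html><body><span id="productTitle">A Book</span></body></html>'
js_page = (
    '<html><body><div id="root"></div>'
    '<script>document.getElementById("root").innerHTML = '
    "'<span id=\"productTitle\">A Book</span>';</script>"
    '</body></html>'
)

print(has_element_id(static_page, "productTitle"))
print(has_element_id(js_page, "productTitle"))
```

In the second page the span only exists as text inside the script, so a parser that does not execute JavaScript never sees it as a tag. Running the same helper on response.text is a quick way to tell whether the element was in the downloaded HTML at all.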

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1980,1020")
# Selenium 4 takes the chromedriver path through a Service object
service = Service(executable_path=r'C:\cmder\bin\chromedriver.exe')
browser = webdriver.Chrome(service=service, options=options)
#--| Parse or automation
url = "https://www.amazon.com/Artificial-Intelligence-Pervasive-Internet-Things-ebook/dp/B08P34G67F/ref=sr_1_1?dchild=1&keywords=Artificial+Intelligence+to+Solve+Pervasive+Internet+of+Things+Issues&qid=1610901645&s=books&sr=1-1"
browser.get(url)
time.sleep(2)  # give the JavaScript time to render the page
soup = BeautifulSoup(browser.page_source, 'lxml')
# Example of using both to parse
#use_bs4 = soup.find('div', id="detailBullets_feature_div")
#print(use_bs4.text)
title = browser.find_elements(By.CSS_SELECTOR, '#productTitle')
print(title[0].text)
print('-' * 25)
use_sel = browser.find_elements(By.CSS_SELECTOR, '#detailBulletsWrapper_feature_div')
print(use_sel[0].text)
browser.quit()  # close the headless browser when done
Output:
Artificial Intelligence to Solve Pervasive Internet of Things Issues
-------------------------
Product details
ASIN : B08P34G67F
Publisher : Academic Press; 1st edition (November 18, 2020)
Publication date : November 18, 2020
Language: : English
File size : 15241 KB
Text-to-Speech : Enabled
Enhanced typesetting : Enabled
X-Ray : Not Enabled
Word Wise : Enabled
Print length : 366 pages
Page numbers source ISBN : 0128185767
Lending : Not Enabled
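
If the product details are needed as data rather than one block of text, the printed output can be split into a dict. This is a rough sketch, assuming the element's text arrives with one "label : value" field per line as in the output above. The helper name parse_details is hypothetical, and the sample only repeats a few lines from that output.

```python
def parse_details(text):
    """Turn 'label : value' lines into a dict; other lines are skipped."""
    details = {}
    for line in text.splitlines():
        # Split on the first ' : ' only, so values may contain colons
        label, sep, value = line.partition(" : ")
        if sep:
            # 'Language:' carries a stray trailing colon, so strip it
            details[label.strip().rstrip(":").strip()] = value.strip()
    return details


# A few lines copied from the output above (assumed one field per line)
sample = """Product details
ASIN : B08P34G67F
Publisher : Academic Press; 1st edition (November 18, 2020)
Language: : English
Print length : 366 pages"""

details = parse_details(sample)
print(details["ASIN"])
print(details["Publisher"])
```

With the Selenium code above, the same helper could be called on use_sel[0].text. The "Product details" heading has no separator, so it is simply left out of the dict.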
Web-scraping part-2
snippsat Wrote: JavaScript, why do I not get all content?

JavaScript is used all over the web because of its unique position to run in the browser (client side).
This can make parsing more difficult,
because Requests/bs4/lxml cannot get everything that is executed/rendered by JavaScript.