Beautiful Soup can't find a specific section on the page

Pavel_47 · Jan-18-2021, 11:42 AM

Hello,

Here is my code to explore this page:
Artificial Intelligence to Solve Pervasive Internet of Things Issues

import requests
from bs4 import BeautifulSoup
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,*/*',
    'Accept-Language': 'bg,en-US;q=0.7,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive'}

isbn = 9780128185766
book_web_page = f'http://www.amazon.com/s?k={isbn}&ref=nb_sb_noss'
response = requests.get(book_web_page, headers=headers)
print("status code:\t", response.status_code)
page = BeautifulSoup(response.text, 'html.parser')

link_section = page.find('span', attrs={'id': 'productTitle'})
print("link_section type:\t", type(link_section))

Here is the output:

Output:
status code:	 200
link_section type:	 <class 'NoneType'>

And yet this section is indeed there:

[Image: Screenshot-from-2021-01-18-12-34-10.png]

... and the page is fully available, because the status code is 200, which means OK.
Any comments?

Thanks

snippsat · (This post was last modified: Jan-18-2021, 02:18 PM by snippsat.)

This is what we discussed in the previous post about Selenium/API.
A lot of content on Amazon is generated by JavaScript,
so Requests/BS will not work, as they cannot read or render JavaScript.
Just search the output of print(page) and you will see that there is no tag with id="productTitle".

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1980,1020")
# Path to the local chromedriver; newer Selenium releases take it via a Service object instead
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
url = "https://www.amazon.com/Artificial-Intelligence-Pervasive-Internet-Things-ebook/dp/B08P34G67F/ref=sr_1_1?dchild=1&keywords=Artificial+Intelligence+to+Solve+Pervasive+Internet+of+Things+Issues&qid=1610901645&s=books&sr=1-1"
browser.get(url)
# Give the page's JavaScript time to render
time.sleep(2)
soup = BeautifulSoup(browser.page_source, 'lxml')
# Example of using both to parse
#use_bs4 = soup.find('div', id="detailBullets_feature_div")
#print(use_bs4.text)
# find_elements(By.CSS_SELECTOR, ...) returns a list of matching elements
title = browser.find_elements(By.CSS_SELECTOR, '#productTitle')
print(title[0].text)
print('-' * 25)
use_sel = browser.find_elements(By.CSS_SELECTOR, '#detailBulletsWrapper_feature_div')
print(use_sel[0].text)

Output:
Artificial Intelligence to Solve Pervasive Internet of Things Issues
-------------------------
Product details
ASIN : B08P34G67F
Publisher : Academic Press; 1st edition (November 18, 2020)
Publication date : November 18, 2020
Language: : English
File size : 15241 KB
Text-to-Speech : Enabled
Enhanced typesetting : Enabled
X-Ray : Not Enabled
Word Wise : Enabled
Print length : 366 pages
Page numbers source ISBN : 0128185767
Lending : Not Enabled

Web-scraping part-2

snippsat wrote: JavaScript, why do I not get all content

JavaScript is used all over the web because of its unique position to run in the browser (client side).
This can make parsing more difficult,
because Requests/bs4/lxml cannot get everything that is executed/rendered by JavaScript.
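
To see this without hitting Amazon at all, here is a minimal sketch that uses only the standard library's html.parser (the module that bs4's 'html.parser' backend builds on). The sample HTML, the IdCollector class and the ids_in_static_html helper are made up for illustration and are not taken from the real page. It shows that a static parser only sees the elements present in the downloaded HTML, not the ones a script would create in the browser.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect every id attribute found in the static HTML (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_static_html(html):
    """Return the set of element ids present in the HTML as downloaded."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


# Hypothetical page: the title element is only created when the script runs.
SAMPLE_HTML = """
<html><body>
  <div id="container"></div>
  <script>
    var span = document.createElement("span");
    span.id = "productTitle";
    document.getElementById("container").appendChild(span);
  </script>
</body></html>
"""

found = ids_in_static_html(SAMPLE_HTML)
print("container" in found)       # True: the div is in the static HTML
print("productTitle" in found)    # False: the span exists only after JavaScript runs
```

Note that the text "productTitle" does appear in the sample source, but only inside the script. So a plain substring search on response.text can be misleading; checking for the actual tag, as above or with page.find(), is the safer test.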
