This is what we discussed in a previous post about Selenium/API.
A lot of content on Amazon gets generated by JavaScript,
so Requests/BS will not work, as they cannot read/render JavaScript.
Just search the output returned by print(page) and you will see that there is no tag with id="productTitle".
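To make that check concrete, here is a minimal sketch using only the standard library. The two HTML strings are made-up stand-ins (not real Amazon markup): raw_html plays the role of what Requests would return, and rendered_html plays the role of browser.page_source after the script has run. The names IdFinder and has_id are hypothetical helpers for this illustration.

```python
from html.parser import HTMLParser


class IdFinder(HTMLParser):
    """Record whether any start tag carries the wanted id attribute."""

    def __init__(self, wanted_id):
        super().__init__()
        self.wanted_id = wanted_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this tag.
        if dict(attrs).get("id") == self.wanted_id:
            self.found = True


def has_id(html, wanted_id):
    """Return True if the HTML text contains an element with that id."""
    finder = IdFinder(wanted_id)
    finder.feed(html)
    return finder.found


# Assumed example: the title element is only created by a script,
# so the raw HTML does not contain it as a tag.
raw_html = """
<html><body>
<div id="titleSection"></div>
<script>
  var el = document.createElement("span");
  el.id = "productTitle";
  el.textContent = "Some book title";
  document.getElementById("titleSection").appendChild(el);
</script>
</body></html>
"""

# Assumed example: the same page after a browser has executed the script.
rendered_html = """
<html><body>
<div id="titleSection"><span id="productTitle">Some book title</span></div>
</body></html>
"""

print(has_id(raw_html, "productTitle"))       # False
print(has_id(rendered_html, "productTitle"))  # True
```

With BeautifulSoup the same check would be soup.find(id="productTitle"), which returns None when the tag is not in the HTML you downloaded.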
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

#--| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1980,1020")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)
#--| Parse or automation
url = "https://www.amazon.com/Artificial-Intelligence-Pervasive-Internet-Things-ebook/dp/B08P34G67F/ref=sr_1_1?dchild=1&keywords=Artificial+Intelligence+to+Solve+Pervasive+Internet+of+Things+Issues&qid=1610901645&s=books&sr=1-1"
browser.get(url)
time.sleep(2)
soup = BeautifulSoup(browser.page_source, 'lxml')
# Example of using both to parse
#use_bs4 = soup.find('div', id="detailBullets_feature_div")
#print(use_bs4.text)
title = browser.find_elements_by_css_selector('#productTitle')
print(title[0].text)
print('-' * 25)
use_sel = browser.find_elements_by_css_selector('#detailBulletsWrapper_feature_div')
print(use_sel[0].text)
Output:
Artificial Intelligence to Solve Pervasive Internet of Things Issues
-------------------------
Product details
ASIN : B08P34G67F
Publisher : Academic Press; 1st edition (November 18, 2020)
Publication date : November 18, 2020
Language: : English
File size : 15241 KB
Text-to-Speech : Enabled
Enhanced typesetting : Enabled
X-Ray : Not Enabled
Word Wise : Enabled
Print length : 366 pages
Page numbers source ISBN : 0128185767
Lending : Not Enabled
Web-scraping part-2
snippsat Wrote:
JavaScript, why do I not get all content?
JavaScript is used all over the web because of its unique position to run in the browser (client side).
This can make parsing more difficult,
because Requests/bs4/lxml cannot get everything that is executed/rendered by JavaScript.
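One way to live with this is to try the plain download first and only start a browser when the content you need is missing. The sketch below is just an illustration of that idea, not the only way to do it: fetch_html, fake_static and fake_rendered are hypothetical names, the pages are made-up strings, and the substring test on the marker is a deliberately crude assumption.

```python
def fetch_html(url, fetch_static, fetch_rendered, marker):
    """Return (html, source), preferring the cheap static download.

    fetch_static and fetch_rendered are callables that take a URL and
    return HTML text; marker is a string expected in the finished page.
    """
    html = fetch_static(url)
    if marker in html:
        return html, "static"
    # Marker missing: assume the content is built by JavaScript
    # and ask the browser-based fetcher instead.
    return fetch_rendered(url), "rendered"


# Made-up stand-ins so the sketch runs without network or browser.
def fake_static(url):
    return '<div id="titleSection"></div><script>/* builds title */</script>'


def fake_rendered(url):
    return '<div id="titleSection"><span id="productTitle">Some title</span></div>'


html, source = fetch_html(
    "https://example.com/book",  # placeholder URL
    fake_static,
    fake_rendered,
    'id="productTitle"',
)
print(source)  # rendered
```

In real use fetch_static could be something like lambda u: requests.get(u).text, and fetch_rendered a small function that calls browser.get(u) and returns browser.page_source.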