Beautiful Soap can't find a specific section on the page - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Beautiful Soap can't find a specific section on the page (/thread-32057.html)
Beautiful Soap can't find a specific section on the page - Pavel_47 - Jan-18-2021

Hello,

Here is my code to explore this page: Artificial Intelligence to Solve Pervasive Internet of Things Issues

```python
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,*/*',
    'Accept-Language': 'bg,en-US;q=0.7,en;q=0.3',
    'X-Requested-With': 'XMLHttpRequest',
    'Connection': 'keep-alive'}

isbn = 9780128185766
book_web_page = f'http://www.amazon.com/s?k={isbn}&ref=nb_sb_noss'
response = requests.get(book_web_page, headers=headers)
print("status code:\t", response.status_code)
page = BeautifulSoup(response.text, 'html.parser')
# attrs is a dict mapping attribute name to value
link_section = page.find('span', attrs={'id': 'productTitle'})
print("link_section type:\t", type(link_section))
```

Here is the output:

And yet this section is indeed there: ... and the page is fully available, because the return code is 200, which means OK.

Any comments? Thanks

RE: Beautiful Soap can't find a specific section on the page - snippsat - Jan-18-2021

This is what we discussed in a previous post about Selenium and APIs. A lot of the content on Amazon is generated by JavaScript, so requests/BeautifulSoup will not work: they cannot read or render JavaScript.
Just search the print(page) output and you will see that there is no tag with id="productTitle".

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup
import time

# --| Setup
options = Options()
options.add_argument("--headless")
options.add_argument("--window-size=1980,1020")
browser = webdriver.Chrome(executable_path=r'C:\cmder\bin\chromedriver.exe', options=options)

# --| Parse or automation
url = "https://www.amazon.com/Artificial-Intelligence-Pervasive-Internet-Things-ebook/dp/B08P34G67F/ref=sr_1_1?dchild=1&keywords=Artificial+Intelligence+to+Solve+Pervasive+Internet+of+Things+Issues&qid=1610901645&s=books&sr=1-1"
browser.get(url)
time.sleep(2)
soup = BeautifulSoup(browser.page_source, 'lxml')

# Example of using both to parse
#use_bs4 = soup.find('div', id="detailBullets_feature_div")
#print(use_bs4.text)
title = browser.find_elements_by_css_selector('#productTitle')
print(title[0].text)
print('-' * 25)
use_sel = browser.find_elements_by_css_selector('#detailBulletsWrapper_feature_div')
print(use_sel[0].text)
```

See Web-scraping part-2.
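Before reaching for Selenium, you can confirm that the server-rendered HTML really lacks the element by scanning the raw requests response for the id. A minimal standard-library sketch (the helper name `has_element_id` and the sample HTML strings are my own, not from the thread):

```python
from html.parser import HTMLParser

class IdFinder(HTMLParser):
    """Records whether any tag carries the target id attribute."""
    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if dict(attrs).get('id') == self.target_id:
            self.found = True

def has_element_id(html, element_id):
    """Return True if any tag in html has id=element_id."""
    finder = IdFinder(element_id)
    finder.feed(html)
    return finder.found

# Stand-ins: the raw server response lacks the span,
# while a browser-rendered page contains it.
raw = '<html><body><div class="results"></div></body></html>'
rendered = '<html><body><span id="productTitle">Book</span></body></html>'
print(has_element_id(raw, 'productTitle'))       # False
print(has_element_id(rendered, 'productTitle'))  # True
```

If the id is absent from `response.text` but visible in the browser, the content is being injected by JavaScript and a rendering tool such as Selenium is needed.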
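Note also how `find()` matches attributes: `attrs` must be a dict mapping attribute names to values (a set literal such as `{'id', 'productTitle'}` matches nothing). A small self-contained example, with an invented HTML string standing in for a fully rendered product page:

```python
from bs4 import BeautifulSoup

# Invented HTML standing in for a rendered product page.
html = '<html><body><span id="productTitle">Example Book Title</span></body></html>'
soup = BeautifulSoup(html, 'html.parser')

# attrs is a dict: attribute name -> expected value.
span = soup.find('span', attrs={'id': 'productTitle'})
print(span.text)  # Example Book Title

# Equivalent shorthand: pass the attribute as a keyword argument.
same = soup.find('span', id='productTitle')
print(same.text)  # Example Book Title
```

Both forms are equivalent here; the keyword form is usually the more readable choice for simple id lookups.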