Beautifulsoup don't get me the page

Beautifulsoup don't get me the page - mariolopes - Oct-20-2019

Hi. I use this code

import requests
from bs4 import BeautifulSoup
pagina1="https://www.fragrantica.com/perfume/Chanel/Coco-Eau-de-Parfum-609.html"
pagina1=requests.get(pagina1, headers = {'User-agent': 'your bot 0.1'})
soup=BeautifulSoup(pagina1.content,"html.parser")
print(soup)

The result is not the source code of the page but something like:
<!DOCTYPE html>




 <html class="no-js" lang="en-US"> 
<head>
<title>Attention Required! | Cloudflare</title>
<meta id="captcha-bypass" name="captcha-bypass"/>
and a few more lines.
What happened? Why can't I get the source code with BeautifulSoup?
Thank you.

RE: Beautifulsoup don't get me the page - Larz60+ - Oct-20-2019

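Check the status code to make sure the page has been downloaded.

# pagina1 must still be the URL string here, not the Response object from the first post
response = requests.get(pagina1)
if response.status_code == 200:
    soup = BeautifulSoup(response.content, "lxml")
else:
    print(f"unable to fetch page: {pagina1}")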

RE: Beautifulsoup don't get me the page - metulburr - Oct-20-2019

Based on the HTML I got with your code, it looks like you are getting a CAPTCHA:

<h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>
<p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>
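
A rough way to confirm this from code is to look at the page title before parsing anything else. The sketch below uses only the standard library and checks for the title text quoted in the first post; the names TitleParser and looks_like_challenge are made up for this example, and the check is a heuristic based on this one block page, not an official Cloudflare marker.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def looks_like_challenge(html):
    """Heuristic: True when the title matches the block page quoted in this thread."""
    parser = TitleParser()
    parser.feed(html)
    return "Attention Required" in parser.title or "Cloudflare" in parser.title


# Sample taken from the output posted at the top of the thread
sample = """<!DOCTYPE html>
<html class="no-js" lang="en-US">
<head>
<title>Attention Required! | Cloudflare</title>
<meta id="captcha-bypass" name="captcha-bypass"/>
</head></html>"""

print(looks_like_challenge(sample))
```

With requests you would pass response.text to looks_like_challenge() and only build the soup when it returns False.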

RE: Beautifulsoup don't get me the page - mariolopes - Oct-22-2019

If you follow the link, there is no CAPTCHA on this website.
I think the problem is with the User-Agent. For some reason the website detects that the request is not from a browser. The issue would be solved if I could read the source code of the page with Python, but I don't know how to read the source code and get values from it in Python.

RE: Beautifulsoup don't get me the page - snippsat - Oct-22-2019

The first step is to try a real browser user-agent for this site:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36

That does not work; I tested it.
The next step is to use Selenium.

from selenium import webdriver
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options

#--| Setup
options = Options()
#options.add_argument("--headless")
# executable_path is the Selenium 3 style; Selenium 4 expects a Service object instead
browser = webdriver.Chrome(executable_path=r'chromedriver.exe', options=options)
# The implicit wait applies to later find_element calls, so set it before loading the page
browser.implicitly_wait(5)
#--| Parse or automation
browser.get('https://www.fragrantica.com/perfume/Chanel/Coco-Eau-de-Parfum-609.html')
soup = BeautifulSoup(browser.page_source, 'lxml')
# CSS selector for the perfume title
parfum = soup.select('#col1 > div > div > h1 > span')

Now it works. For example, here I use a CSS selector to get the perfume title.
The text would be:

>>> parfum
[<span itemprop="name">Coco Eau de Parfum Chanel for women</span>]
>>> parfum[0].text
'Coco Eau de Parfum Chanel for women'
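
The same selector step can be tried offline, without Selenium, which also covers reading values out of page source that has been saved by hand. This is only a sketch: the HTML fragment below is a hand-written assumption that mimics the nesting the selector implies, not the site's real markup, and it uses the built-in html.parser backend so lxml is not needed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: only the nesting the selector needs, not the real page
html = """
<div id="col1">
  <div>
    <div>
      <h1><span itemprop="name">Coco Eau de Parfum Chanel for women</span></h1>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
matches = soup.select("#col1 > div > div > h1 > span")

# select() returns a list, so guard against an empty result before indexing
title = matches[0].get_text(strip=True) if matches else None
print(title)
```

If the page was saved to disk, read the file into a string first and pass that string to BeautifulSoup in place of html.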

RE: Beautifulsoup don't get me the page - mariolopes - Oct-23-2019

Thank you
Selenium seems to be the only option for this website.