Python Forum

Full Version: Selenium - bypass Cloudflare bot detection
Hello,

Cloudflare detects my scraper and blocks access to the site.
I have tried selenium_stealth, which seems to pass the bot detection test at https://bot.sannysoft.com/ but does not get past Cloudflare.

Any advice please?

Here is my code:

import time
from selenium import webdriver
from selenium_stealth import stealth
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()

options.add_argument("start-maximized")
#options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument("--disable-blink-features=AutomationControlled")
driver = webdriver.Chrome(options=options)

stealth(driver,
        languages=["en-US", "en"],
        vendor="Google Inc.",
        platform="Win32",
        webgl_vendor="Intel Inc.",
        renderer="Intel Iris OpenGL Engine",
        fix_hairline=True,
        )


driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')
#driver.get('https://bot.sannysoft.com/')

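# Fixed pause before polling for the element
time.sleep(10)

# Wait up to 30 seconds for the 'load-more' element to be present in the DOM
WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CLASS_NAME, 'load-more')))

soup = BeautifulSoup(driver.page_source, 'html.parser')

print(soup.contents)

driver.quit()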
OK, it seems I have solved the problem.

Use undetected_chromedriver.
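
For anyone landing here later, this is roughly what the switch might look like. It is only a minimal sketch, assuming the undetected_chromedriver package is installed and Chrome is available locally; the helper name fetch_page_source and its parameters are made up for illustration and are not from the original post. The 'load-more' class and the 30 second timeout are carried over from the code in the question. Whether it gets past Cloudflare on a given site is not guaranteed.

```python
def fetch_page_source(url, wait_class="load-more", timeout=30):
    """Open `url` with undetected_chromedriver and return the page HTML.

    Hypothetical helper for illustration only. `wait_class` is the CSS
    class we wait for before reading the page (taken from the question).
    """
    # Imports are kept inside the function so the module can be imported
    # even when the browser packages are not installed.
    import undetected_chromedriver as uc
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # uc.Chrome() stands in for webdriver.Chrome() from the original code
    driver = uc.Chrome()
    try:
        driver.get(url)
        # Block until the target element is present, or raise TimeoutException
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CLASS_NAME, wait_class))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out
        driver.quit()
```

Calling fetch_page_source() with the sanparks.org URL from the question and passing the result to BeautifulSoup(html, 'html.parser') should give the same kind of soup object as before, assuming the page loads.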
I'm trying this right now, based on something I saw in another discussion on this group, but don't know the syntax for that last line. It doesn't like just "row" in the append. The fetchall is returning a tuple.
I am also facing a similar situation.
Use the ScrapingBypass web scraping API, which can help users bypass Cloudflare.

Quote:
import requests

url = "https://api.scrapingbypass.com/"
method = "GET"
headers = {
    "x-cb-apikey": r"your api key",
    "x-cb-host": r"www.sanparks.org",
}
response = requests.request(method, url, headers=headers)
print(response.text)