Mar-22-2022, 10:41 AM
selenium returns junk instead of html
Mar-22-2022, 03:31 PM
what does your code look like?
Mar-22-2022, 06:57 PM
(Mar-22-2022, 03:31 PM)Larz60+ Wrote: what does your code look like?

```python
import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver import ChromeOptions
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

options = ChromeOptions()
options.headless = True
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')
time.sleep(10)
soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup.contents)
driver.quit()
```
When I run it (I use Firefox, not Chrome, but the results should be the same), it prints HTML, not junk.
Here's code that uses WebDriverWait, which waits only as long as necessary and will get the correct data (again, for Firefox). This is a very slow-loading page.

```python
import sys
from inspect import currentframe, getframeinfo

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Chrome equivalent, left commented out:
# from selenium.webdriver.chrome.service import Service
# from webdriver_manager.chrome import ChromeDriverManager
# from selenium.webdriver import ChromeOptions
# options = ChromeOptions()
# driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))

options = webdriver.FirefoxOptions()
# Was options.headless = True; the command-line flag also works on newer Selenium releases.
options.add_argument("-headless")
driver = webdriver.Firefox(options=options)
driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')

try:
    # Wait up to 30 seconds for the 'load-more' element to appear in the DOM.
    element = WebDriverWait(driver, 30).until(
        EC.presence_of_element_located((By.CLASS_NAME, 'load-more')))
except TimeoutException:
    frameinfo = getframeinfo(currentframe())
    print(f"Query timed out on line {frameinfo.lineno}")
    driver.quit()  # close the browser before exiting
    sys.exit(-1)

soup = BeautifulSoup(driver.page_source, 'html.parser')
print(soup)
driver.quit()
```
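To illustrate why an explicit wait "only waits as long as necessary" compared with a fixed time.sleep(10), here is a rough plain-Python sketch of the polling idea. It is not Selenium's implementation; `wait_until`, `WaitTimeout`, and `element_present` are hypothetical names made up for this example, and the "page" is simulated by a counter.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition is still falsy at the deadline (hypothetical name)."""


def wait_until(condition, timeout=30.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value as soon as it appears, so a fast page costs
    little time; raises WaitTimeout once `timeout` seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} s")
        time.sleep(poll_interval)


# Stand-in for a slow page: the "element" shows up on the third check.
checks = {"count": 0}


def element_present():
    checks["count"] += 1
    return "load-more" if checks["count"] >= 3 else None


found = wait_until(element_present, timeout=5.0, poll_interval=0.01)
print(found, checks["count"])  # load-more 3
```

The loop returns on the first successful check, so the cost is roughly the page's real load time plus at most one poll interval, whereas a fixed sleep always costs the full delay and can still be too short on a slow day.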
Mar-23-2022, 02:28 AM
Thank you very much.
I appreciate your help. I am away for the next 4 days but will try it as soon as I get back and let you know. Regards. Bernard.
Moved to separate thread : https://python-forum.io/thread-36759.html