Python Forum

selenium returns junk instead of html
Hello,

I am trying to scrape a dynamic website with Selenium.
The HTML is rendered inside the <app-root></app-root> tag.
If I inspect the website in Chrome (F12), it looks fine:
[Attachment: the page as shown in Chrome DevTools]

But if I export the HTML from Selenium, it's rubbish (see attachment):
[Attachment: the HTML exported from Selenium]

What am I doing wrong?
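
Because the page renders into <app-root>, one quick diagnostic is to check whether that tag contains any child elements in the HTML Selenium hands back. An empty <app-root> would suggest the source was read before the page's JavaScript finished rendering (or that it never ran). This is a minimal sketch using only the standard library's html.parser; `AppRootChecker` and `app_root_is_rendered` are hypothetical names, and the check assumes the app mounts into <app-root> as described above.

```python
from html.parser import HTMLParser


class AppRootChecker(HTMLParser):
    """Records whether any element start tag appears inside <app-root>."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0              # > 0 while the parser is inside <app-root>
        self.has_children = False   # set once an element is seen inside it

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names, so "app-root" matches any case.
        if tag == "app-root":
            self.depth += 1
        elif self.depth > 0:
            self.has_children = True

    def handle_endtag(self, tag):
        if tag == "app-root" and self.depth > 0:
            self.depth -= 1


def app_root_is_rendered(html: str) -> bool:
    """Return True if <app-root> contains at least one child element."""
    checker = AppRootChecker()
    checker.feed(html)
    checker.close()
    return checker.has_children
```

With Selenium you would call it as `app_root_is_rendered(driver.page_source)` after `driver.get(...)`. Text-only content inside <app-root> is deliberately not counted as "rendered".
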
what does your code look like?
(Mar-22-2022, 03:31 PM) Larz60+ wrote: what does your code look like?

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver import ChromeOptions
from bs4 import BeautifulSoup

options = ChromeOptions()
options.add_argument("--headless")  # run Chrome without a visible window
# The options must be passed to the driver, otherwise the headless setting is ignored
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')

time.sleep(10)  # fixed 10 s pause to give the page's JavaScript time to render

soup = BeautifulSoup(driver.page_source, 'html.parser')

print(soup.contents)

driver.quit()

When I run it (I use Firefox, not Chrome, but the results should be the same), it prints HTML, not junk.

Here's code that uses WebDriverWait, which only waits as long as necessary, and gets the correct data (again, for Firefox). This is a very slow-loading page.

import sys
from inspect import currentframe, getframeinfo

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# For Chrome instead of Firefox (needs the webdriver_manager package):
# from selenium.webdriver.chrome.service import Service
# from webdriver_manager.chrome import ChromeDriverManager
# options = webdriver.ChromeOptions()
# options.add_argument("--headless")
# driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

options = webdriver.FirefoxOptions()
options.add_argument("-headless")  # run Firefox without a visible window
driver = webdriver.Firefox(options=options)

try:
    driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')

    # Poll for up to 30 s and continue as soon as an element with class
    # "load-more" is present (used here as the sign that the results have rendered).
    try:
        WebDriverWait(driver, 30).until(
            EC.presence_of_element_located((By.CLASS_NAME, 'load-more')))
    except TimeoutException:
        frameinfo = getframeinfo(currentframe())
        print(f"Query timed out on line {frameinfo.lineno}")
        sys.exit(-1)  # the finally block below still closes the browser

    soup = BeautifulSoup(driver.page_source, 'html.parser')
    print(soup)
finally:
    driver.quit()  # always close the browser, even after a timeout

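
The "only waits as long as necessary" behaviour of WebDriverWait comes from polling: it re-checks the condition (every 0.5 s by default) and returns the condition's value as soon as it is truthy, or gives up when the timeout expires. The standard-library sketch below illustrates that idea without a browser. `wait_until` is a hypothetical helper, not part of Selenium, and it raises the built-in TimeoutError rather than Selenium's TimeoutException.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(condition: Callable[[], T], timeout: float = 30.0, poll: float = 0.5) -> T:
    """Call `condition` every `poll` seconds until it returns a truthy value.

    Returns that value as soon as it appears; raises TimeoutError once
    `timeout` seconds have passed without success. Unlike a fixed
    time.sleep(10), the total wait is only as long as the condition needs.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Simulated "page" whose load-more element appears on the third check.
checks = {"n": 0}


def load_more_present():
    checks["n"] += 1
    return "load-more" if checks["n"] >= 3 else None


print(wait_until(load_more_present, timeout=5, poll=0.01))  # prints: load-more
```

In the Selenium code above, `WebDriverWait(driver, 30).until(...)` plays the role of `wait_until`: compared with `time.sleep(10)`, it returns early on a fast load and keeps waiting (up to 30 s) on a slow one.
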
Thank you very much.
I appreciate your help.

I am away for the next 4 days but will try it as soon as I get back and let you know.

Regards.

Bernard.