Python Forum
selenium returns junk instead of html
#1
Hello,

I am trying to scrape a dynamic website with Selenium.
The HTML is rendered inside the <app-root></app-root> tag.
If I inspect the website in Chrome (F12), it looks fine.

But if I export the HTML from Selenium, it's rubbish (see attachment).

What am I doing wrong?
#2
What does your code look like?
#3
(Mar-22-2022, 03:31 PM)Larz60+ Wrote: what does your code look like?

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver import ChromeOptions
from bs4 import BeautifulSoup

options = ChromeOptions()
# Run Chrome without a visible window. Newer Selenium 4 releases no longer
# support the options.headless property, so pass the flag as an argument.
options.add_argument('--headless')
# The options must be passed to the driver, otherwise they are ignored.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')

# Fixed delay to give the page's JavaScript time to render the content.
time.sleep(10)

soup = BeautifulSoup(driver.page_source, 'html.parser')

print(soup.contents)

driver.quit()
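Since the page renders its content inside <app-root>, one quick way to tell whether page_source was captured too early is to check whether that tag has any child elements yet. A minimal sketch using only the standard library's html.parser; AppRootChecker and app_root_is_rendered are hypothetical helper names, not part of Selenium or BeautifulSoup:

```python
from html.parser import HTMLParser


class AppRootChecker(HTMLParser):
    """Counts start tags that appear inside <app-root> ... </app-root>."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while the parser is inside <app-root>
        self.children = 0   # number of start tags seen inside <app-root>

    def handle_starttag(self, tag, attrs):
        if tag == "app-root":
            self.depth += 1
        elif self.depth:
            self.children += 1

    def handle_endtag(self, tag):
        if tag == "app-root" and self.depth:
            self.depth -= 1


def app_root_is_rendered(html):
    """Return True if <app-root> contains at least one element."""
    checker = AppRootChecker()
    checker.feed(html)
    checker.close()
    return checker.children > 0


if __name__ == "__main__":
    # An empty root, as in a snapshot taken before the app has rendered.
    print(app_root_is_rendered("<html><body><app-root></app-root></body></html>"))
    # A root that already holds rendered markup.
    print(app_root_is_rendered("<app-root><div>Camp list</div></app-root>"))
```

With the driver from the script still open, calling app_root_is_rendered(driver.page_source) and getting False would suggest the snapshot was taken before the page finished rendering, rather than a problem with parsing or exporting.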
#4
When I run it (I use Firefox, not Chrome, but the results should be the same), it prints HTML, not junk.

Here's code that uses WebDriverWait, which waits only as long as necessary instead of a fixed time, so it gets the fully rendered data. (Again, this is for Firefox.) This is a very slow-loading page.

import sys
from inspect import currentframe, getframeinfo

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from bs4 import BeautifulSoup

# Chrome alternative:
# from selenium.webdriver.chrome.service import Service
# from webdriver_manager.chrome import ChromeDriverManager
# options = webdriver.ChromeOptions()
# options.add_argument('--headless')
# driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)

options = webdriver.FirefoxOptions()
# Run Firefox without a visible window (replaces the options.headless property).
options.add_argument('-headless')
driver = webdriver.Firefox(options=options)

driver.get('https://www.sanparks.org/reservations/accommodation/filters/parks/113/arrivalDate/2022-12-11/departureDate/2022-12-31/camps/0%7C116/types/0/features/0')

try:
    # Wait up to 30 s, but continue as soon as an element with class
    # "load-more" is in the DOM (used here as a sign the results have rendered).
    WebDriverWait(driver, 30).until(EC.presence_of_element_located((By.CLASS_NAME, 'load-more')))
except TimeoutException:
    frameinfo = getframeinfo(currentframe())
    print(f"Query timed out on line {frameinfo.lineno}")
    driver.quit()
    sys.exit(-1)

soup = BeautifulSoup(driver.page_source, 'html.parser')

print(soup)

driver.quit()
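WebDriverWait avoids a fixed time.sleep(10) by checking its condition repeatedly and returning as soon as the condition is met (Selenium's default poll interval is 0.5 s). The Selenium-free sketch below only illustrates that polling idea; wait_until is a hypothetical helper, not Selenium's actual implementation, and it raises the built-in TimeoutError rather than Selenium's TimeoutException:

```python
import time


def wait_until(condition, timeout=30.0, poll_frequency=0.5):
    """Call condition() until it returns a truthy value or the timeout expires.

    Simplified illustration of an explicit wait, not Selenium's code.
    Returns the truthy value; raises TimeoutError if time runs out.
    """
    end_time = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # stop as soon as the condition holds
        if time.monotonic() > end_time:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)  # short pause before checking again


if __name__ == "__main__":
    start = time.monotonic()
    ready_at = start + 1.0  # pretend the element shows up after about 1 s

    def element_present():
        return time.monotonic() >= ready_at

    wait_until(element_present, timeout=30)
    print(f"waited {time.monotonic() - start:.1f} s instead of a fixed 10 s")
```

The trade-off is a few extra condition checks in exchange for never waiting longer than needed, while the timeout still caps the worst case on a slow page.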
#5
Thank you very much.
I appreciate your help.

I am away for the next 4 days but will try it as soon as I get back and let you know.

Regards.

Bernard.
#6
Moved to a separate thread: https://python-forum.io/thread-36759.html