Python Forum
Saving html page and reloading into selenium while developing all xpaths
I wanted to save the homepage that I was scraping so that I wouldn't have to fetch it every time I was making changes during development.
First I tried this (filename is a pathlib Path object):
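import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
# spath (defined elsewhere) provides savedhtmlpath; url is the page being scraped
filename = spath.savedhtmlpath / 'homepage.html'

# Selenium 4 takes the driver path through a Service object and the options via options=
browser = webdriver.Chrome(service=Service('/home/Larz60p/Drivers/chromedriver'), options=chrome_options)
#--| Parse
browser.get(url)
html = browser.page_source  # serialized DOM as currently rendered, not the raw server response
with filename.open('w', encoding='utf-8') as fp:
    fp.write(html)
time.sleep(2)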
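The save step above starts the browser and fetches the page on every run. A minimal sketch, assuming the goal is to render the page only once: a hypothetical helper, cached_page_html (not a Selenium function), returns the saved copy when it exists and otherwise calls a zero-argument render function, then stores the result as UTF-8 text.

```python
from pathlib import Path
from typing import Callable


def cached_page_html(path: Path, render: Callable[[], str]) -> str:
    """Return the HTML stored at path, rendering and saving it first if it is missing.

    render is any zero-argument callable that returns the page HTML as a str,
    for example a function that calls browser.get(url) and returns
    browser.page_source (hypothetical usage, not part of Selenium).
    """
    if path.exists():
        # reuse the copy saved on an earlier run; no browser round trip needed
        return path.read_text(encoding="utf-8")
    html = render()
    # page_source is a str, so write it as text with an explicit encoding
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html
```

Usage might look like defining render() so that it calls browser.get(url) and returns browser.page_source, then calling html = cached_page_html(filename, render). Only the first run touches the network.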
Then, to read it back:
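from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
filename = spath.savedhtmlpath / 'homepage.html'

# the browser has to exist before get() is called
browser = webdriver.Chrome(service=Service('/home/Larz60p/Drivers/chromedriver'), options=chrome_options)
# as_uri() builds a correctly escaped file:// URL from the absolute path
path = filename.resolve().as_uri()
browser.get(path)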
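Building the URL with an f-string works for simple paths, but it does not escape characters such as spaces or #. pathlib's as_uri(), which requires an absolute path (hence resolve() first), percent-encodes them. A small comparison, using a made-up POSIX path:

```python
from pathlib import PurePosixPath

# hypothetical saved-page location containing a space and a '#'
saved = PurePosixPath("/home/user/saved html/home#1.html")

naive = f"file://{saved}"  # '#1.html' would be read as a URL fragment
proper = saved.as_uri()    # percent-encodes characters that are unsafe in a URL

print(naive)   # file:///home/user/saved html/home#1.html
print(proper)  # file:///home/user/saved%20html/home%231.html
```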
So far, so good. But when I tried to extract information with XPath, I didn't get an error, but I couldn't find what I was looking for either. I wasn't able to determine what the issue was.

So I switched to downloading the page with requests instead.
It's a bit of a hack, but it works flawlessly:
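import requests
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')

filename = spath.savedhtmlpath / 'homepage.html'
# download the raw page only on the first run; later runs reuse the saved copy
if not filename.exists():
    response = requests.get(url)
    if response.status_code == 200:
        # response.content is bytes, hence binary mode
        with filename.open('wb') as fp:
            fp.write(response.content)

browser = webdriver.Chrome(service=Service('/home/Larz60p/Drivers/chromedriver'), options=chrome_options)
path = filename.resolve().as_uri()
browser.get(path)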
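A saved copy loaded via file:// resolves relative links (stylesheets, scripts, images) against the local folder rather than the original site, so those resources typically fail to load. If that matters while testing XPaths, one option, offered as an untested assumption rather than something verified on this site, is to add a &lt;base href&gt; element pointing at the original URL before saving. A minimal sketch with a hypothetical helper, add_base_href, that uses a regular expression rather than a full HTML parser:

```python
import html
import re

# matches the opening <head> tag (with or without attributes), but not <header>
_HEAD_OPEN = re.compile(r"<head\b[^>]*>", re.IGNORECASE)


def add_base_href(page: str, base_url: str) -> str:
    """Insert <base href="base_url"> right after the opening <head> tag.

    Relative URLs in the page then resolve against base_url instead of the
    local file:// directory. If no <head> tag is found, the page is returned unchanged.
    """
    base_tag = f'<base href="{html.escape(base_url, quote=True)}">'
    # a function replacement avoids backslash handling in the inserted text
    return _HEAD_OPEN.sub(lambda m: m.group(0) + base_tag, page, count=1)
```

It would be applied to the decoded page text (for example response.text) before the file is written.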
I'm almost satisfied with this method, especially since I'll only use it during development, but my gut tells me there is a better way.

Does anyone know of a better solution, or why 'html = browser.page_source' doesn't work?
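One way to narrow down why the page_source copy behaves differently is to save both versions (the rendered DOM from browser.page_source and the raw response from requests) and compare them line by line. A minimal sketch using the standard library's difflib; the helper name html_diff and the file names are hypothetical:

```python
import difflib
from pathlib import Path


def html_diff(raw_path: Path, rendered_path: Path, max_lines: int = 200) -> str:
    """Return a unified diff between two saved HTML files (at most max_lines lines)."""
    raw = raw_path.read_text(encoding="utf-8", errors="replace").splitlines()
    rendered = rendered_path.read_text(encoding="utf-8", errors="replace").splitlines()
    diff = difflib.unified_diff(
        raw,
        rendered,
        fromfile=str(raw_path),
        tofile=str(rendered_path),
        lineterm="",
    )
    # the full diff of a large page can be long, so keep only the first lines
    return "\n".join(list(diff)[:max_lines])
```

For example, print(html_diff(Path('homepage_raw.html'), Path('homepage_rendered.html'))). The rendered DOM often has different line breaks from the server's HTML, so the diff can be noisy; it is mainly useful for spotting elements that scripts added or removed.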
Saving html page and reloading into selenium while developing all xpaths - by Larz60+ - Sep-10-2018, 10:26 AM