Saving html page and reloading into selenium while developing all xpaths

**Larz60+** · (This post was last modified: Sep-10-2018, 10:26 AM by Larz60+.)

I wanted to save the homepage that I was scraping so that I wouldn't have to fetch it every time I made a change during development.
First I tried this (filename is a pathlib Path object):

import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
filename = spath.savedhtmlpath / 'homepage.html'

browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'/home/Larz60p/Drivers/chromedriver')
# Fetch the live page and save the rendered source to disk
browser.get(url)
html = browser.page_source
with filename.open('w', encoding='utf-8') as fp:
    fp.write(html)
time.sleep(2)

Then, to read the saved page back:

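chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
filename = spath.savedhtmlpath / 'homepage.html'

browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'/home/Larz60p/Drivers/chromedriver')
# Load the saved copy through a file:// URL instead of the live site
path = f'file://{filename.resolve()}'
browser.get(path)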
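As a side note on the reload step, here is a minimal sketch of building the file URL with pathlib's `Path.as_uri()` rather than an f-string. The helper name `local_url` is made up for illustration. The f-string works for simple paths, but `as_uri()` also percent-encodes characters such as spaces:

```python
from pathlib import Path


def local_url(path: Path) -> str:
    """Return a file:// URL for a saved page, suitable for browser.get()."""
    # resolve() makes the path absolute, which as_uri() requires
    return path.resolve().as_uri()
```

It would be used as `browser.get(local_url(filename))`.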

So far so good. But when I tried to extract information with XPath, I didn't get an error, but I couldn't find what I was looking for either. I wasn't able to determine what the issue was.
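One way to narrow this kind of problem down is to check, outside the browser, whether the element is present in the saved file at all. The sketch below uses only the standard library; `AttrFinder` and `find_in_saved_html` are hypothetical names, and the match is an exact comparison on one attribute value, so it is a rough check rather than a substitute for XPath:

```python
from html.parser import HTMLParser


class AttrFinder(HTMLParser):
    """Collect the names of tags that carry a given attribute value."""

    def __init__(self, attr, value):
        super().__init__()
        self.attr = attr
        self.value = value
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if (self.attr, self.value) in attrs:
            self.matches.append(tag)


def find_in_saved_html(html, attr, value):
    """Return the tag names in html whose attr equals value exactly."""
    finder = AttrFinder(attr, value)
    finder.feed(html)
    return finder.matches
```

If the element shows up here but not through Selenium, the saved markup is probably fine and the difference lies in how the browser renders the local copy.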

So...
This is a bit of a hack, but it works without flaw:

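import requests

chrome_options = Options()
chrome_options.add_argument("--headless")
chrome_options.add_argument('--disable-gpu')
chrome_options.add_argument('--log-level=3')
browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'/home/Larz60p/Drivers/chromedriver')

filename = spath.savedhtmlpath / 'homepage.html'
# Download the raw page with requests only if there is no saved copy yet
if not filename.exists():
    response = requests.get(url)
    if response.status_code == 200:
        with filename.open('wb') as fp:
            fp.write(response.content)

# Point selenium at the saved copy
path = f'file://{filename.resolve()}'
browser.get(path)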
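The same fetch-once idea can be wrapped in a small helper so that each page under development gets its own cached copy. This is only a sketch: `cached_page` and its `fetch` parameter are names invented here, and the download is passed in as a callable so the caching logic does not depend on requests or on the network:

```python
from pathlib import Path
from typing import Callable


def cached_page(filename: Path, fetch: Callable[[], bytes]) -> str:
    """Return a file:// URL for filename, calling fetch() only if the file is missing."""
    if not filename.exists():
        # Create the cache directory on first use, then store the raw bytes
        filename.parent.mkdir(parents=True, exist_ok=True)
        filename.write_bytes(fetch())
    return filename.resolve().as_uri()
```

With requests, `fetch` could be `lambda: requests.get(url).content`, ideally with the same status check as in the code above, and the result goes straight into `browser.get(...)`.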

I am almost satisfied using this method, especially as I will only use it for development, but my gut tells me there is a better way.

Does anyone know of a better solution, or why 'html = browser.page_source' doesn't work?
