I wanted to save the homepage that I was scraping so that I wouldn't have to fetch it every time I was making changes during development.
First I tried this (filename is a pathlib Path object):

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--log-level=3')
    filename = spath.savedhtmlpath / 'homepage.html'
    browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=r'/home/Larz60p/Drivers//chromedriver')
    #--| Parse
    browser.get(url)
    html = browser.page_source
    with filename.open('w') as fp:
        fp.write(html)
    time.sleep(2)

where filename was a pathlib path, and then when reading back:

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--log-level=3')
    filename = spath.savedhtmlpath / 'homepage.html'
    path = f'file://{filename.resolve()}'
    browser.get(path)

So far so good. But when I tried to extract information with XPath, I didn't get an error, yet I couldn't find what I was looking for either. I wasn't able to determine what the issue was.
So...
This is a bit of a hack, but it works without flaw:

    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--log-level=3')
    filename = spath.savedhtmlpath / 'homepage.html'
    if not filename.exists():
        response = requests.get(url)
        if response.status_code == 200:
            with filename.open('wb') as fp:
                fp.write(response.content)
    path = f'file://{filename.resolve()}'
    browser.get(path)

I am almost satisfied using this method, especially as I will only use it for development, but my gut tells me there is a better way.
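One way the fetch-once idea might be tidied up is a small helper that hides the "download only if the file is missing" logic. This is only a sketch, not a fix for the page_source problem: `load_cached`, `as_file_url` and the `fetch` callable are hypothetical names I made up for illustration, and the helper assumes the page can be fetched as raw bytes.

```python
from pathlib import Path
from typing import Callable


def load_cached(filename: Path, fetch: Callable[[], bytes]) -> bytes:
    """Return the cached page bytes, calling fetch() only when no cache file exists.

    `fetch` is a hypothetical zero-argument callable supplied by the caller.
    """
    if not filename.exists():
        data = fetch()
        # Create the target directory if needed, then store the raw bytes
        # so no text encoding is applied on the way to disk.
        filename.parent.mkdir(parents=True, exist_ok=True)
        filename.write_bytes(data)
    return filename.read_bytes()


def as_file_url(filename: Path) -> str:
    """Build a file:// URL for the cached file from its absolute path."""
    return filename.resolve().as_uri()
```

In the code above, `fetch` could be something like `lambda: requests.get(url).content`, and the result of `as_file_url(filename)` would then be passed to `browser.get()`. Keeping the download behind a callable also means the caching logic can be tried out without touching the network.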
Does anyone know of a better solution, or why 'html = browser.page_source' doesn't work?