Python Forum
Web scraping (selenium (i think))
When scraping a new site, I like to download each page so that I can work on it offline until I perfect the code.

If I save, for example, browser.page_source, I get the page and all of its links, etc., which is helpful.
But what I'd really like is what Firefox stores when you use 'Save Page As' from the File menu,
which saves not only the page but also all of its supporting images, CSS files, JavaScript, etc. in a separate directory.

I could write code to do this, but I'm not sure exactly what I need to download to get 'not too little' and 'not too much'.
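As a starting point for "just enough", one reasonable rule is: download what the page itself loads (images, scripts, stylesheets, icons) and skip ordinary `<a href>` links, which point to other pages. The sketch below applies that rule using only the standard library's `html.parser` and `urllib.parse.urljoin`. The names `AssetCollector` and `collect_assets` are hypothetical, and this is an approximation, not a reproduction of exactly what Firefox saves; for example, `srcset` attributes and `url(...)` references inside CSS files (fonts, background images) are not followed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AssetCollector(HTMLParser):
    """Collect the supporting files a page needs to render offline (hypothetical helper).

    Only resources loaded *by* the page are kept (images, scripts,
    stylesheets, icons); ordinary <a href> links to other pages are skipped.
    """

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []  # absolute URLs in document order, no duplicates

    def _add(self, url):
        # Skip missing values and inline/in-page references that need no download.
        if not url or url.startswith(("data:", "javascript:", "#")):
            return
        absolute = urljoin(self.base_url, url)  # resolve relative paths
        if absolute not in self.assets:
            self.assets.append(absolute)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("img", "script"):
            self._add(attrs.get("src"))
        elif tag == "link":
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel or "icon" in rel:
                self._add(attrs.get("href"))


def collect_assets(html, base_url):
    """Return the absolute URLs of the assets referenced by `html`."""
    parser = AssetCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.assets
```

With Selenium you would call it as `collect_assets(browser.page_source, browser.current_url)` and then download each returned URL.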

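With Selenium, when the page is brought up using:
from selenium import webdriver
caps = webdriver.DesiredCapabilities.FIREFOX.copy()  # copy so the shared default dict isn't mutated
caps["marionette"] = True
browser = webdriver.Firefox(capabilities=caps)  # Selenium 3 API; Selenium 4 uses Options instead
browser.get(url)
the Firefox menu bar is not shown, so clicking 'Save Page As' is not an option.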
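Since the menu can't be reached from Selenium, one workaround is to imitate the 'Save Page As' layout in Python: write the HTML to `page.html`, download the assets into a `page_files/` folder, and point the asset references at the local copies. The sketch below is a self-contained illustration under those assumptions; `save_page_offline` and `default_fetch` are hypothetical names, the file-naming scheme (short hash prefix) is an arbitrary choice, and `fetch` is a parameter so you can swap in a downloader that sends the browser's cookies (for example built from `browser.get_cookies()`) when the site requires a login.

```python
import hashlib
import os
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class _AssetRefs(HTMLParser):
    """Record the src/href values of images, scripts and stylesheets."""

    def __init__(self):
        super().__init__()
        self.refs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        if tag in ("img", "script") and attrs.get("src"):
            self.refs.append(attrs["src"])
        elif tag == "link" and "stylesheet" in rel and attrs.get("href"):
            self.refs.append(attrs["href"])


def default_fetch(url):
    """Download one resource (no cookies or custom headers are sent)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def save_page_offline(html, page_url, out_dir, name="page", fetch=default_fetch):
    """Write <name>.html plus a <name>_files/ folder, similar to 'Save Page As'.

    Rewriting is a plain string replacement of each quoted src/href value, so it
    is approximate: an <a href> with exactly the same value would be rewritten
    too, and values containing HTML entities (e.g. &amp;) are not matched.
    """
    out_dir = Path(out_dir)
    files_dir = out_dir / f"{name}_files"
    files_dir.mkdir(parents=True, exist_ok=True)

    parser = _AssetRefs()
    parser.feed(html)
    parser.close()

    for ref in dict.fromkeys(parser.refs):  # de-duplicate, keep order
        if ref.startswith("data:"):
            continue  # already embedded in the page
        absolute = urljoin(page_url, ref)
        # Prefix a short hash so same-named files from different folders don't clash.
        base = os.path.basename(urlparse(absolute).path) or "index"
        digest = hashlib.sha1(absolute.encode()).hexdigest()[:8]
        local_name = f"{digest}_{base}"
        (files_dir / local_name).write_bytes(fetch(absolute))
        local_ref = f"{name}_files/{local_name}"
        for quote in ('"', "'"):
            html = html.replace(f"{quote}{ref}{quote}", f"{quote}{local_ref}{quote}")

    page_path = out_dir / f"{name}.html"
    page_path.write_text(html, encoding="utf-8")
    return page_path
```

Usage with the browser above would be `save_page_offline(browser.page_source, browser.current_url, "offline")`. Because the `<a href>` links are left in place, the saved HTML keeps its links, which is the part that was missing with pywebcopy.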

The question: does anyone know how to do this?
If not, does anyone know exactly what to download to get 'just enough'?

I found a package, 'pywebcopy', which does a great job of downloading a page and its peripheral files,
but all of the links are missing from the HTML.


Messages In This Thread
Web scraping (selenium (i think)) - by Larz60+ - Jan-25-2019, 09:07 PM
RE: Web scraping (selenium (i think)) - by Larz60+ - Jan-25-2019, 11:47 PM
RE: Web scraping (selenium (i think)) - by Larz60+ - Jan-26-2019, 03:12 AM
RE: Web scraping (selenium (i think)) - by Larz60+ - Jan-26-2019, 11:44 PM
RE: Web scraping (selenium (i think)) - by Larz60+ - Jan-27-2019, 12:22 AM
RE: Web scraping (selenium (i think)) - by Larz60+ - Jan-27-2019, 02:57 AM
