Python Forum
Clicking Every Page and Attachment on Website
#1
Hello,
I need to write code that navigates to a website, finds every page on the site, and clicks through to each one. I would also like it to open any file attachments and then close them. The pages are layered (i.e. clicking one page leads to more new pages to open).

Selenium seems like the obvious choice. I have code to log in and could go to pages manually via XPaths, but there are too many pages to hardcode each one. I have seen a few people discussing similar projects, but nothing seems to match my case.

xpath examples from the site:
//*[@id="site-subnav"]/div[2]/div/div[1]/nav/ul[2]/li[1]/div/div/div[2]/div/span
//*[@id="infobar-290_1"]/div/div[3]/div/a

pdf file xpath example: //*[@id="item-45"]/div/div/div/div[4]/section/div[2]/div[2]/div[3]/div[2]/div/div[2]/a

I can provide more info as needed, but at this point I just need to know where to start.
#2
Yes, Selenium is fine for this.
A common mistake when you are new to this is trying to do too much at once.
Test one thing at a time and check that parsing or downloading works before moving on.
If you navigate to another page after clicking a link, the elements you found on the original page are no longer usable, because the driver no longer holds that page's DOM.
For PDF downloads you may need to set up Capabilities and ChromeOptions.

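Basic setup that I use (Selenium 4 syntax).
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

#--| Setup
options = Options()
#options.add_argument("--headless")
#options.add_argument("--window-size=1980,1020")
# Selenium 4 takes the chromedriver path through a Service object
service = Service(r'C:\cmder\bin\chromedriver.exe')
driver = webdriver.Chrome(service=service, options=options)
#--| Parse or automation
url = "https://python-forum.io/"
driver.get(url)
# The find_elements_by_* helpers were removed in Selenium 4; use find_elements(By..., ...)
title = driver.find_elements(By.CSS_SELECTOR, "div.card-body.d-flex.p-0 > div > p")
print(title[0].text)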
Output:
Welcome, we are a dedicated Python forum. We encourage back and forth discussions based on the topic of the thread. ....
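Because elements found on one page stop being usable once the driver navigates away, one way to visit every page without hardcoding XPaths is to read the link targets as plain strings first and then walk them breadth-first. The following is only a rough sketch under some assumptions: the names crawl, same_site, selenium_links and the max_pages limit are made up for illustration, every page is assumed to be reachable through ordinary a[href] links on the same host, and the login step is assumed to have already been done with the same driver.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def same_site(url, root):
    """True when url is on the same host as root."""
    return urlparse(url).netloc == urlparse(root).netloc


def crawl(start_url, get_links, max_pages=200):
    """Breadth-first walk of one site.

    get_links(url) must load the page and return its hrefs as plain strings.
    Returns the URLs in the order they were visited.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop "#fragment" parts.
            link = urldefrag(urljoin(url, href)).url
            if same_site(link, start_url) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


def selenium_links(driver):
    """Build a get_links callable backed by a Selenium WebDriver."""
    # Imported here so crawl() itself can be tested without Selenium installed.
    from selenium.webdriver.common.by import By

    def get_links(url):
        driver.get(url)
        anchors = driver.find_elements(By.CSS_SELECTOR, "a[href]")
        # Read the href strings right away; the elements go stale
        # as soon as the driver loads the next page.
        return [a.get_attribute("href") for a in anchors]

    return get_links
```

With a driver set up as above, it could be called as pages = crawl(url, selenium_links(driver)). Menus that only appear after a click or hover, like the span in the first XPath example, would need extra handling that this sketch does not cover.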
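For downloads you may need to add something like this before creating the driver (Chrome preferences, set through ChromeOptions).
options.add_experimental_option("prefs", {
    "download.default_directory": "path/to/downloads/pics/folder",  # Chrome expects an absolute path here
    "download.prompt_for_download": False,  # do not ask where to save
    "plugins.always_open_pdf_externally": True,  # download PDFs instead of opening the built-in viewer
})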
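Whichever way the download folder is configured, clicking an attachment link returns before the file is fully written, so it can help to wait for the download before moving on. Below is a minimal sketch, assuming Chrome is the browser (it marks in-progress downloads with a .crdownload suffix); the helper name wait_for_download and its parameters are hypothetical, not part of Selenium.

```python
import time
from pathlib import Path


def wait_for_download(folder, before, timeout=30.0, poll=0.5):
    """Wait for a new, finished file to appear in folder.

    before is the set of file names that were in the folder before the click.
    Returns the newest new file as a Path, or None if the timeout expires.
    """
    folder = Path(folder)
    deadline = time.monotonic() + timeout
    while True:
        new_files = [p for p in folder.iterdir()
                     if p.is_file() and p.name not in before]
        # Chrome keeps the .crdownload suffix until the download completes.
        partial = [p for p in new_files if p.suffix == ".crdownload"]
        done = [p for p in new_files if p.suffix != ".crdownload"]
        if done and not partial:
            return max(done, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A possible usage: record before = {p.name for p in Path(download_dir).iterdir()}, click the PDF link, then call wait_for_download(download_dir, before), where download_dir stands for whatever folder was configured.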