Python Forum
extract javascript links
#1
Using Selenium, I navigate to a results page that contains JavaScript links to additional pages of results.
The HTML looks like:
Output:
<tbody>
  <tr>
    <td><span>1</span></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$2')">2</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$3')">3</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$4')">4</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$5')">5</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$6')">6</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$7')">7</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$8')">8</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$9')">9</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$10')">10</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$11')">11</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$12')">12</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$13')">13</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$14')">14</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$15')">15</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$16')">16</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$17')">17</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$18')">18</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$19')">19</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$20')">20</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$21')">21</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$22')">22</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$23')">...</a></td>
    <td><a href="javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$Last')">Last</a></td>
  </tr>
</tbody>
Is there a method in Selenium that will expand these links (or reveal the direct links to the data locations) without actually loading each page?

I'd like to create a list that contains the actual links to data locations if possible.

I'm thinking there must be a way to use find_elements, but I might be barking up the wrong tree.
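
To show what I mean, here is a rough, untested sketch; the CSS selector a[href^='javascript:__doPostBack'] is just my guess at how to match the pager anchors:

from selenium import webdriver
from selenium.webdriver.common.by import By

# sketch only: after running the search, grab every pager anchor whose href
# starts with "javascript:__doPostBack" and keep the raw href strings in a list
browser = webdriver.Firefox()
browser.get('https://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx')
# (search steps omitted here; my full code is in the edit below)
anchors = browser.find_elements(By.CSS_SELECTOR, "a[href^='javascript:__doPostBack']")
hrefs = [a.get_attribute('href') for a in anchors]
print(hrefs)
browser.quit()

The trouble is that those href values are javascript: URIs rather than direct URLs, which is why I'm asking whether Selenium can expand them without loading each page.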

EDIT: Feb 16, 5:30 EDT

Here is code that works until I try to collect the links. I can reach each page individually by modifying a CSS selector for the page number and then clicking, but I'd rather (if possible) gather all of the links from the first results page and save them in a list.

Selenium code
from selenium import webdriver
from selenium.webdriver.common.by import By


class FindElementsJustTheFacts:
    def __init__(self):
        self.browser = None
        self.browser_running = False

    def get_page(self, letter):
        self.start_browser()
        self.browser.get('https://corp.sec.state.ma.us/CorpWeb/CorpSearch/CorpSearch.aspx')

        element = self.browser.find_element(By.CSS_SELECTOR, '#MainContent_txtEntityName')
        element.send_keys(letter)
        # pick the fourth 'records per page' option from the dropdown
        self.browser.find_element(By.CSS_SELECTOR, '#MainContent_ddRecordsPerPage > option:nth-child(4)').click()

        # this is where I am stuck: the elements are found, but I only get
        # javascript:__doPostBack(...) hrefs, not direct links to the data
        trs = self.browser.find_elements(By.CSS_SELECTOR, 'tr.link > td:nth-child(1) > table:nth-child(1)')
        for element in trs:
            # execute_script needs an explicit "return" to hand a value back to Python
            print(self.browser.execute_script("return arguments[0].outerHTML;", element))

        if self.browser_running:
            self.stop_browser()

    def start_browser(self):
        # useragent = "Mozilla/5.0 (Linux; Android 8.0.0; Pixel 2 XL Build/OPD1.170816.004) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Mobile Safari/537.36"
        options = webdriver.FirefoxOptions()
        options.set_preference("dom.webnotifications.serviceworker.enabled", False)
        options.set_preference("dom.webnotifications.enabled", False)
        self.browser = webdriver.Firefox(options=options)
        self.browser.implicitly_wait(30)
        self.browser_running = True

    def stop_browser(self):
        self.browser.quit()  # quit() ends the whole session, not just the current window
        self.browser_running = False


def main():
    fej = FindElementsJustTheFacts()
    fej.get_page(letter='1')


if __name__ == '__main__':
    main()
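
Since those hrefs are javascript: calls rather than real URLs, I suspect what I actually need is the pair of arguments passed to __doPostBack for each page. This is a rough, untested sketch of what I mean; the function name and regex are just my own guesses:

import re

# sketch: turn each "javascript:__doPostBack('target','Page$N')" href into a
# (target, argument) tuple so the individual result pages can be requested later
postback_re = re.compile(r"__doPostBack\('([^']*)','([^']*)'\)")

def parse_postback_links(hrefs):
    pairs = []
    for href in hrefs:
        match = postback_re.search(href)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

# example with one href copied from the output above
sample = "javascript:__doPostBack('ctl00$MainContent$SearchControl$grdSearchResultsEntity','Page$2')"
print(parse_postback_links([sample]))
# [('ctl00$MainContent$SearchControl$grdSearchResultsEntity', 'Page$2')]

Whether I can then fetch those pages without clicking each link in turn is really the question.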