Python Forum
scrape data 1 go to next page scrape data 2 and so on
#1
Hi everyone, I'm fairly new to Python and brand new to web scraping. Any help from the forum is greatly appreciated.

I have some code here that gets me what I want from page 1 of my website.
import requests
from bs4 import BeautifulSoup

# url is assumed to be defined earlier as the address of page 1
response = requests.get(url)
print(response.text)

soup = BeautifulSoup(response.text, 'html.parser')
results = soup.find(id='preblockBody')
print(results.prettify())

# Print the text of every listing row in each listing table
job_elems = results.find_all('table', class_='pbListingTable')
for job_elem in job_elems:
    for row in job_elem.find_all('tr', class_='pbListingTable1'):
        print(row.text)
    for row in job_elem.find_all('tr', class_='pbListingTable0'):
        print(row.text)
Next, I would like to go to page 2 and do the same thing, then page 3, and so on. I'm unsure how to do so because inspecting the link to page 2 shows:
Quote:<a href="javascript:gotoNextPage(2)">&nbsp;2&nbsp;</a>

Then I would like to combine all of the outputs.

How can I do so? Thank you very much.
Reply
#2
The link you show is JavaScript, so you will need to use Selenium to scrape it.
See the quick tutorials on this forum:
web scraping part 1
web scraping part 2
Reply
#3
Ok thank you. I'm new to web scraping, so should I expect to add a few lines of code or is this a rework using selenium?
Reply
#4
Look at what the gotoNextPage(2) JavaScript function does in the page source to get to the next page.
Also watch what happens in the Network tab of, e.g., Chrome DevTools when you click the button.
If you can work out what the JavaScript function does and do the same thing yourself, you can get away without using Selenium.
Reply
#5
Thank you, but I'm not following. I'm pretty new and have everything set up except for the next page functionality. Even if someone can help me get to the next page and then I can rerun the code that would be immensely helpful.

I think I see what is meant by the JavaScript function. If I take a look at the Network tab I see this:

Quote:<script language="JavaScript">
function sortPage(i) {
document.location = baseHref + "website" + i;
}
function gotoNextPage(i) {
document.location = baseHref + "website" + i;
}
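For what it's worth, the quoted gotoNextPage(i) only builds a URL from baseHref, a fixed path, and the page number, so the same thing can probably be done without a browser. Below is a minimal sketch under that assumption. The names page_url, fetch_pages, and fetch_with_requests are made up for illustration, "website" is the placeholder from the quote rather than a real path, and the base URL and page count have to come from the actual site. If the site needs a login, a plain requests.get may not be enough.

```python
from typing import Callable, List


def page_url(base_href: str, page: int, path: str = "website") -> str:
    """Mirror the quoted JavaScript: baseHref + "website" + i.

    "website" is the placeholder used in the thread, not a real path.
    """
    return f"{base_href}{path}{page}"


def fetch_pages(base_href: str, last_page: int,
                fetch: Callable[[str], str]) -> List[str]:
    """Fetch pages 1..last_page in order and return the HTML of each."""
    return [fetch(page_url(base_href, page))
            for page in range(1, last_page + 1)]


def fetch_with_requests(url: str) -> str:
    """Download one page with requests and return its HTML text."""
    import requests  # imported here so the URL logic can be tried without it

    response = requests.get(url, timeout=30)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text
```

Calling fetch_pages(base_href, last_page, fetch_with_requests) would return one HTML string per page, and each string can then go through the same BeautifulSoup code that already works for page 1.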
Reply
#6
Hi everyone, I've rewritten my code using Selenium here:
table = driver.find_element_by_id('preblockBody')

job_elems = table.find_elements_by_xpath("//*[contains(@class,'pbListingTable')]")
for value in job_elems:
    print(value.text)

nxt=driver.find_element_by_xpath("//a[contains(@href, 'gotoNextPage(2)')]")
driver.execute_script("arguments[0].click();", nxt)
I'm now hoping to get some help on how to loop through all pages and append to a single output. I've tried a few different things, but again, I'm pretty new, so guidance is appreciated.
Reply
#7
UPDATE

Wanted to post an update to see if there are any suggestions. I'm nearly there with my code: I found something helpful where the 'Next' button is used instead of the individual 'gotoNextPage()' elements. However, the output only contains the last page it runs through. How can I append each page it clicks through to a master data frame?

from selenium import webdriver

driver = webdriver.Chrome()
driver.get('website')
username = driver.find_element_by_id("username")
password = driver.find_element_by_id("password")

username.send_keys("username")
password.send_keys("password"+"\n")

while True:
    driver.implicitly_wait(30)
    table = driver.find_element_by_id('preblockBody')
    information = []
    job_elems = table.find_elements_by_xpath("//*[contains(@class,'pbListingTable')]")
    for value in job_elems:
    #print(value.text)
        information.append(value.text)
        
    try:
        driver.find_element_by_partial_link_text('Next').click()
    except:
        break

driver.quit()
print(information)
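A likely cause of the "last page only" result is that information = [] sits inside the while loop, so the list is emptied on every page. A minimal sketch of one possible fix follows, with the list created once before the loop. The function name collect_all_pages is made up for illustration, and it takes the already logged-in driver as a parameter. It keeps the find_element_by_* calls used in this thread; newer Selenium 4 releases replace them with find_element(By.ID, ...) and friends. The './/' prefix in the XPath is also a change from the code above, added on the assumption that only elements inside the table are wanted.

```python
try:
    from selenium.common.exceptions import NoSuchElementException
except ImportError:
    # Fallback so the sketch can be read and tried without Selenium installed
    class NoSuchElementException(Exception):
        pass


def collect_all_pages(driver):
    """Click through every page via the 'Next' link and return all row texts."""
    information = []  # created once, before the loop, so earlier pages are kept
    while True:
        table = driver.find_element_by_id('preblockBody')
        # The leading '.' keeps the XPath search inside `table`
        job_elems = table.find_elements_by_xpath(
            ".//*[contains(@class,'pbListingTable')]")
        information.extend(value.text for value in job_elems)
        try:
            driver.find_element_by_partial_link_text('Next').click()
        except NoSuchElementException:
            break  # no 'Next' link left, so this was the last page
    return information
```

With the driver from the code above, information = collect_all_pages(driver) followed by driver.quit() should leave every page's text in one list. If a data frame is wanted, something like pandas.DataFrame({'text': information}) could be built from it afterwards.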
Reply

