Python Forum

Full Version: scrape data 1 go to next page scrape data 2 and so on
Hi everyone, I'm fairly new to Python and brand new to web scraping. Any help from the forum is greatly appreciated.

I have some code here that gets me what I want from page 1 of my website.
import requests
from bs4 import BeautifulSoup

# `url` is the address of the listing page (not given in the post)
response = requests.get(url)
print(response.text)

soup = BeautifulSoup(response.text, 'html.parser')
results = soup.find(id='preblockBody')
print(results.prettify())

# For each listing table, print the rows with class pbListingTable1,
# then the rows with class pbListingTable0
job_elems = results.find_all('table', class_='pbListingTable')
for job_elem in job_elems:
    for row in job_elem.find_all('tr', class_='pbListingTable1'):
        print(row.text)
    for row in job_elem.find_all('tr', class_='pbListingTable0'):
        print(row.text)
Next, I would like to go to page 2 and do the same thing, then page 3, and so on. I'm unsure how to do so because inspecting the link to page 2 shows:
Quote:<a href="javascript:gotoNextPage(2)">&nbsp;2&nbsp;</a>

Then I would like to combine all of the outputs into one.

How can I do so? Thank you very much
The link you show is a JavaScript call, so you will need to use Selenium to scrape it.
See the quick tutorials on this forum:
web scraping part 1
web scraping part 2
Ok thank you. I'm new to web scraping, so should I expect to add a few lines of code or is this a rework using selenium?
Look at what the gotoNextPage(2) JavaScript function does in the page source to get to the next page.
Also see what happens in the Network tab of, e.g., Chrome DevTools when you click the link.
If you can figure that out and do the same thing the JavaScript function does, you can get away without using Selenium.
Thank you, but I'm not following. I'm pretty new and have everything set up except for the next page functionality. Even if someone can help me get to the next page and then I can rerun the code that would be immensely helpful.

I think I see what is meant by the JavaScript function. If I take a look at Network I see this:

Quote:<script language="JavaScript">
function sortPage(i) {
document.location = baseHref + "website" + i;
}
function gotoNextPage(i) {
document.location = baseHref + "website" + i;
}
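The quoted gotoNextPage(i) only sets document.location to baseHref + "website" + i, so the same URLs can probably be built directly without a browser. Below is a minimal sketch of that idea, not a tested solution for this site. The names page_url and collect_pages are made up for illustration, and fetch and parse are hypothetical callables you would supply yourself: fetch takes a URL and returns the HTML, parse takes the HTML and returns a list of rows. The real values of baseHref and the "website" path are not shown in the thread, so they stay as parameters.

```python
from typing import Callable, Iterable, List


def page_url(base_href: str, path: str, page: int) -> str:
    """Build the URL the same way the quoted gotoNextPage(i) does:
    baseHref + path + i."""
    return f"{base_href}{path}{page}"


def collect_pages(
    fetch: Callable[[str], str],
    parse: Callable[[str], List[str]],
    base_href: str,
    path: str,
    pages: Iterable[int],
) -> List[str]:
    """Fetch every requested page and gather all parsed rows in one list."""
    rows: List[str] = []  # created once, so rows from every page accumulate
    for page in pages:
        html = fetch(page_url(base_href, path, page))
        rows.extend(parse(html))
    return rows
```

With requests, fetch could be something like lambda u: session.get(u).text, where session is a requests.Session, and parse would wrap the BeautifulSoup code from the first post. If the site needs a login, the session would have to be logged in first, which this sketch does not cover.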
Hi everyone, I've rewritten my code using Selenium here:
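from selenium.webdriver.common.by import By

# `driver` is an already-open webdriver session on the listing page
table = driver.find_element(By.ID, 'preblockBody')

# The leading "." keeps the XPath search inside `table`
job_elems = table.find_elements(By.XPATH, ".//*[contains(@class,'pbListingTable')]")
for value in job_elems:
    print(value.text)

# Click the link to page 2 via JavaScript
nxt = driver.find_element(By.XPATH, "//a[contains(@href, 'gotoNextPage(2)')]")
driver.execute_script("arguments[0].click();", nxt)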
I'm now hoping to get some help on how to loop through all pages and append to a single output. I've tried a few different things, but again, I'm pretty new, so guidance is appreciated.
UPDATE

Wanted to post an update to see if there are any suggestions. I'm nearly there with my code: I found something helpful where the 'Next' button is used instead of the individual 'gotoNextPage()' elements. However, it only keeps the last page that it runs through. How can I append each page it clicks through to a master data frame?

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('website')
driver.implicitly_wait(30)

# Log in; the trailing "\n" submits the form
username = driver.find_element(By.ID, "username")
password = driver.find_element(By.ID, "password")
username.send_keys("username")
password.send_keys("password" + "\n")

while True:
    table = driver.find_element(By.ID, 'preblockBody')
    information = []
    job_elems = table.find_elements(By.XPATH, ".//*[contains(@class,'pbListingTable')]")
    for value in job_elems:
        # print(value.text)
        information.append(value.text)

    try:
        driver.find_element(By.PARTIAL_LINK_TEXT, 'Next').click()
    except NoSuchElementException:
        break

driver.quit()
print(information)
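In the loop above, information = [] sits inside while True, so the list is emptied on every page and only the last page survives. Creating the list once before the loop should fix that. Here is a minimal sketch of the pattern, with the page handling split into two callables so the accumulation logic can be run without a browser. The names collect_all_pages and selenium_callbacks are made up for illustration. Restricting the XPath to tr elements is an assumption based on the row classes in the first post, and the sketch also assumes the last page has no 'Next' link.

```python
from typing import Callable, List, Tuple


def collect_all_pages(
    read_rows: Callable[[], List[str]],
    click_next: Callable[[], bool],
    max_pages: int = 1000,
) -> List[str]:
    """Read rows from the current page, move on, and keep everything."""
    information: List[str] = []  # created ONCE, before the loop
    for _ in range(max_pages):  # guard against an endless 'Next' chain
        information.extend(read_rows())
        if not click_next():  # no further page: stop
            break
    return information


def selenium_callbacks(driver) -> Tuple[Callable[[], List[str]], Callable[[], bool]]:
    """Wire the two callables to a logged-in Selenium driver (assumed)."""
    # Imported here so the accumulation logic above runs without Selenium
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def read_rows() -> List[str]:
        table = driver.find_element(By.ID, "preblockBody")
        # Assumption: the data rows are <tr> elements with pbListingTable0/1
        rows = table.find_elements(
            By.XPATH, ".//tr[contains(@class, 'pbListingTable')]"
        )
        return [row.text for row in rows]

    def click_next() -> bool:
        try:
            driver.find_element(By.PARTIAL_LINK_TEXT, "Next").click()
        except NoSuchElementException:
            return False
        return True

    return read_rows, click_next
```

After logging in, something like information = collect_all_pages(*selenium_callbacks(driver)) would return one list covering every page, and pandas.DataFrame({"row": information}) would turn it into a single data frame.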