 scrape data 1 go to next page scrape data 2 and so on
#1
Hi everyone, I'm fairly new to Python and brand new to web scraping. Any help from the forum is greatly appreciated.

I have some code here that gets me what I want from page 1 of my website.
import requests
from bs4 import BeautifulSoup

# url is the address of page 1 (not shown in this post)
response = requests.get(url)
print(response.text)

# Parse the HTML that was just downloaded
soup = BeautifulSoup(response.text, 'html.parser')
results = soup.find(id='preblockBody')
print(results.prettify())

# Print the text of every listing row in each listing table
job_elems = results.find_all('table', class_='pbListingTable')
for job_elem in job_elems:
    for row in job_elem.find_all('tr', class_='pbListingTable1'):
        print(row.text)
    for row in job_elem.find_all('tr', class_='pbListingTable0'):
        print(row.text)
Next, I would like to go to page 2 and do the same thing, then page 3, and so on. I'm unsure how to do so because inspecting the page link shows:
Quote:<a href="javascript:gotoNextPage(2)">&nbsp;2&nbsp;</a>

Then I would like to combine all of the outputs into one.

How can I do so? Thank you very much
#2
The link you show is a JavaScript call, so you will need to use Selenium to scrape it.
See the quick tutorials on this forum:
web scraping part 1
web scraping part 2
#3
Ok thank you. I'm new to web scraping, so should I expect to add a few lines of code or is this a rework using selenium?
#4
Look at what the gotoNextPage(2) JavaScript function does in the page source to get to the next page.
Also watch what happens in the Network tab of, e.g., Chrome DevTools when you push the button.
If you can figure that out and do the same thing the JavaScript function does, then you can get away with not using Selenium.
#5
Thank you, but I'm not following. I'm pretty new and have everything set up except for the next page functionality. Even if someone can help me get to the next page and then I can rerun the code that would be immensely helpful.

I think I see what is meant by the JavaScript function. If I take a look at Network, I see this:

Quote:<script language="JavaScript">
function sortPage(i) {
document.location = baseHref + "website" + i;
}
function gotoNextPage(i) {
document.location = baseHref + "website" + i;
}
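
Following the advice in #4: the quoted gotoNextPage(i) only sets document.location to baseHref + "website" + i, so each page appears to have its own address that could be requested directly. Below is a minimal sketch of building those addresses. It assumes the real value of baseHref can be read from the page source and that "website" stands for the actual path, which is not shown here. The names base_href, path and last_page, and the example values, are hypothetical placeholders.

```python
def page_url(base_href, path, page):
    """Build the address that gotoNextPage(page) would navigate to."""
    # Mirrors the JavaScript: document.location = baseHref + "website" + i
    return f"{base_href}{path}{page}"


def all_page_urls(base_href, path, last_page):
    """Return the URLs for pages 1..last_page, in order."""
    return [page_url(base_href, path, i) for i in range(1, last_page + 1)]


# Hypothetical values; substitute the real baseHref and path from the page source.
urls = all_page_urls("https://example.com/", "listing?page=", 3)
for u in urls:
    print(u)
```

Each URL could then be passed to requests.get() and parsed with the BeautifulSoup code from post #1, appending the rows to one list. If the site requires a login, a requests.Session that logs in first would probably be needed as well.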
#6
Hi everyone, I've rewritten my code using Selenium here:
table = driver.find_element_by_id('preblockBody')

job_elems = table.find_elements_by_xpath("//*[contains(@class,'pbListingTable')]")
for value in job_elems:
    print(value.text)

nxt=driver.find_element_by_xpath("//a[contains(@href, 'gotoNextPage(2)')]")
driver.execute_script("arguments[0].click();", nxt)
I'm now hoping to get some help on how to loop through all pages and append everything to a single output. I've tried a few different things, but again, I'm pretty new, so guidance is appreciated.
#7
UPDATE

I wanted to post an update to see if there are any suggestions. I'm nearly there with my code: I found something helpful where the 'Next' button is used instead of the individual 'gotoNextPage()' links. However, the output only contains the last page that it runs through. How can I append each page it clicks through to a master data frame?

driver = webdriver.Chrome()
driver.get('website')
username = driver.find_element_by_id("username")
password = driver.find_element_by_id("password")

username.send_keys("username")
password.send_keys("password"+"\n")

while True:
    driver.implicitly_wait(30)
    table = driver.find_element_by_id('preblockBody')
    information = []
    job_elems = table.find_elements_by_xpath("//*[contains(@class,'pbListingTable')]")
    for value in job_elems:
        # print(value.text)
        information.append(value.text)
        
    try:
        driver.find_element_by_partial_link_text('Next').click()
    except:
        break

driver.quit()
print(information)
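
One likely reason only the last page survives is that information = [] sits inside the while loop, so the list is emptied at the start of every page. Creating it once before the loop keeps the earlier pages. The sketch below shows that pattern with the page-reading and page-turning steps passed in as functions. The helper names collect_all_pages and selenium_callbacks are hypothetical, and the Selenium part is an untested sketch that assumes Selenium 4's find_element(By...) style rather than the find_element_by_* calls used in the post.

```python
def collect_all_pages(read_rows, go_to_next_page):
    """Gather rows from every page into a single list."""
    information = []  # created once, before the loop, so earlier pages are kept
    while True:
        information.extend(read_rows())
        if not go_to_next_page():
            break
    return information


def selenium_callbacks(driver):
    """Build the two callables for a live Selenium 4 driver (untested sketch)."""
    # Imported here so collect_all_pages stays usable without Selenium installed.
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def read_rows():
        # Same lookups as in the post, written with the By-based API
        table = driver.find_element(By.ID, "preblockBody")
        rows = table.find_elements(By.XPATH, "//*[contains(@class,'pbListingTable')]")
        return [row.text for row in rows]

    def go_to_next_page():
        try:
            driver.find_element(By.PARTIAL_LINK_TEXT, "Next").click()
        except NoSuchElementException:
            return False  # no 'Next' link left: last page reached
        return True

    return read_rows, go_to_next_page
```

With the driver from the post, this could be called as information = collect_all_pages(*selenium_callbacks(driver)) before driver.quit(). If a data frame is wanted, the resulting list could then be handed to pandas.DataFrame.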