Python Forum
Python - Scrapy Javascript Pagination (next_page)
#1
Hello guys, I made this script for this website, but my pagination doesn't work because it is driven by JavaScript: when you click the next page, the container just loads new data.
I did find the URLs for the next pages; here are some examples:

Page2
Page3

But I'm getting the results from the first page only. :/

code:
# -*- coding: utf-8 -*-
import scrapy


class SiriosbotSpider(scrapy.Spider):
    name = 'SiriosBot'
    start_urls = ['https://www.siriocenter.gr/Proionta/Mpoulonia-ApostatesTroxwn']

    def parse(self, response):
        # Each product sits in its own div.resultItemTxt block.
        for content in response.css('div.resultItemTxt'):
            item = {
                'Title': content.css('th[colspan="2"] > a::text').extract(),
                'Price': content.css('div.price > span::text').extract(),
                'Manufacture': content.css('tr:nth-child(2)').extract(),
                'Model': content.css('tr:nth-child(3)').extract(),
                'Eidos': content.css('tr:nth-child(4)').extract(),
                'Typos': content.css('tr:nth-child(5)').extract(),
                'Kare': content.css('tr:nth-child(6)').extract(),
                'Comments': content.css('tr:nth-child(7)').extract(),
                'ProductLink': content.css('th[colspan="2"] > a::attr(href)').extract(),
                'Img': content.css('div.resultItemImage > a').extract(),
                'CurrentURL': response.url,
            }
            yield item

        # Try to follow the last link in the pager (this is the part that fails).
        for next_page in response.css('div.paging > a:last-child::attr(href)'):
            url = response.urljoin(next_page.extract())
            yield scrapy.Request(url, self.parse)
Thank you! :D
Reply
#2
The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those.
For example, clicking the next button shows this request:
[Image: tNDcG2E.png]
[Image: 9RnVCbR.png]
I tried playing with some parameters, changing a few and omitting them, and also found out you can get all the results using a single request.
All that's left to do now is replace start_urls with start_requests() yielding a custom request, and you get all the items:
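# Requires `import json` at the top of the module.
def start_requests(self):
    # With no callback set, Scrapy passes the response to self.parse.
    yield scrapy.FormRequest(
        url='https://www.siriocenter.gr/Proionta/PartialAntallaktika',
        formdata={
            'modelStr': json.dumps({
                'pageSize': 1000,
            }),
        },
    )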
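To make the shape of that request easier to see, here is a minimal sketch of the form body on its own, without Scrapy. The helper name build_formdata is hypothetical; the only field names taken from the request above are modelStr and pageSize. The thing to notice is that modelStr is one ordinary form field whose value happens to be a JSON string, which is why json.dumps is needed.

```python
import json
from urllib.parse import parse_qs, urlencode


def build_formdata(page_size=1000):
    """Return the form fields for the POST request (hypothetical helper)."""
    # The JSON object is serialized to a string and sent as a single field.
    return {'modelStr': json.dumps({'pageSize': page_size})}


formdata = build_formdata()

# Roughly what ends up in the body of the POST request.
body = urlencode(formdata)
print(body)

# Decoding the body again gives back the original JSON object.
decoded = json.loads(parse_qs(body)['modelStr'][0])
print(decoded)
```

The same dict can be passed as formdata to scrapy.FormRequest, since all its values are strings.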
Reply
#3
(Oct-08-2018, 11:32 AM)stranac Wrote: The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those.

Wow, that was awesome. How did you find out that I can request all the data at once? I'd like to use the same approach on other websites. :D
Reply
#4
Just tried increasing the page size.
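If you want to do the same kind of probing on another site, it could look roughly like the sketch below. Everything in it is an assumption for illustration: count_items is a hypothetical callable that sends the request with a given page size and returns how many items came back, and fake_count_items is a stand-in that pretends the site has 237 items so the example runs without a network. A server may also cap the page size silently, and this sketch cannot tell that apart from running out of items.

```python
def probe_page_size(count_items, sizes=(10, 100, 1000, 10000)):
    """Try growing page sizes until fewer items come back than were asked for.

    count_items is a hypothetical callable: page size in, item count out.
    Returns the item count from the last request made.
    """
    last = 0
    for size in sizes:
        last = count_items(size)
        if last < size:
            # Fewer items than requested, so asking for more will not help.
            return last
    # Every page came back full; there may be more items than we asked for.
    return last


def fake_count_items(page_size):
    """Stand-in for a real request: pretend the site has 237 items."""
    return min(page_size, 237)


total = probe_page_size(fake_count_items)
print(total)
```

In a real spider, count_items would send the FormRequest with that pageSize and count the div.resultItemTxt blocks in the response.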
Reply

