Python - Scrapy Javascript Pagination (next_page)

Baggelhsk95 · Oct-08-2018, 09:13 AM

Hello guys, I made this script for this website, but my pagination doesn't work because it's driven by JavaScript: when you click the next page, the container just loads new data.
I did find the URLs for the next pages; here are some examples:

Page2
Page3

But I'm only getting the results from the first page. :/

code:

# -*- coding: utf-8 -*-
import scrapy


class SiriosbotSpider(scrapy.Spider):
    name = 'SiriosBot'
    start_urls = ['https://www.siriocenter.gr/Proionta/Mpoulonia-ApostatesTroxwn']

    def parse(self, response):
        # One 'div.resultItemTxt' block per product on the page.
        for content in response.css('div.resultItemTxt'):
            item = {
                'Title': content.css('th[colspan="2"] > a::text').extract(),
                'Price': content.css('div.price > span::text').extract(),
                'Manufacture': content.css('tr:nth-child(2)').extract(),
                'Model': content.css('tr:nth-child(3)').extract(),
                'Eidos': content.css('tr:nth-child(4)').extract(),
                'Typos': content.css('tr:nth-child(5)').extract(),
                'Kare': content.css('tr:nth-child(6)').extract(),
                'Comments': content.css('tr:nth-child(7)').extract(),
                'ProductLink': content.css('th[colspan="2"] > a::attr(href)').extract(),
                'Img': content.css('div.resultItemImage > a').extract(),
                'CurrentURL': response.url,
            }
            yield item

        # Follow the last link in the paging block.
        # This is the part that does not work here, because the paging is done by JavaScript.
        for next_page in response.css('div.paging > a:last-child::attr(href)'):
            url = response.urljoin(next_page.extract())
            yield scrapy.Request(url, self.parse)

Thank you! :D

***stranac*** · (This post was last modified: Oct-08-2018, 11:32 AM by stranac.)

The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those.
For example, clicking the next button shows this request:
[Image: tNDcG2E.png]

I tried playing with some parameters, changing a few and omitting them, and also found out you can get all the results using a single request.
All that's left to do now is replace start_urls with start_requests() yielding a custom request, and you get all the items:

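# at the top of the spider module (needed for json.dumps below)
import json

# inside the spider class, replacing start_urls
def start_requests(self):
    # POST to the endpoint the "next" button calls, asking for one large page.
    yield scrapy.FormRequest(
        url='https://www.siriocenter.gr/Proionta/PartialAntallaktika',
        formdata={
            'modelStr': json.dumps({
                'pageSize': 1000,
            }),
        },
    )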
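
To see what that FormRequest actually sends, here is a minimal standard-library sketch (no Scrapy needed) of the form body. The helper name build_formdata is made up for illustration; only the modelStr field and the pageSize key come from the request above.

```python
import json
from urllib.parse import urlencode


def build_formdata(page_size):
    """Return the form fields for the request (hypothetical helper)."""
    # The request carries one form field, 'modelStr', holding a JSON string.
    return {'modelStr': json.dumps({'pageSize': page_size})}


formdata = build_formdata(1000)
# URL-encoded body, as a form POST would carry it.
body = urlencode(formdata)
print(formdata)
print(body)
```

Comparing the printed body with the one shown in the browser's network tab is a quick way to check that the recreated request matches the real one.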

Baggelhsk95 · Oct-08-2018, 12:22 PM

Wow, that was awesome. How did you find out that I can request all the data at once? I'd like to use the same approach on other websites. :D

***stranac*** · Oct-08-2018, 01:20 PM

Just tried increasing the page size.
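
If a site caps the page size, the same idea can still work one page at a time. A rough sketch, assuming the JSON also accepts a page-number key: 'pageIndex' and the 1-based numbering are guesses, not something confirmed for this site, so check the real name in the developer tools first.

```python
import json


def page_payloads(page_size, num_pages, page_key='pageIndex'):
    """Yield one form payload per page.

    page_key is a hypothetical name; pages are assumed to start at 1.
    """
    for page in range(1, num_pages + 1):
        model = {'pageSize': page_size, page_key: page}
        yield {'modelStr': json.dumps(model)}


payloads = list(page_payloads(page_size=50, num_pages=3))
for payload in payloads:
    print(payload)
```

In a spider, each payload would be passed as formdata to its own scrapy.FormRequest, stopping once a response comes back with no items.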
