Python Forum

Full Version: Python - Scrapy Javascript Pagination (next_page)
Hello guys... I made this script for this website, but my pagination doesn't work because it's just JavaScript: when you click next page, the container loads the new data in place.
I did find the URLs for the next pages; here are some examples:

Page2
Page3

But I'm getting the results from the first page only. :/

code:
# -*- coding: utf-8 -*-
import scrapy


class SiriosbotSpider(scrapy.Spider):
    name = 'SiriosBot'
    start_urls = ['https://www.siriocenter.gr/Proionta/Mpoulonia-ApostatesTroxwn']

    def parse(self, response):
        for content in response.css('div.resultItemTxt'):
            item = {
                'Title': content.css('th[colspan="2"] > a::text').extract(),
                'Price': content.css('div.price > span::text').extract(),
                'Manufacture': content.css('tr:nth-child(2)').extract(),
                'Model': content.css('tr:nth-child(3)').extract(),
                'Eidos': content.css('tr:nth-child(4)').extract(),
                'Typos': content.css('tr:nth-child(5)').extract(),
                'Kare': content.css('tr:nth-child(6)').extract(),
                'Comments': content.css('tr:nth-child(7)').extract(),
                'ProductLink': content.css('th[colspan="2"] > a::attr(href)').extract(),
                'Img': content.css('div.resultItemImage > a').extract(),
                'CurrentURL': response.url,
            }
            yield item

        # This only ever finds a plain "next" link on the first page; the
        # later pages are loaded by JavaScript, so nothing more is followed.
        for next_page in response.css('div.paging > a:last-child::attr(href)'):
            url = response.urljoin(next_page.extract())
            yield scrapy.Request(url, callback=self.parse)
Thank you! :D
The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those.
For example, clicking the next button shows this request:
[Image: tNDcG2E.png]
[Image: 9RnVCbR.png]
I tried playing with the parameters, changing some and omitting others, and found out you can get all the results using a single request.
All that's left to do now is replace start_urls with start_requests() yielding a custom request, and you get all the items:
# requires `import json` at the top of the spider file
def start_requests(self):
    yield scrapy.FormRequest(
        url='https://www.siriocenter.gr/Proionta/PartialAntallaktika',
        formdata={
            'modelStr': json.dumps({
                'pageSize': 1000,
            }),
        },
    )
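For completeness, here's a minimal sketch of how that request might slot into the original spider. It assumes the partial endpoint returns the same div.resultItemTxt markup that parse() already handles (only a couple of the original fields are shown), and the pageSize of 1000 is just a value picked to be larger than the catalog:
# -*- coding: utf-8 -*-
import json

import scrapy


class SiriosbotSpider(scrapy.Spider):
    name = 'SiriosBot'

    def start_requests(self):
        # POST to the partial-results endpoint instead of following page links.
        yield scrapy.FormRequest(
            url='https://www.siriocenter.gr/Proionta/PartialAntallaktika',
            formdata={'modelStr': json.dumps({'pageSize': 1000})},
            callback=self.parse,
        )

    def parse(self, response):
        # Assumes the partial response reuses the same result markup,
        # so the original selectors keep working unchanged.
        for content in response.css('div.resultItemTxt'):
            yield {
                'Title': content.css('th[colspan="2"] > a::text').extract(),
                'Price': content.css('div.price > span::text').extract(),
                'ProductLink': content.css('th[colspan="2"] > a::attr(href)').extract(),
                'CurrentURL': response.url,
            }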
(Oct-08-2018, 11:32 AM)stranac Wrote: [ -> ]The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those. [...]

Wow, that was awesome! How did you find out that I can request all the data at once? I'd like to use the same approach on other websites. :D
Just tried increasing the page size.
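If you want to script that kind of probing for other sites, here's a quick sketch, assuming the requests library; the endpoint and the modelStr parameter are the ones captured above, and counting occurrences of the resultItemTxt class is just a crude way to count result blocks. Once the count stops growing, a single request is already returning the full data set:
import json

import requests

URL = 'https://www.siriocenter.gr/Proionta/PartialAntallaktika'

# Probe increasing page sizes and watch when the result count plateaus.
for size in (10, 100, 1000):
    resp = requests.post(URL, data={'modelStr': json.dumps({'pageSize': size})})
    print(size, resp.text.count('resultItemTxt'))  # crude count of result rows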