Python - Scrapy Javascript Pagination (next_page) - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Python - Scrapy Javascript Pagination (next_page) (/thread-13277.html)
Python - Scrapy Javascript Pagination (next_page) - Baggelhsk95 - Oct-08-2018

Hello guys, I made this script for this website, but my pagination doesn't work because it's driven by JavaScript: when you click "next page", the container loads new data. I did find the URLs for the next pages, e.g. Page2, Page3, but I'm only getting results from the first page. :/

Code:

```python
# -*- coding: utf-8 -*-
import scrapy


class SiriosbotSpider(scrapy.Spider):
    name = 'SiriosBot'
    start_urls = ['https://www.siriocenter.gr/Proionta/Mpoulonia-ApostatesTroxwn']

    def parse(self, response):
        for content in response.css('div.resultItemTxt'):
            item = {
                'Title': content.css('th[colspan="2"] > a::text').extract(),
                'Price': content.css('div.price > span::text').extract(),
                'Manufacture': content.css('tr:nth-child(2)').extract(),
                'Model': content.css('tr:nth-child(3)').extract(),
                'Eidos': content.css('tr:nth-child(4)').extract(),
                'Typos': content.css('tr:nth-child(5)').extract(),
                'Kare': content.css('tr:nth-child(6)').extract(),
                'Comments': content.css('tr:nth-child(7)').extract(),
                'ProductLink': content.css('th[colspan="2"] > a::attr(href)').extract(),
                'Img': content.css('div.resultItemImage > a').extract(),
                'CurrentURL': response.url,
            }
            yield item

        for next_page in response.css('div.paging > a:last-child::attr(href)'):
            url = response.urljoin(next_page.extract())
            yield scrapy.Request(url, self.parse)
```

Thank you! :D

RE: Python - Scrapy Javascript Pagination (next_page) - stranac - Oct-08-2018

The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those. For example, clicking the next button shows this request:

(screenshots of the POST request in the browser's developer tools)

I tried playing with some parameters, changing a few and omitting them, and found out you can get all the results using a single request.
All that's left to do now is replace start_urls with start_requests() yielding a custom request, and you get all the items (note that this needs `import json` at the top of the spider file):

```python
import json  # at module level of the spider file

def start_requests(self):
    yield scrapy.FormRequest(
        url='https://www.siriocenter.gr/Proionta/PartialAntallaktika',
        formdata={
            'modelStr': json.dumps({
                'pageSize': 1000,
            }),
        },
    )
```

RE: Python - Scrapy Javascript Pagination (next_page) - Baggelhsk95 - Oct-08-2018

(Oct-08-2018, 11:32 AM)stranac Wrote: The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those.

Wow, that was awesome. How did you find out that I can request all the data at once? I'd like to use this approach on other websites. :D

RE: Python - Scrapy Javascript Pagination (next_page) - stranac - Oct-08-2018

Just tried increasing the page size.
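For anyone curious what FormRequest actually sends here, the form body can be reproduced with the standard library alone. This is a sketch assuming only what the thread shows (a `modelStr` form field wrapping a JSON object with a `pageSize` key); the helper name is hypothetical:

```python
import json
from urllib.parse import urlencode, parse_qs


def build_form_body(page_size):
    """Build the application/x-www-form-urlencoded body the site's
    JavaScript sends when paging, as observed in developer tools.

    Only the 'modelStr'/'pageSize' names come from the thread; the
    rest is plain stdlib plumbing for illustration.
    """
    model = {'pageSize': page_size}
    return urlencode({'modelStr': json.dumps(model)})


body = build_form_body(1000)
# Round-trip the body to confirm the JSON payload survives the encoding
decoded = json.loads(parse_qs(body)['modelStr'][0])
print(decoded['pageSize'])  # 1000
```

Recreating an XHR like this is usually faster and lighter than driving a headless browser, but it depends on the site keeping the same endpoint and parameter names.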