Python - Scrapy Javascript Pagination (next_page) - Printable Version

+- Python Forum (https://python-forum.io)
+-- Forum: Python Coding (https://python-forum.io/forum-7.html)
+--- Forum: Web Scraping & Web Development (https://python-forum.io/forum-13.html)
+--- Thread: Python - Scrapy Javascript Pagination (next_page) (/thread-13277.html)
Python - Scrapy Javascript Pagination (next_page) - Baggelhsk95 - Oct-08-2018

Hello guys, I made this script for this website, but my pagination doesn't work because it's driven by JavaScript: when you click "next page", the container loads new data. I did find the URLs for the next pages, e.g. Page2, Page3, but I'm only getting results from the first page. :/

Code:

```python
# -*- coding: utf-8 -*-
import scrapy


class SiriosbotSpider(scrapy.Spider):
    name = 'SiriosBot'
    start_urls = ['https://www.siriocenter.gr/Proionta/Mpoulonia-ApostatesTroxwn']

    def parse(self, response):
        for content in response.css('div.resultItemTxt'):
            item = {
                'Title': content.css('th[colspan="2"] > a::text').extract(),
                'Price': content.css('div.price > span::text').extract(),
                'Manufacture': content.css('tr:nth-child(2)').extract(),
                'Model': content.css('tr:nth-child(3)').extract(),
                'Eidos': content.css('tr:nth-child(4)').extract(),
                'Typos': content.css('tr:nth-child(5)').extract(),
                'Kare': content.css('tr:nth-child(6)').extract(),
                'Comments': content.css('tr:nth-child(7)').extract(),
                'ProductLink': content.css('th[colspan="2"] > a::attr(href)').extract(),
                'Img': content.css('div.resultItemImage > a').extract(),
                'CurrentURL': response.url,
            }
            yield item

        for next_page in response.css('div.paging > a:last-child::attr(href)'):
            url = response.urljoin(next_page.extract())
            yield scrapy.Request(url, self.parse)
```

Thank you! :D

RE: Python - Scrapy Javascript Pagination (next_page) - stranac - Oct-08-2018

The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those. For example, clicking the next button shows this request:

(screenshots of the POST request in the browser's developer tools)

I tried playing with some parameters, changing a few and omitting them, and found out you can get all the results using a single request.
All that's left to do now is replace start_urls with start_requests() yielding a custom request, and you get all the items (note that this needs `import json` at the top of the spider file):

```python
import json  # at module level of the spider file

def start_requests(self):
    yield scrapy.FormRequest(
        url='https://www.siriocenter.gr/Proionta/PartialAntallaktika',
        formdata={
            'modelStr': json.dumps({
                'pageSize': 1000,
            }),
        },
    )
```

RE: Python - Scrapy Javascript Pagination (next_page) - Baggelhsk95 - Oct-08-2018

(Oct-08-2018, 11:32 AM)stranac Wrote: The best way to do things like this is often to figure out what requests are being made using your browser's developer tools, and simply recreate those.

Wow, that was awesome. How did you find out that I can request all the data at once? I'd like to use this approach on other websites. :D

RE: Python - Scrapy Javascript Pagination (next_page) - stranac - Oct-08-2018

Just tried increasing the page size.
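For anyone curious what FormRequest actually sends here, the form body can be reproduced with the standard library alone. This is a sketch assuming only what the thread shows (a `modelStr` form field wrapping a JSON object with a `pageSize` key); the helper name is hypothetical:

```python
import json
from urllib.parse import urlencode, parse_qs


def build_form_body(page_size):
    """Build the application/x-www-form-urlencoded body the site's
    JavaScript sends when paging, as observed in developer tools.

    Only the 'modelStr'/'pageSize' names come from the thread; the
    rest is plain stdlib plumbing for illustration.
    """
    model = {'pageSize': page_size}
    return urlencode({'modelStr': json.dumps(model)})


body = build_form_body(1000)
# Round-trip the body to confirm the JSON payload survives the encoding
decoded = json.loads(parse_qs(body)['modelStr'][0])
print(decoded['pageSize'])  # 1000
```

Recreating an XHR like this is usually faster and lighter than driving a headless browser, but it depends on the site keeping the same endpoint and parameter names.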