Python Forum
Cannot extract data from the next pages
#1
Dear Members,

I am writing Python code to extract eyeglass listings from 'https://www.glassesshop.com/bestsellers'. The code extracts data from the first page perfectly but fails to extract data from the following pages. There are 5 pages in total. I have listed both my VS Code source and the terminal report here. I would highly appreciate your help.

# -*- coding: utf-8 -*-
import scrapy


class GlassSpider(scrapy.Spider):
    name = 'glass'
    allowed_domains = ['www.glassesshop.com']
    start_urls = ['https://www.glassesshop.com/bestsellers']

    def parse(self, response):
        names=response.xpath("//p[@class='pname col-sm-12']/a")
        for name in names:
            name_var=name.xpath(".//text()").get()
            link=name.xpath(".//@href").get()

            yield response.follow(url=link, callback=self.parse_glass, meta={'glass_name': name_var})

    def parse_glass(self, response):
        name_var=response.request.meta['glass_name']
        price=response.xpath("//span[@class='product-price-original']/text()").get()
        sku=response.xpath("//ul[@class='col-12 col-sm-6 default-content']/li[1]/text()").get()
        frame=response.xpath("//a[@class='col01']/text()").get()

        yield{
            'glass_name': name_var,
            'price': price,
            'sku': sku,
            'frame': frame
            }
        
        next_page = response.xpath("(//div[@class='custom-pagination']/ul/li)[7]/a/@href").get()
        
        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse)
Terminal Report:
#2
Change start_urls to include all 5 pages:

start_urls = [f'https://www.glassesshop.com/bestsellers?page={page}' for page in range(1, 6)]
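To see what that comprehension produces, here is a minimal sketch that just prints the list. It assumes, as the line above does, that the site serves each listing page at ?page=N. Note that range(1, 6) stops before 6, so exactly pages 1 to 5 are generated.

```python
# Expand the comprehension to see which URLs Scrapy would request first.
# Assumption: listing pages are reachable via the ?page=N query parameter.
start_urls = [
    f'https://www.glassesshop.com/bestsellers?page={page}'
    for page in range(1, 6)  # 1, 2, 3, 4, 5 (the upper bound is exclusive)
]

for url in start_urls:
    print(url)
```

If the number of pages changes, only the upper bound of range() needs to be adjusted.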
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

#3
Thank you, buran, for your response. It works perfectly now. If you do not mind, could you please briefly explain the problem in the code? I believe I will learn from your explanation and be able to solve this sort of problem in the future.
#4
Well, I don't know that there is much to explain. You have 2 levels of pages. The first level is the 5 top-level listing pages; when you parse these 5 pages you get the URLs of all the individual products. The second level is the individual product pages.
Your start_urls had only one of the 5 top-level URLs.

As explained in the docs, the start_urls list is a shortcut for the start_requests method:

def start_requests(self):
    for page in range(1, 6):
        yield scrapy.Request(url=f'https://www.glassesshop.com/bestsellers?page={page}', callback=self.parse)
#5
The explanation completely makes sense. Thank you, buran.