Python Forum
Cannot extract data from the next pages
Dear Members,

I am writing a Python Scrapy spider to extract eyeglass listings at ''. The code extracts data from the first page perfectly but fails to extract data from the next pages. There are 5 pages in total. I include both the VS Code source and the terminal report here. I would highly appreciate your help.

# -*- coding: utf-8 -*-
import scrapy

class GlassSpider(scrapy.Spider):
    name = 'glass'
    allowed_domains = ['']
    start_urls = ['']

    def parse(self, response):
        names = response.xpath("//p[@class='pname col-sm-12']/a")
        for name in names:
            # pull the product name and link out of each anchor
            name_var = name.xpath(".//text()").get()
            link = name.xpath(".//@href").get()
            yield response.follow(url=link, callback=self.parse_glass,
                                  meta={'glass_name': name_var})

    def parse_glass(self, response):
        # read back the name attached to the request in parse()
        name_var = response.request.meta['glass_name']
        sku = response.xpath("//ul[@class='col-12 col-sm-6 default-content']/li[1]/text()").get()
        # the price and frame selectors did not survive in the post;
        # placeholders keep the snippet runnable
        price = None
        frame = None
        yield {
            'glass_name': name_var,
            'price': price,
            'sku': sku,
            'frame': frame
        }
        next_page = response.xpath("(//div[@class='custom-pagination']/ul/li)[7]/a/@href").get()
        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse)
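For readers unfamiliar with passing data between callbacks, here is a framework-free sketch of the meta pattern the spider uses: a dict attached to a request comes back on the matching response, so a later callback can read what an earlier one extracted. All names here are illustrative, not Scrapy's actual internals.

```python
# Framework-free sketch of Scrapy's request meta pattern: data attached
# to a request reappears on the corresponding response.
def follow(url, callback, meta):
    # stand-in for scheduling: immediately "fetch" and invoke the callback
    response = {"url": url, "meta": meta}
    return callback(response)

def parse_glass(response):
    # the product callback reads what the listing callback attached
    return {"glass_name": response["meta"]["glass_name"], "url": response["url"]}

item = follow("/product/1", parse_glass, meta={"glass_name": "Aviator"})
assert item == {"glass_name": "Aviator", "url": "/product/1"}
```

In current Scrapy versions, `cb_kwargs` on `Request` is the recommended way to pass such values, but `meta` works the same way as sketched here.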
Terminal Report:
Change start_urls to include all 5 pages:

start_urls = [f'{page}' for page in range(1, 6)]
If you can't explain it to a six year old, you don't understand it yourself, Albert Einstein
How to Ask Questions The Smart Way: link and another link
Create MCV example
Debug small programs

Thank you, buran, for your response. It works perfectly now. If you don't mind, could you please briefly explain the problem in the code? I believe I will learn from your explanation and be able to solve this sort of problem myself in the future.
Well, I don't know what there is to explain. You have two levels of pages: the top 5 listing pages are the first level. When you parse these 5 pages, you have the URLs of all the individual products. The second level is each individual product page.
Your start_urls had only one of the 5 top-level URLs.

As explained in the docs, the start_urls list is a shortcut for the start_requests method:

def start_requests(self):
    for page in range(1, 6):
        yield scrapy.Request(url=f'{page}', callback=self.parse)
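As a framework-free illustration of that equivalence, both forms below enumerate the same 5 top-level pages. The real listing URL was stripped from the post, so `BASE` here is a hypothetical stand-in.

```python
# Hypothetical URL template; the real listing URL was elided in the post.
BASE = "https://example.com/eyeglasses?page={}"

# Form 1: the start_urls shortcut, spelled out as a list comprehension.
start_urls = [BASE.format(page) for page in range(1, 6)]

# Form 2: what Scrapy's default start_requests does with that list,
# minus the actual scrapy.Request objects.
def start_requests(urls):
    for url in urls:
        yield url

assert list(start_requests(start_urls)) == start_urls
assert len(start_urls) == 5
```

Either form works; writing start_requests yourself is only needed when the URLs have to be computed, as in the 5-page loop above.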

The explanation completely makes sense. Thank you, buran.
