
 Cannot extract data from the next pages
Dear Members,

I am writing Python code to extract eyeglass listings at ''. The code extracts data from the first page perfectly but fails to extract data from the next pages. There are five pages in total. I list both the VS Code source and the Terminal report here. I highly appreciate your help.

# -*- coding: utf-8 -*-
import scrapy

class GlassSpider(scrapy.Spider):
    name = 'glass'
    allowed_domains = ['']
    start_urls = ['']

    def parse(self, response):
        names = response.xpath("//p[@class='pname col-sm-12']/a")
        for name in names:
            # extract the product name and link from each anchor
            # (these two lines were garbled in the original post; reconstructed here)
            name_var = name.xpath(".//text()").get()
            link = name.xpath(".//@href").get()
            yield response.follow(url=link, callback=self.parse_glass, meta={'glass_name': name_var})

    def parse_glass(self, response):
        name_var = response.request.meta['glass_name']
        sku = response.xpath("//ul[@class='col-12 col-sm-6 default-content']/li[1]/text()").get()
        # the selectors for price and frame were lost from the original post
        price = None
        frame = None
        yield {
            'glass_name': name_var,
            'price': price,
            'sku': sku,
            'frame': frame
        }
        next_page = response.xpath("(//div[@class='custom-pagination']/ul/li)[7]/a/@href").get()
        if next_page:
            yield scrapy.Request(url=next_page, callback=self.parse)
Terminal Report:
Change start_urls to include all 5 pages:

start_urls = [f'{page}' for page in range(1, 6)]
Thank you, buran, for your response. It works perfectly fine now. If you do not mind, could you please briefly explain the problem in the code? I believe I will learn from your explanation and be able to solve this sort of problem in the future.
Well, I don't know what there is to explain. You have two levels of pages: the five top-level listing pages are the first level. When you parse these five pages you get the URLs of all the individual products. The second level is each individual product page.
Your start_urls had only one of the five top-level URLs.

As explained in the docs, the start_urls list is a shortcut for the start_requests method:

def start_requests(self):
    for page in range(1, 6):
        yield scrapy.Request(url=f'{page}', callback=self.parse)
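To see why the two forms are interchangeable, here is a minimal, dependency-free sketch showing that the start_urls list comprehension and a start_requests-style generator produce the same five URLs. The base URL is a placeholder, since the real site URL was elided from the post; the generator yields plain URL strings rather than scrapy.Request objects to keep the sketch runnable on its own.

```python
# Placeholder URL template -- the real site URL was elided from the original post.
BASE = "https://example.com/eyeglasses?page={page}"

# Approach 1: a start_urls-style list comprehension covering all 5 pages.
start_urls = [BASE.format(page=page) for page in range(1, 6)]

# Approach 2: a generator mirroring a start_requests method (yielding URL
# strings here instead of scrapy.Request objects, for a self-contained demo).
def start_request_urls():
    for page in range(1, 6):
        yield BASE.format(page=page)

print(start_urls == list(start_request_urls()))  # prints: True
```

In a real spider the only difference is that start_requests lets you set per-request options (callback, headers, meta), while start_urls always uses the default parse callback.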
The explanation completely makes sense. Thank you, buran.


